Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2018-12-12 Thread Jan Beulich
>>> On 13.12.18 at 04:46,  wrote:
> On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote:
> On 12.12.18 at 16:18,  wrote:
>>> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
>>> On 12.12.18 at 08:06,  wrote:
> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
 I find some pass-thru devices don't work any more across guest reboot.
 Assigning it to another guest also meets the same issue. And the only
 way to make it work again is un-binding and binding it to pciback.
 Someone reported this issue one year ago [1]. More detail also can be
 found in [2].

 The root-cause is Xen's internal MSI-X state isn't reset properly
 during reboot or re-assignment. In the above case, Xen set maskall bit
 to mask all MSI interrupts after it detected a potential security
 issue. Even after device reset, Xen didn't reset its internal maskall
 bit. As a result, maskall bit would be set again in next write to
 MSI-X message control register.

 Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
 internal state of a device, we employ it to fix this issue rather than
 introducing another dedicated sub-hypercall.

 Note that PHYSDEVOPS_release_msix() will fail if the mapping between
 the device's msix and pirq has been created. This limitation prevents
 us calling this function when detaching a device from a guest during
 guest shutdown. Thus it is called right before calling
 PHYSDEVOPS_prepare_msix().
>>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>>> () at the end of the hypercall name since it's not a function.
>>>
>>> I'm also wondering why the release can't be done when the device is
>>> detached from the guest (or the guest has been shut down). This makes
>>> me worry about the raciness of the attach/detach procedure: if there's
>>> a state where pciback assumes the device has been detached from the
>>> guest, but there are still pirqs bound, an attempt to attach to
>>> another guest in such state will fail.
>>
>>I wonder whether this additional reset functionality could be done out
>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>>and then do the extra things that are not properly done there.
> 
> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
> without error. But ATM, xen expects that no msi is bound to pirq when
> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY.
> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove().
> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen
> at last minute, which happens after device reset in 
> xen_pcibk_xenbus_remove().

But that may need taking care of: I don't think it is a good idea to have
anything left from the prior owning domain when the device gets reset.
I.e. left over IRQ bindings should perhaps be forcibly cleared before
invoking the reset;
>>> 
>>> Agree. How about pciback to track the established IRQ bindings? Then
>>> pciback can clear irq binding before invoking the reset.
>>
>>How would pciback even know of those mappings, when it's qemu
>>who establishes (and manages) them?
> 
> I meant to expose some interfaces from pciback. And pciback serves
> as the proxy of IRQ (un)binding APIs.

If at all possible we should avoid having to change more parties (qemu,
libxc, kernel, hypervisor) than really necessary. Remember that such
a bug fix may want backporting, and making sure affected people have
all relevant components updated is increasingly difficult with their
number growing.

in fact I'd expect this to happen in the course of
domain destruction, and I'd expect the device reset to come after the
domain was cleaned up. Perhaps simply an ordering issue in the tool
stack?
>>> 
>>> I don't think reversing the sequences of device reset and domain
>>> destruction would be simple. Furthermore, during device hot-unplug,
>>> device reset is done when the owner is alive. So if we use domain
>>> destruction to enforce all irq binding cleared, in theory, it won't be
>>> applicable to hot-unplug case (if qemu's hot-unplug logic is
>>> compromised).
>>
>>Even in the hot-unplug case the tool stack could issue unbind
>>requests, behind the back of the possibly compromised qemu,
>>once neither the guest nor qemu have access to the device
>>anymore.
> 
> But currently, tool stack doesn't know the remaining IRQ bindings.
> If tool stack can 

Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function

2018-12-12 Thread Jan Beulich
>>> On 12.12.18 at 18:05,  wrote:
> On Wed, Dec 12, 2018 at 09:15:09AM -0700, Jan Beulich wrote:
>> The MMIO side of things of course still remains unclear.
> 
> Right, for the MMIO and the handling of grant and foreign mappings it's
> not clear how we want to proceed.
> 
> Maybe account for all host RAM (total_pages) plus MMIO BARs?

Well, I thought we've already settled on it being impossible to
account for all MMIO BARs at this point.

>> What I don't understand in any case though is
>> "PAGE_SIZE << PAGE_ORDER_4K". This is x86 code - why not
>> just PAGE_SIZE?
> 
> Oh, I've done it like that because this is related to p2m code, which
> uses this way to get the page size. IIRC you told me to use this for
> things like pvh_setup_e820. I don't mind switching to just PAGE_SIZE.

Oh, I see. It's fine either way then. My general way of thinking
here is that outside of x86 code we better use these
PAGE_ORDER_* values, while in x86 specific code I don't see
the point. But indeed the p2m code is littered with them.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [freebsd-master test] 131265: all pass - PUSHED

2018-12-12 Thread osstest service owner
flight 131265 freebsd-master real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131265/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 freebsd  25cda747b021a1be758d3d1b128a670a23841f7f
baseline version:
 freebsd  f71d2bdf0c00378411ab60ec1ab76196b920e666

Last test of basis   131205  2018-12-10 09:21:49 Z2 days
Testing same since   131265  2018-12-12 09:19:37 Z0 days1 attempts


People who touched revisions under test:
  ae 
  andrew 
  arybchik 
  avos 
  bz 
  cem 
  cy 
  dab 
  delphij 
  dim 
  emaste 
  eugen 
  hselasky 
  imp 
  jhb 
  jhibbits 
  kib 
  kp 
  luporl 
  markj 
  mav 
  mckusick 
  mjg 
  shurd 
  yuripv 

jobs:
 build-amd64-freebsd-againpass
 build-amd64-freebsd  pass
 build-amd64-xen-freebsd  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/freebsd.git
   f71d2bdf0c0..25cda747b02  25cda747b021a1be758d3d1b128a670a23841f7f -> 
tested/master

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/VT-x: Don't activate VMCS Shadowing outside of nested vmx mode

2018-12-12 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Saturday, December 8, 2018 4:07 AM
> nested vmx mode
> 
> By default on capable hardware,
> SECONDARY_EXEC_ENABLE_VMCS_SHADOWING is
> activated unilaterally.  The VMCS Link pointer is initialised to ~0, but the
> VMREAD/VMWRITE bitmap pointers are not.
> 
> This causes the 16bit IVT and Bios Data Area get interpreted as the
> read/write
> permission bitmap for guests which blindly execute VMREAD/VMWRITE
> instructions.
> 
> This is not a security issue because the VMCS Link pointer being ~0 causes
> VMREAD/VMWRITE to complete with VMFailInvalid (rather than modifying
> a
> potential shadow VMCS), and the contents of MFN 0 has already been
> determined
> not to contain any interesting data because of L1TF's ability to read that 4k
> frame.
> 
> Leave VMCS Shadowing disabled by default, and toggle it in
> nvmx_{set,clear}_vmcs_pointer().  This isn't the most efficient course of
> action, but it is the most simple way of leaving nested-virt working as it did
> before.
> 
> While editing construct_vmcs(), collect all default secondary_exec_control
> modifications together.  The disabling of PML is latently buggy because it
> happens after secondary_exec_control are written into the VMCS, although
> there
> is an unconditional update later which writes the correct value into
> hardware.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Kevin Tian 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 3/4] iommu: elide flushing for higher order map/unmap operations

2018-12-12 Thread Tian, Kevin
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: Thursday, December 6, 2018 11:34 PM
> 
> This patch removes any implicit flushing that occurs in the implementation
> of map and unmap operations and adds new iommu_map/unmap()
> wrapper
> functions. To maintain sematics of the iommu_legacy_map/unmap()
> wrapper
> functions, these are modified to call the new wrapper functions and then
> perform an explicit flush operation.
> 
> Because VT-d currently performs two different types of flush dependent
> upon
> whether a PTE is being modified versus merely added (i.e. replacing a non-
> present PTE) 'iommu flush flags' are defined by this patch and the
> iommu_ops map_page() and unmap_page() methods are modified to OR
> the type
> of flush necessary for the PTE that has been populated or depopulated into
> an accumulated flags value. The accumulated value can then be passed into
> the explicit flush operation.
> 
> The ARM SMMU implementations of map_page() and unmap_page()
> currently
> perform no implicit flushing and therefore the modified methods do not
> adjust the flush flags.
> 
> NOTE: The per-cpu 'iommu_dont_flush_iotlb' is respected by the
>   iommu_legacy_map/unmap() wrapper functions and therefore this now
>   applies to all IOMMU implementations rather than just VT-d.
> 
> Signed-off-by: Paul Durrant 

Reviewed-by: Kevin Tian 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 2/4] iommu: rename wrapper functions

2018-12-12 Thread Tian, Kevin
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: Thursday, December 6, 2018 11:34 PM
> 
> A subsequent patch will add semantically different versions of
> iommu_map/unmap() so, in advance of that change, this patch renames
> the
> existing functions to iommu_legacy_map/unmap() and modifies all call-
> sites.
> It also adjusts a comment that refers to iommu_map_page(), which was re-
> named by a previous patch.
> 
> This patch is purely cosmetic. No functional change.
> 
> Signed-off-by: Paul Durrant 
> Acked-by: Jan Beulich A

Reviewed-by: Kevin Tian 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] x86/hvm: Corrections to RDTSCP intercept handling

2018-12-12 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Saturday, December 1, 2018 1:07 AM
> 
> For both VT-x and SVM, the RDTSCP intercept will trigger if the pipeline
> supports the instruction, but the guest may have not have rdtscp in its
> featureset.  Bring the vmexit handlers in line with the main emulator
> behaviour by optionally handing back #UD.
> 
> Next on the AMD side, if RDTSCP actually ends up being intercepted on a
> debug
> build, we first update regs->rcx, then call __get_instruction_length() asking
> for RDTSC.  As the two instructions are different (and indeed, different
> lengths!), __get_instruction_length_from_list() fails and hands back a #GP
> fault.
> 
> This can demonstrated by putting a guest into tsc_mode="always emulate"
> and
> executing an rdtscp instruction:
> 
>   (d1) --- Xen Test Framework ---
>   (d1) Environment: HVM 64bit (Long mode 4 levels)
>   (d1) Test rdtscp
>   (d1) TSC mode 1
>   (XEN) emulate.c:159:d1v0 __get_instruction_length_from_list: Mismatch
> between expected and actual instruction:
>   (XEN) emulate.c:163:d1v0   list[0] val 8, { opc 0xf0031, modrm 0 }, list
> entries: 1
>   (XEN) emulate.c:165:d1v0   rip 0x10475f, nextrip 0x104762, len 3
>   (XEN) Insn_len emulation failed (1): d1v0 64bit @ 0008:0010475f -> 0f 01
> f9 5b 31 ff 31 c0 e9 c4 db ff ff 00 00 00
>   (d1) **
>   (d1) PANIC: Unhandled exception at 0008:0010475f
>   (d1) Vec 13 #GP[]
>   (d1) **
> 
> First, teach __get_instruction_length() to cope with RDTSCP, and improve
> svm_vmexit_do_rdtsc() to ask for the correct instruction.  Move the regs-
> >rcx
> adjustment into this function to ensure it gets done after we are done
> potentially raising faults.
> 
> Reported-by: Paul Durrant 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Kevin Tian 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [linux-4.19 test] 131255: regressions - FAIL

2018-12-12 Thread osstest service owner
flight 131255 linux-4.19 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131255/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. 
vs. 129313
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 129313
 test-amd64-amd64-amd64-pvgrub  7 xen-bootfail REGR. vs. 129313
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. 
vs. 129313
 test-amd64-amd64-i386-pvgrub  7 xen-boot fail REGR. vs. 129313
 test-amd64-amd64-libvirt-pair 10 xen-boot/src_host   fail REGR. vs. 129313
 test-amd64-amd64-libvirt-pair 11 xen-boot/dst_host   fail REGR. vs. 129313
 test-amd64-i386-examine   8 reboot   fail REGR. vs. 129313
 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 129313
 test-amd64-amd64-pair11 xen-boot/dst_hostfail REGR. vs. 129313
 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 129313
 test-amd64-i386-pair 11 xen-boot/dst_hostfail REGR. vs. 129313
 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 129313
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-boot fail REGR. vs. 129313
 test-amd64-amd64-examine  8 reboot   fail REGR. vs. 129313
 test-amd64-i386-qemut-rhel6hvm-amd 12 guest-start/redhat.repeat fail REGR. vs. 
129313

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm  7 xen-boot   fail pass in 131208
 test-amd64-i386-libvirt-pair 10 xen-boot/src_host  fail pass in 131208
 test-amd64-i386-libvirt-pair 11 xen-boot/dst_host  fail pass in 131208
 test-amd64-amd64-libvirt-vhd 17 guest-start/debian.repeat  fail pass in 131208

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-rumprun-amd64 17 rumprun-demo-xenstorels/xenstorels.repeat 
fail REGR. vs. 129313

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  7 xen-boot   fail never pass
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never 

[Xen-devel] [libvirt test] 131256: regressions - FAIL

2018-12-12 Thread osstest service owner
flight 131256 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131256/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 131219
 test-armhf-armhf-libvirt16 guest-start/debian.repeat fail REGR. vs. 131219

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 131219
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 131219
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  daf4e2abca914fd8340d2243e2cb070a43dd7149
baseline version:
 libvirt  fa30ee04a2a7205c3d664c67b88dd8df9cb1fb40

Last test of basis   131219  2018-12-11 04:19:05 Z2 days
Testing same since   131256  2018-12-12 04:19:19 Z1 days1 attempts


People who touched revisions under test:
  Daniel P. Berrangé 
  Erik Skultety 
  Julio Faracco 
  Michal Privoznik 
  Nikolay Shirokovskiy 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   blocked 
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm blocked 
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt blocked 
 test-armhf-armhf-libvirt fail
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairblocked 
 test-amd64-i386-libvirt-pair pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-amd64-libvirt-vhd blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit daf4e2abca914fd8340d2243e2cb070a43dd7149
Author: Julio Faracco 
Date:   Fri Nov 30 20:43:37 2018 +0800

tests: Adding test case to include multiple network definitions.

This commit includes a test case for multiple network definitions. It is
useful right now, but it will be more useful when the index used by LXC
version 3.X is implemented to support this new settings. The version 3.X
is using indexes to specify each network settings.

Signed-off-by: Julio Faracco 
ACKed-by: Michal Privoznik 

commit 53762677a84c7bf124714d9e607da8c115084775
Author: Julio Faracco 
Date:   Fri Nov 30 20:43:36 2018 +0800

lxc: Initializing IPv6 and IPv4 gateway to overwrite old settings.

This commit fixes a bug when you have multiple network settings defined.
Basically, if you set an IPv6 or IPv4 gateway, it carries on next
network settings. It is 

Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2018-12-12 Thread Chao Gao
On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote:
 On 12.12.18 at 16:18,  wrote:
>> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
>> On 12.12.18 at 08:06,  wrote:
 On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
>>> I find some pass-thru devices don't work any more across guest reboot.
>>> Assigning it to another guest also meets the same issue. And the only
>>> way to make it work again is un-binding and binding it to pciback.
>>> Someone reported this issue one year ago [1]. More detail also can be
>>> found in [2].
>>>
>>> The root-cause is Xen's internal MSI-X state isn't reset properly
>>> during reboot or re-assignment. In the above case, Xen set maskall bit
>>> to mask all MSI interrupts after it detected a potential security
>>> issue. Even after device reset, Xen didn't reset its internal maskall
>>> bit. As a result, maskall bit would be set again in next write to
>>> MSI-X message control register.
>>>
>>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
>>> internal state of a device, we employ it to fix this issue rather than
>>> introducing another dedicated sub-hypercall.
>>>
>>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
>>> the device's msix and pirq has been created. This limitation prevents
>>> us calling this function when detaching a device from a guest during
>>> guest shutdown. Thus it is called right before calling
>>> PHYSDEVOPS_prepare_msix().
>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>> () at the end of the hypercall name since it's not a function.
>>
>> I'm also wondering why the release can't be done when the device is
>> detached from the guest (or the guest has been shut down). This makes
>> me worry about the raciness of the attach/detach procedure: if there's
>> a state where pciback assumes the device has been detached from the
>> guest, but there are still pirqs bound, an attempt to attach to
>> another guest in such state will fail.
>
>I wonder whether this additional reset functionality could be done out
>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>and then do the extra things that are not properly done there.
 
 No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
 the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
 MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
 without error. But ATM, xen expects that no msi is bound to pirq when
 doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY.
 However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove().
 In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen
 at last minute, which happens after device reset in 
 xen_pcibk_xenbus_remove().
>>>
>>>But that may need taking care of: I don't think it is a good idea to have
>>>anything left from the prior owning domain when the device gets reset.
>>>I.e. left over IRQ bindings should perhaps be forcibly cleared before
>>>invoking the reset;
>> 
>> Agree. How about pciback to track the established IRQ bindings? Then
>> pciback can clear irq binding before invoking the reset.
>
>How would pciback even know of those mappings, when it's qemu
>who establishes (and manages) them?

I meant to expose some interfaces from pciback. And pciback serves
as the proxy of IRQ (un)binding APIs.

>
>>>in fact I'd expect this to happen in the course of
>>>domain destruction, and I'd expect the device reset to come after the
>>>domain was cleaned up. Perhaps simply an ordering issue in the tool
>>>stack?
>> 
>> I don't think reversing the sequences of device reset and domain
>> destruction would be simple. Furthermore, during device hot-unplug,
>> device reset is done when the owner is alive. So if we use domain
>> destruction to enforce all irq binding cleared, in theory, it won't be
>> applicable to hot-unplug case (if qemu's hot-unplug logic is
>> compromised).
>
>Even in the hot-unplug case the tool stack could issue unbind
>requests, behind the back of the possibly compromised qemu,
>once neither the guest nor qemu have access to the device
>anymore.

But currently, tool stack doesn't know the remaining IRQ bindings.
If tool stack can maintaine IRQ binding information of a pass-thru
device (stored in Xenstore?), we can come up with a clean solution
without modifying linux kernel and Xen.

Thanks
Chao

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware

2018-12-12 Thread Tian, Kevin
btw can you also capture ISR/IRR/PPR right before local_irq_enable()?
though I didn't see a reason why code in-between may impact those 
bits, it doesn't hurt to capture the context right before interrupt is
raised. :-)

> -Original Message-
> From: Tian, Kevin
> Sent: Thursday, December 13, 2018 9:28 AM
> To: Roger Pau Monné 
> Cc: xen-devel@lists.xenproject.org; Wei Liu ; Jan
> Beulich ; Andrew Cooper
> ; Nakajima, Jun 
> Subject: RE: Interrupt injection with ISR set on Intel hardware
> 
> > From: Roger Pau Monné [mailto:roger@citrix.com]
> > Sent: Wednesday, December 12, 2018 8:18 PM
> >
> > On Wed, Dec 12, 2018 at 11:48:52AM +, Tian, Kevin wrote:
> > > > From: Roger Pau Monné [mailto:roger@citrix.com]
> > > > Sent: Wednesday, December 12, 2018 7:25 PM
> > > >
> > > > On Wed, Dec 12, 2018 at 10:36:44AM +, Tian, Kevin wrote:
> > > > > > From: Roger Pau Monné [mailto:roger@citrix.com]
> > > > > > Sent: Monday, October 15, 2018 6:30 PM
> > > > > > (XEN)   [22642] POWERTYPE 4
> > > > > > (XEN)   [22643] IDLE PPR 0x0020
> > > > > > (XEN)IRR
> > > > > >
> > > >
> >
> 00
> > > > > > 00
> > > > > > (XEN)ISR
> > > > > >
> > > >
> >
> 02
> > > > > > 00
> > > > > > (XEN)   [22644] WAKE PPR 0x0020
> > > > > > (XEN)IRR
> > > > > >
> > > >
> >
> 02
> > > > > > 00
> > > > > > (XEN)ISR
> > > > > >
> > > >
> >
> 02
> > > > > > 00
> > > > >
> > > > > looks pending IRR (0x21) doesn't always trigger a spurious interrupt?
> > > >
> > > > Yes, that's correct. Having a pending IRR and going idle doesn't
> > > > always trigger the spurious interrupt re-injection.
> > > >
> > > > > is it a fixed pattern after how many rounds of Cstate enter/exit with
> > > > > pending IRR(0x21) then you see assertion happened (in this example
> > > > > it happens at 3rd time)?
> > > >
> > > > It's not a fixed pattern, here's another trace with IRR(0x21) being
> > > > pending just once during the Cstate transitions:
> > >
> > > did you observe a case where such asset may occur when IRR(0x21)
> > > is cleared but ISR (0x21) is set?
> >
> > No, I've always seen both ISR and IRR set when the interrupt injection
> > happens. This of course doesn't mean it's not possible, but I have not
> > seen any trace with ISR(0x21) set and IRR(0x21) clear.
> >
> 
> sorry but let me double confirm. You always see ISR[21]/IRR[21] being
> set "before and after entering C3" to hit the problem, right? When
> interrupt injection happens later, ISR[21] is set but IRR[21] is cleared (as
> expected for normal interrupt delivery process).
> 
> btw I checked your original mail:
> 
> (XEN)[] mwait-idle.c#mwait_idle+0x2a5/0x381
> xen/arch/x86/cpu/mwait-idle.c:802
> 
>788if (cpu_is_haltable(cpu))
>789mwait_idle_with_hints(eax,
> MWAIT_ECX_INTERRUPT_BREAK);
>790
>791after = cpuidle_get_tick();
>792
>793cstate_restore_tsc();
>794trace_exit_reason(irq_traced);
>795TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after,
>796irq_traced[0], irq_traced[1], irq_traced[2],
> irq_traced[3]);
>797
>798/* Now back in C0. */
>799update_idle_stats(power, cx, before, after);
>800local_irq_enable();
>801
> -> 802if (!(lapic_timer_reliable_states & (1 << cstate)))
>803lapic_timer_on();
>804
>805sched_tick_resume();
>806cpufreq_dbs_timer_resume();
> 
> Looks above code is different from staging:
> 
> acpi_processor_idle:
>   acpi_idle_do_entry:
>   acpi_processor_ffh_cstate_enter:
>   mwait_idle_with_hints
> 
> there is no mwait_idle alone. and even with compiler optimization I didn't
> find code sequence like above...
> 
> Thanks
> Kevin

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 6/6] x86/microcode: Synchronize late microcode loading

2018-12-12 Thread Boris Ostrovsky
On 12/12/18 2:26 AM, Jan Beulich wrote:
 On 11.12.18 at 19:16,  wrote:
>> BTW: Apart from the fact its ugly and take a lng time to complete, do you
>> have any practical isssues you want to highlight? maybe that can 
>> help upstream as well.
> The situation for a kernel and a hypervisor might be different here
> (but in the Linux case may then be more hypervisor-like when
> considering KVM): The hypervisor needs to make sure in particular
> time management within guests won't break. Stopping the
> machine for an extended period of time may not be helpful there.
> For processes in an OS the constraints might not be as tight, but
> I could imaging problems even there in some less common cases.
>
> That said, I'm not convinced at all it is a good idea to load new
> ucode while _any_ guests are running, but we're talking about a
> last resort approach here anyway.
>

BTW, one thing I meant to mention about this patch earlier: we observed
that updating microcode from only one sibling resulted in guest reading
stale version value from another thread. This causes at least some
Windows versions to crash.

We ended up executing "wrmsr(0x8b, 0); cpuid_eax(1);" on all threads
after the update. (Another alternative could be to intercept the reads)

-boris

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable-smoke test] 131284: tolerable all pass - PUSHED

2018-12-12 Thread osstest service owner
flight 131284 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131284/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  00c96d77422a4b84247bec5dadf434363d312cac
baseline version:
 xen  9c35572f359cd6f71aa20b0991c74e032b8721d2

Last test of basis   131274  2018-12-12 17:00:36 Z0 days
Testing same since   131284  2018-12-13 00:00:41 Z0 days1 attempts


People who touched revisions under test:
  Julien Grall 
  Matthew Daley 
  Shameer Kolothum 

jobs:
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   9c35572f35..00c96d7742  00c96d77422a4b84247bec5dadf434363d312cac -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware

2018-12-12 Thread Tian, Kevin
> From: Roger Pau Monné [mailto:roger@citrix.com]
> Sent: Wednesday, December 12, 2018 8:18 PM
> 
> On Wed, Dec 12, 2018 at 11:48:52AM +, Tian, Kevin wrote:
> > > From: Roger Pau Monné [mailto:roger@citrix.com]
> > > Sent: Wednesday, December 12, 2018 7:25 PM
> > >
> > > On Wed, Dec 12, 2018 at 10:36:44AM +, Tian, Kevin wrote:
> > > > > From: Roger Pau Monné [mailto:roger@citrix.com]
> > > > > Sent: Monday, October 15, 2018 6:30 PM
> > > > > (XEN)   [22642] POWERTYPE 4
> > > > > (XEN)   [22643] IDLE PPR 0x0020
> > > > > (XEN)IRR
> > > > >
> > >
> 00
> > > > > 00
> > > > > (XEN)ISR
> > > > >
> > >
> 02
> > > > > 00
> > > > > (XEN)   [22644] WAKE PPR 0x0020
> > > > > (XEN)IRR
> > > > >
> > >
> 02
> > > > > 00
> > > > > (XEN)ISR
> > > > >
> > >
> 02
> > > > > 00
> > > >
> > > > looks pending IRR (0x21) doesn't always trigger a spurious interrupt?
> > >
> > > Yes, that's correct. Having a pending IRR and going idle doesn't
> > > always trigger the spurious interrupt re-injection.
> > >
> > > > is it a fixed pattern after how many rounds of Cstate enter/exit with
> > > > pending IRR(0x21) then you see assertion happened (in this example
> > > > it happens at 3rd time)?
> > >
> > > It's not a fixed pattern, here's another trace with IRR(0x21) being
> > > pending just once during the Cstate transitions:
> >
> > did you observe a case where such asset may occur when IRR(0x21)
> > is cleared but ISR (0x21) is set?
> 
> No, I've always seen both ISR and IRR set when the interrupt injection
> happens. This of course doesn't mean it's not possible, but I have not
> seen any trace with ISR(0x21) set and IRR(0x21) clear.
> 

sorry but let me double confirm. You always see ISR[21]/IRR[21] being
set "before and after entering C3" to hit the problem, right? When 
interrupt injection happens later, ISR[21] is set but IRR[21] is cleared (as 
expected for normal interrupt delivery process).

btw I checked your original mail:

(XEN)[] mwait-idle.c#mwait_idle+0x2a5/0x381
xen/arch/x86/cpu/mwait-idle.c:802

   788  if (cpu_is_haltable(cpu))
   789  mwait_idle_with_hints(eax, MWAIT_ECX_INTERRUPT_BREAK);
   790
   791  after = cpuidle_get_tick();
   792
   793  cstate_restore_tsc();
   794  trace_exit_reason(irq_traced);
   795  TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after,
   796  irq_traced[0], irq_traced[1], irq_traced[2], 
irq_traced[3]);
   797
   798  /* Now back in C0. */
   799  update_idle_stats(power, cx, before, after);
   800  local_irq_enable();
   801
-> 802  if (!(lapic_timer_reliable_states & (1 << cstate)))
   803  lapic_timer_on();
   804
   805  sched_tick_resume();
   806  cpufreq_dbs_timer_resume();

Looks above code is different from staging:

acpi_processor_idle:
acpi_idle_do_entry:
acpi_processor_ffh_cstate_enter:
mwait_idle_with_hints

there is no mwait_idle alone. and even with compiler optimization I didn't
find code sequence like above...

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [qemu-mainline test] 131240: regressions - FAIL

2018-12-12 Thread osstest service owner
flight 131240 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131240/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-cubietruck 16 guest-start/debian.repeat fail REGR. vs. 
131172

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 131172
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 131172
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 131172
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 131172
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 131172
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass

version targeted for testing:
 qemuu32a1a94dd324d33578dca1dc96d7896a0244d768
baseline version:
 qemuu4f818e7b7f8ecb5c166d093b8859fec2ddeca2ef

Last test of basis   131172  2018-12-09 12:08:42 Z3 days
Testing same since   131240  2018-12-11 17:37:01 Z1 days1 attempts


People who touched revisions under test:
  Peter Maydell 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm   

[Xen-devel] [linux-4.14 test] 131238: tolerable FAIL - PUSHED

2018-12-12 Thread osstest service owner
flight 131238 linux-4.14 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131238/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass

version targeted for testing:
 linuxca48e5e30b75a28c12c43c7428c95735e4885e6b
baseline version:
 linux2e390c487815669fb9bb35d7ea11883cc10a9b50

Last test of basis   130155  2018-11-15 23:53:54 Z   26 days
Failing since130644  2018-11-21 08:41:02 Z   21 days   12 attempts
Testing same since   131156  2018-12-09 02:01:10 Z3 days3 attempts


442 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumprun  pass
 build-i386-rumprun   pass
 test-amd64-amd64-xl  pass

Re: [Xen-devel] [PATCH v5 5/7] xen/arm: zynqmp: eemi access control

2018-12-12 Thread Stefano Stabellini
On Tue, 11 Dec 2018, Julien Grall wrote:
> Hi,
> 
> On 03/12/2018 21:03, Stefano Stabellini wrote:
> > From: "Edgar E. Iglesias" 
> > 
> > From: Edgar E. Iglesias 
> > 
> > Introduce data structs to implement basic access controls.
> > Introduce the following three functions:
> > 
> > domain_has_node_access: check access to the node
> > domain_has_reset_access: check access to the reset line
> > domain_has_mmio_access: check access to the register
> > 
> > Signed-off-by: Edgar E. Iglesias 
> > Signed-off-by: Stefano Stabellini 
> > 
> > ---
> > Statically defines:
> > 
> > - pm_node_access
> > It encodes the relationship between a node id and the start of the MMIO
> > region of a device in the corresponding power domain. It is used for
> > permission checking. Although the MMIO region start address is available
> > on device tree and could be derived from there (we plan to improve that
> > in the future), the relationship between a node id and corresponding
> > devices is not described and needs to be hardcoded.
> > 
> > - pm_reset_access
> > Same as pm_node_access for reset lines.
> > 
> > ---
> > Changes in v5:
> > - improve in-code comments
> > - use mfn_t in struct pm_access
> > - remove mmio_access table
> > 
> > Changes in v4:
> > - add #include as needed
> > - add #if 0 for bisectability
> > - use mfn_t in pm_check_access
> > - add wrap-around ASSERT in domain_has_mmio_access
> > - use GENMASK in domain_has_mmio_access
> > - proper bound checks (== ARRAY_SIZE is out of bound)
> > ---
> >   xen/arch/arm/platforms/xilinx-zynqmp-eemi.c | 348
> > 
> >   1 file changed, 348 insertions(+)
> > 
> > diff --git a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
> > b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
> > index 369bb3f..92a02df 100644
> > --- a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
> > +++ b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
> > @@ -16,9 +16,357 @@
> >* GNU General Public License for more details.
> >*/
> >   +/*
> > + *  EEMI Power Management API access
> > + *
> > + * Refs:
> > + *
> > https://www.xilinx.com/support/documentation/user_guides/ug1200-eemi-api.pdf
> > + *
> > + * Background:
> > + * The ZynqMP has a subsystem named the PMU with a CPU and special devices
> > + * dedicated to running Power Management Firmware. Other masters in the
> > + * system need to send requests to the PMU in order to for example:
> > + * * Manage power state
> > + * * Configure clocks
> > + * * Program bitstreams for the programmable logic
> > + * * etc
> > + *
> > + * Although the details of the setup are configurable, in the common case
> > + * the PMU lives in the Secure world. NS World cannot directly communicate
> > + * with it and must use proxy services from ARM Trusted Firmware to reach
> > + * the PMU.
> > + *
> > + * Power Management on the ZynqMP is implemented in a layered manner.
> > + * The PMU knows about various masters and will enforce access controls
> > + * based on a pre-configured partitioning. This configuration dictates
> > + * which devices are owned by the various masters and the PMU FW makes sure
> > + * that a given master cannot turn off a device that it does not own or
> > that
> > + * is in use by other masters.
> > + *
> > + * The PMU is not aware of multiple execution states in masters.
> > + * For example, it treats the ARMv8 cores as single units and does not
> > + * distinguish between Secure vs NS OS's nor does it know about Hypervisors
> > + * and multiple guests. It is up to software on the ARMv8 cores to present
> > + * a unified view of its power requirements.
> > + *
> > + * To implement this unified view, ARM Trusted Firmware at EL3 provides
> > + * access to the PM API via SMC calls. ARM Trusted Firmware is responsible
> > + * for mediating between the Secure and the NS world, rejecting SMC calls
> > + * that request changes that are not allowed.
> > + *
> > + * Xen running above ATF owns the NS world and is responsible for
> > presenting
> > + * unified PM requests taking all guests and the hypervisor into account.
> > + *
> > + * Implementation:
> > + * The PM API contains different classes of calls.
> > + * Certain calls are harmless to expose to any guest.
> > + * These include calls to get the PM API Version, or to read out the
> > version
> > + * of the chip we're running on.
> > + *
> > + * In order to correctly virtualize these calls, we need to know if
> > + * guests issuing these calls have ownership of the given device.
> > + * The approach taken here is to map PM API Nodes identifying
> > + * a device into base addresses for registers that belong to that
> > + * same device.
> > + *
> > + * If the guest has access to devices registers, we give the guest
> > + * access to PM API calls that affect that device. This is implemented
> > + * by pm_node_access and domain_has_node_access().
> > + */
> > +
> > +#include 
> > +#include 
> >   #include 
> >   #include 
> >   +#if 0
> > +struct pm_access
> > +{
> > +

Re: [Xen-devel] [PATCH v5 4/7] xen: introduce mfn_init macro

2018-12-12 Thread Stefano Stabellini
On Wed, 5 Dec 2018, Jan Beulich wrote:
> >>> On 04.12.18 at 20:38,  wrote:
> > On Tue, 4 Dec 2018, Jan Beulich wrote:
> >> >>> On 03.12.18 at 22:03,  wrote:
> >> > To be used in constant initializations of mfn_t variables, such as:
> >> > 
> >> > static mfn_t node = mfn_init(MM_ADDR);
> >> > 
> >> > It is necessary because static inline functions cannot be used as static
> >> > initializers.
> >> 
> >> We had been at this point once (quite some time ago), and got
> >> away without such an addition. Did you try to find that old
> >> discussion? Are there any new reasons to have such a construct?
> >> Do you need this for other than setting a value to INVALID_MFN,
> >> in which case INVALID_MFN_INITIALIZER ought to be suitable?
> >> 
> >> This is not to say I'm entirely opposed.
> >> 
> >> If we were to have such a construct, I wonder though whether
> >> mfn_init() is suitable as a name. Simply MFN() perhaps, and then
> >> also consistently have GFN() and DFN()?
> > 
> > Hi Jan,
> > 
> > I am happy with any name, and MFN() together with GFN() and DFN() look
> > like a good option.
> > 
> > The reason why it is needed is that without it I cannot introduce a
> > statically initialized array of mfn_t type like the one in the following
> > patch in the series:
> > 
> > +static const struct pm_access pm_node_access[] = {
> > +/* MM_RPU grants access to all RPU Nodes.  */
> > +[NODE_RPU] = { mfn_init(MM_RPU) },
> > +[NODE_RPU_0] = { mfn_init(MM_RPU) },
> > +[NODE_RPU_1] = { mfn_init(MM_RPU) },
> > +[NODE_IPI_RPU_0] = { mfn_init(MM_RPU) },
> > 
> > [...]
> > 
> > Where MM_RPU is a mfn, and the NODE_* are IDs defined as enum:
> > 
> > #define MM_RPU  0xff9a0
> > 
> > enum pm_node_id {
> > NODE_RPU = 6,
> > NODE_RPU_0,
> > NODE_RPU_1,
> > 
> > [...]
> > 
> > 
> > Originally I had:
> > 
> >   [NODE_RPU] = { MM_RPU },
> > 
> > but I changed the type to be mfn_t to address one of Julien's comments.
> > You might get a better idea of the issue if you give a look at this
> > branch:
> > 
> > http://xenbits.xenproject.org/git-http/people/sstabellini/xen-unstable.git 
> > zynqmp-v5
> 
> Well, I have to admit that I'd rather not see ways to embed hard-coded
> MFNs into code made available generically. May I suggest that you use
> a macro with a name to your liking just locally to that one source file?

OK, no problem


> As a side note, I'm also puzzled by there being entries in the table which
> don't have their MFNs specified. Oddly enough it looks as if
> .hwdom_access was true if and only if no MFN is specified.

Yes, the hwdom_access check could be turned into a check for
MFN(INVALID_MFN). I'll do that it will actually make the array size
smaller.


> The term "node" of course is confusing too, considering its NUMA
> meaning elsewhere in the hypervisor.
 
That comes from the EEMI firmware specification: they use the term
"node" to address a power domain "unit".

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 6/7] xen/arm: zynqmp: implement zynqmp_eemi

2018-12-12 Thread Stefano Stabellini
On Wed, 12 Dec 2018, Julien Grall wrote:
> On 11/12/2018 22:23, Stefano Stabellini wrote:
> > On Tue, 11 Dec 2018, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 03/12/2018 21:03, Stefano Stabellini wrote:
> > > > From: "Edgar E. Iglesias" 
> > > > 
> > > > From: Edgar E. Iglesias 
> > > > 
> > > > zynqmp_eemi uses the defined functions and structs to decide whether to
> > > > make a call to the firmware, or to simply return a predefined value.
> > > > 
> > > > Signed-off-by: Edgar E. Iglesias 
> > > > Signed-off-by: Stefano Stabellini 
> > > > ---
> > > > Changes in v5:
> > > > - remove mmio_access handling
> > > > 
> > > > Changes in v4:
> > > > - add #include as needed
> > > > - improve comment
> > > > - code style
> > > > ---
> > > >xen/arch/arm/platforms/xilinx-zynqmp-eemi.c | 181
> > > > +++-
> > > >1 file changed, 125 insertions(+), 56 deletions(-)
> > > > 
> > > > diff --git a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
> > > > b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
> > > > index 92a02df..9ecf286 100644
> > > > --- a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
> > > > +++ b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
> > > > @@ -76,10 +76,10 @@
> > > >  #include 
> > > >#include 
> > > > +#include 
> > > >#include 
> > > >#include 
> > > >-#if 0
> > > >struct pm_access
> > > >{
> > > >mfn_t mfn;
> > > > @@ -309,67 +309,136 @@ static bool domain_has_reset_access(struct domain
> > > > *d,
> > > > uint32_t rst)
> > > >return pm_check_access(pm_reset_access, d, rst);
> > > >}
> > > >-/*
> > > > - * Check if a given domain has access to perform an indirect
> > > > - * MMIO access.
> > > > - *
> > > > - * If the provided mask is invalid, it will be fixed up.
> > > > - */
> > > > -static bool domain_has_mmio_access(struct domain *d,
> > > > -   bool write, paddr_t addr,
> > > > -   uint32_t *mask)
> > > 
> > > Why do you remove code that you just introduced?
> > 
> > I am really sorry about this, it was error applying a patch. This code
> > should never have been introduced: the code should be removed from the
> > previous patch. I'll fix it.
> > 
> > 
> > > > +bool zynqmp_eemi(struct cpu_user_regs *regs)
> > > >{
> > > > -unsigned int i;
> > > > -bool ret = false;
> > > > -uint32_t prot_mask = 0;
> > > > -
> > > > -/*
> > > > - * The hardware domain gets read access to everything.
> > > > - * Lower layers will do further filtering.
> > > > - */
> > > > -if ( !write && is_hardware_domain(d) )
> > > > -return true;
> > > > +struct arm_smccc_res res;
> > > > +uint32_t fid = get_user_reg(regs, 0);
> > > > +uint32_t nodeid = get_user_reg(regs, 1);
> > > 
> > > You didn't address my concern regarding SMC32 vs SMC64 convention. As I
> > > said
> > > earlier on, at least CALL_COUNT, UID and VERSION are only accessible using
> > > the
> > > SMC32 convention.
> > > 
> > > I can't tell for the other as the EEMI spec does not seem to specify it. I
> > > would be surprised that EEMI would  ignore tops bits of the ID given they
> > > convey different information (e.g  fast/yielding call, 32/64-bit
> > > convention).
> > > 
> > > Looking at the branch you mentioned earlier on, zynqmp_pm_invoke_fn
> > > (drivers/firmware/xilinx/zynqmp.c) is definitely using the SMC64 calling
> > > convention as described in the documentation above the function.
> > > 
> > > So this needs to be fixed properly.
> > 
> > OK, I'll add a check for the mandatory smc32 calls and forward them to
> > firmware properly.
> 
> smc32 is only part of the problem I mentioned. The main problem is you only
> look at the function number (bits 0-15). This is not enough to know which
> function is going to be called. You want to take into account the full
> function identifier. We provide helpers to create them (see
> ARM_SMCCC_CALL_VAL).

Yes, I get what you are saying. I made a few changes in that direction.

   
> > > > +unsigned int pm_fn = fid & 0x;
> > > > +enum pm_ret_status ret;
> > > >-/* Scan the ACL.  */
> > > > -for ( i = 0; i < ARRAY_SIZE(pm_mmio_access); i++ )
> > > > +switch ( pm_fn )
> > > >{
> > > > -ASSERT(pm_mmio_access[i].start + pm_mmio_access[i].size >=
> > > > -   pm_mmio_access[i].start);
> > > > -
> > > > -if ( addr < pm_mmio_access[i].start )
> > > > -return false;
> > > > -if ( addr >= pm_mmio_access[i].start + pm_mmio_access[i].size )
> > > > -continue;
> > > > -
> > > > -if ( write && pm_mmio_access[i].readonly )
> > > > -return false;
> > > > -if ( pm_mmio_access[i].hwdom_access && !is_hardware_domain(d) )
> > > > -return false;
> > > > -if ( !domain_has_node_access(d, pm_mmio_access[i].node) )
> > > > -return false;
> > > > -
> > > > -/* We've got 

Re: [Xen-devel] [PATCH v5 3/7] xen/arm: zynqmp: introduce zynqmp specific defines

2018-12-12 Thread Stefano Stabellini
On Wed, 12 Dec 2018, Julien Grall wrote:
> Hi Stefano,
> 
> On 11/12/2018 19:22, Stefano Stabellini wrote:
> > On Tue, 11 Dec 2018, Julien Grall wrote:
> > > Hi,
> > > 
> > > On 03/12/2018 21:03, Stefano Stabellini wrote:
> > > What is the plan there?
> > 
> > The plan is to extract the node_id from a power-domain node on device
> > tree. Each device would have a phandler to link to the right
> > power-domain node which contains a power-domain-id attribute. The
> > power-domain-id attribute is the node_id here.
> > 
> > The power-domain-id changes to the Xilinx MPSoC device tree are under
> > discussion with the device tree community.
> 
> If I understand correctly, we will never be able to remove the hardcoded
> values in Xen. This is because some device-tree may not have the bindings. Am
> I correct?

If we want to support running on existing hardware and firmware releases,
then you are correct. We won't be able to remove the hardcoded values.
That ship has sailed, not much we can do about it.

If in the future we decide to drop support for older firmware releases
and ask users to update their firmare/devicetrees, then we'll be able to
remove the hardcoded values. I think it is something we can consider.
In fact, I would rather break compatibility than not providing support
for basic power management functionalities at all.

I think it is better to introduce basic EEMI support now, even if in a
couple of Xen releases from now we'll make the decision to drop the
hardcoded values and require new Xilinx device trees. Xilinx users are
used to updating firmware on these boards every 6 months, and it would
beneficial for them to be able to take a Xen Project release rather than
be forced to use a Xilinx Xen release to have support for power
management. When the new bindings become available, as we introduce
support for them in Xen, we can decide whether to keep the hardcoded
values or whether we should take the opportunity to drop them.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12] dt: fix integer check in dt_device_is_compatible

2018-12-12 Thread Stefano Stabellini
On Wed, 12 Dec 2018, Julien Grall wrote:
> Hi Stefano,
> 
> On 11/12/2018 18:46, Stefano Stabellini wrote:
> > cplen is unsigned, thus, it can never be < 0. Use a signed integer local
> > variable instead.
> 
> The current code checks > 0. Looking at the code I don't think it can ever be
> negative. So can you details what is the problem you are trying to resolve?

Yes, it can be "negative" (not actually negative because it is defined
as unsigned), in fact it happens with the nodes added dynamically by grub
at boot. This patches fixes booting from grub.

Specifically ilen is initialed to 16, but strlen+1 is 17, so length
becomes -1. The length of the property generated by grub seems to be
short by 1.


> > Signed-off-by: Stefano Stabellini 
> > ---
> >   xen/common/device_tree.c | 7 +--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
> > index 8fc401d..df274cc 100644
> > --- a/xen/common/device_tree.c
> > +++ b/xen/common/device_tree.c
> > @@ -213,17 +213,20 @@ bool_t dt_device_is_compatible(const struct
> > dt_device_node *device,
> >   {
> >   const char* cp;
> >   u32 cplen, l;
> > +s64 ilen;
> > cp = dt_get_property(device, "compatible", );
> >   if ( cp == NULL )
> >   return 0;
> > -while ( cplen > 0 )
> > +
> > +ilen = cplen;
> > +while ( ilen > 0 )
> >   {
> >   if ( dt_compat_cmp(cp, compat) == 0 )
> >   return 1;
> >   l = strlen(cp) + 1;
> >   cp += l;
> > -cplen -= l;
> > +ilen -= l;
> >   }
> > return 0;
> > 
> 
> -- 
> Julien Grall
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] xen/arm: Stop relocating Xen

2018-12-12 Thread Stefano Stabellini
On Thu, 29 Nov 2018, Julien Grall wrote:
> At the moment, Xen is relocated towards the end of the memory. While
> this has the advantage to free space in low memory, the code is not
> compliant with the break-before-make because it requires to switch
> between two sets of page-table. This is not entirely trivial to fix as
> it would require us to go through an identity mapping and disabling MMU.
> 
> Furthermore, it looks like that some platform (such as the Hikey960)
> may not be able to bring-up secondary CPUs if the entry is too high.
> 
> I don't believe the low memory is an issue because Xen is quite tiny
> (< 2MB). So the best solution is to stop relocating Xen. This has the
> advantage to simplify the code and should speed-up the boot as relocation
> is not necessary anymore.
> 
> Note that the break-before-make issue is not fixed by this patch.
> 
> Signed-off-by: Julien Grall 
> Reported-by: Matthew Daley 
> ---
>  xen/arch/arm/arm32/head.S | 54 +++
>  xen/arch/arm/arm64/head.S | 50 +---
>  xen/arch/arm/mm.c | 20 ---
>  xen/arch/arm/setup.c  | 65 
> +++
>  xen/include/asm-arm/mm.h  |  2 +-
>  5 files changed, 18 insertions(+), 173 deletions(-)
> 
> diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
> index 93b51e9ef2..390a505e05 100644
> --- a/xen/arch/arm/arm32/head.S
> +++ b/xen/arch/arm/arm32/head.S
> @@ -469,58 +469,12 @@ fail:   PRINT("- Boot failed -\r\n")
>  GLOBAL(_end_boot)
>  
>  /*
> - * Copy Xen to new location and switch TTBR
> + * Switch TTBR
>   * r1:r0   ttbr
> - * r2  source address
> - * r3  destination address
> - * [sp]=>r4length
>   *
> - * Source and destination must be word aligned, length is rounded up
> - * to a 16 byte boundary.
> - *
> - * MUST BE VERY CAREFUL when saving things to RAM over the copy
> + * TODO: This code does not comply with break-before-make.
>   */
> -ENTRY(relocate_xen)
> -push {r4,r5,r6,r7,r8,r9,r10,r11}
> -
> -ldr   r4, [sp, #8*4]/* Get 4th argument from stack */
> -
> -/* Copy 16 bytes at a time using:
> - * r5:  counter
> - * r6:  data
> - * r7:  data
> - * r8:  data
> - * r9:  data
> - * r10: source
> - * r11: destination
> - */
> -mov   r5, r4
> -mov   r10, r2
> -mov   r11, r3
> -1:  ldmia r10!, {r6, r7, r8, r9}
> -stmia r11!, {r6, r7, r8, r9}
> -
> -subs  r5, r5, #16
> -bgt   1b
> -
> -/* Flush destination from dcache using:
> - * r5: counter
> - * r6: step
> - * r7: vaddr
> - */
> -dsb/* So the CPU issues all writes to the range */
> -
> -mov   r5, r4
> -ldr   r6, =dcache_line_bytes /* r6 := step */
> -ldr   r6, [r6]
> -mov   r7, r3
> -
> -1:  mcr   CP32(r7, DCCMVAC)
> -
> -add   r7, r7, r6
> -subs  r5, r5, r6
> -bgt   1b
> -
> +ENTRY(switch_ttbr)
>  dsb/* Ensure the flushes happen before
>  * continuing */
>  isb/* Ensure synchronization with 
> previous
> @@ -543,8 +497,6 @@ ENTRY(relocate_xen)
>  dsb/* Ensure completion of TLB+BP flush 
> */
>  isb
>  
> -pop {r4, r5,r6,r7,r8,r9,r10,r11}
> -
>  mov pc, lr
>  
>  #ifdef CONFIG_EARLY_PRINTK
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 9428c3f5a2..4589a37874 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -605,52 +605,14 @@ fail:   PRINT("- Boot failed -\r\n")
>  
>  GLOBAL(_end_boot)
>  
> -/* Copy Xen to new location and switch TTBR
> - * x0ttbr
> - * x1source address
> - * x2destination address
> - * x3length
> +/*
> + * Switch TTBR
>   *
> - * Source and destination must be word aligned, length is rounded up
> - * to a 16 byte boundary.
> + * x0ttbr
>   *
> - * MUST BE VERY CAREFUL when saving things to RAM over the copy */
> -ENTRY(relocate_xen)
> -/* Copy 16 bytes at a time using:
> - *   x9: counter
> - *   x10: data
> - *   x11: data
> - *   x12: source
> - *   x13: destination
> - */
> -mov x9, x3
> -mov x12, x1
> -mov x13, x2
> -
> -1:  ldp x10, x11, [x12], #16
> -stp x10, x11, [x13], #16
> -
> -subsx9, x9, #16
> -bgt 1b
> -
> -/* Flush destination from dcache using:
> - * x9: counter
> - * x10: step
> - * x11: vaddr
> - */
> -dsb   sy/* So the CPU issues all writes to the range */
> -
> -mov   x9, x3
> -ldr   x10, =dcache_line_bytes /* x10 := step */
> -ldr   x10, [x10]
> -   

Re: [Xen-devel] [PATCH 1/2] xen/arm: mm: Set-up page permission for Xen mappings earlier on

2018-12-12 Thread Stefano Stabellini
On Thu, 29 Nov 2018, Julien Grall wrote:
> Xen mapping is first create using a 2MB page and then shatterred in 4KB
> page for fine-graine permission. However, it is not safe to break-down
> superpage page without going to an intermediate step invalidating
> the entry.
> 
> As we are changing Xen mappings, we cannot go through the intermediate
> step. The only solution is to create Xen mapping using 4KB entries
> directly. As the Xen should always access the mappings according with
> the runtime permission, it is then possible to set-up the permissions
> while create the mapping.
> 
> We are still playing with the fire as there are still some
> break-before-make issue in setup_pagetables (i.e switch between 2 sets of
> page-tables). But it should slightly be better than the current state.
> 
> Signed-off-by: Julien Grall 
> Reported-by: Shameerali Kolothum Thodi 
> Reported-by: Jan-Peter Larsson 

Reviewed-by: Stefano Stabellini 

and committed


> ---
> I had few reports on new platforms where Xen reliably stale as soon as
> SCTLR.WXN is turned on. This likely happens because of not complying
> with Break-Before-Make when setting-up the permission as we
> break-down a superpage to 4KB mappings.
> ---
>  xen/arch/arm/mm.c | 49 ++---
>  1 file changed, 22 insertions(+), 27 deletions(-)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 987fcb9162..2556e57a99 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -649,11 +649,31 @@ void __init setup_pagetables(unsigned long 
> boot_phys_offset, paddr_t xen_paddr)
>  }
>  #endif
>  
> +/* Break up the Xen mapping into 4k pages and protect them separately. */
> +for ( i = 0; i < LPAE_ENTRIES; i++ )
> +{
> +mfn_t mfn = mfn_add(maddr_to_mfn(xen_paddr), i);
> +unsigned long va = XEN_VIRT_START + (i << PAGE_SHIFT);
> +
> +if ( !is_kernel(va) )
> +break;
> +pte = mfn_to_xen_entry(mfn, MT_NORMAL);
> +pte.pt.table = 1; /* 4k mappings always have this bit set */
> +if ( is_kernel_text(va) || is_kernel_inittext(va) )
> +{
> +pte.pt.xn = 0;
> +pte.pt.ro = 1;
> +}
> +if ( is_kernel_rodata(va) )
> +pte.pt.ro = 1;
> +xen_xenmap[i] = pte;
> +}
> +
>  /* Initialise xen second level entries ... */
>  /* ... Xen's text etc */
>  
> -pte = mfn_to_xen_entry(maddr_to_mfn(xen_paddr), MT_NORMAL);
> -pte.pt.xn = 0;/* Contains our text mapping! */
> +pte = pte_of_xenaddr((vaddr_t)xen_xenmap);
> +pte.pt.table = 1;
>  xen_second[second_table_offset(XEN_VIRT_START)] = pte;
>  
>  /* ... Fixmap */
> @@ -693,31 +713,6 @@ void __init setup_pagetables(unsigned long 
> boot_phys_offset, paddr_t xen_paddr)
>  clear_table(boot_second);
>  clear_table(boot_third);
>  
> -/* Break up the Xen mapping into 4k pages and protect them separately. */
> -for ( i = 0; i < LPAE_ENTRIES; i++ )
> -{
> -mfn_t mfn = mfn_add(maddr_to_mfn(xen_paddr), i);
> -unsigned long va = XEN_VIRT_START + (i << PAGE_SHIFT);
> -if ( !is_kernel(va) )
> -break;
> -pte = mfn_to_xen_entry(mfn, MT_NORMAL);
> -pte.pt.table = 1; /* 4k mappings always have this bit set */
> -if ( is_kernel_text(va) || is_kernel_inittext(va) )
> -{
> -pte.pt.xn = 0;
> -pte.pt.ro = 1;
> -}
> -if ( is_kernel_rodata(va) )
> -pte.pt.ro = 1;
> -write_pte(xen_xenmap + i, pte);
> -/* No flush required here as page table is not hooked in yet. */
> -}
> -
> -pte = pte_of_xenaddr((vaddr_t)xen_xenmap);
> -pte.pt.table = 1;
> -write_pte(xen_second + second_linear_offset(XEN_VIRT_START), pte);
> -/* TLBFLUSH and ISB would be needed here, but wait until we set WXN */
> -
>  /* From now on, no mapping may be both writable and executable. */
>  WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2);
>  /* Flush everything after setting WXN bit. */
> -- 
> 2.11.0
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [ovmf test] 131245: regressions - FAIL

2018-12-12 Thread osstest service owner
flight 131245 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131245/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i3866 xen-buildfail REGR. vs. 129475
 build-i386-xsm6 xen-buildfail REGR. vs. 129475
 build-amd64   6 xen-buildfail REGR. vs. 129475
 build-amd64-xsm   6 xen-buildfail REGR. vs. 129475

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a

version targeted for testing:
 ovmf e07092edca8442db4a941dbeea0cd196c7bf8ec9
baseline version:
 ovmf 5ae3184d8c59f7bbb84bad482df6b8020ba58188

Last test of basis   129475  2018-11-05 21:13:11 Z   37 days
Failing since129526  2018-11-06 20:49:26 Z   36 days  152 attempts
Testing same since   131245  2018-12-11 20:48:15 Z1 days1 attempts


People who touched revisions under test:
  Achin Gupta 
  Ard Biesheuvel 
  Bob Feng 
  bob.c.f...@intel.com 
  BobCF 
  Chasel Chiu 
  Chasel, Chiu 
  Chen A Chen 
  Dandan Bi 
  David Wei 
  Eric Dong 
  Feng, Bob C 
  Fu Siyuan 
  Gary Lin 
  Hao Wu 
  Jaben Carsey 
  Jeff Brasen 
  Jian J Wang 
  Jiaxin Wu 
  Jiewen Yao 
  Laszlo Ersek 
  Leif Lindholm 
  Liming Gao 
  Liu Yu 
  Marc Zyngier 
  Marcin Wojtas 
  Ming Huang 
  Pedroa Liu 
  Ruiyu Ni 
  shenglei 
  Shenglei Zhang 
  Star Zeng 
  Sughosh Ganu 
  Sumit Garg 
  Sun, Zailiang 
  Thomas Abraham 
  Tomasz Michalec 
  Vijayenthiran Subramaniam 
  Wang BinX A 
  Wu Jiaxin 
  Yonghong Zhu 
  yuchenlin 
  Zailiang Sun 
  Zhang, Chao B 
  Zhao, ZhiqiangX 
  ZhiqiangX Zhao 
  zwei4 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 3391 lines long.)

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point

2018-12-12 Thread Maran Wilson



On 12/12/2018 12:39 PM, Borislav Petkov wrote:

On Tue, Dec 11, 2018 at 11:29:21AM -0800, Maran Wilson wrote:

Is your question about what options you need to provide to Qemu? Or is your
question about the SW implementation choices?

Assuming the former...

Yeah, that's what I wanted to know. But looking at it, I'm booting
bzImage here just as quickly and as flexible so I don't see the
advantage of this new method for my use case here of booting kernels
in qemu.

But maybe there's a good use case where firmware is slow and one doesn't
really wanna noodle through it or when one does start a gazillion VMs
per second or whatever...


Right, the time saved is not something you would notice while starting a 
VM manually. But it does reduce the time to reach startup_64() in Linux 
by about 50% (going from around 94ms to around 47ms) when booting a VM 
using Qemu+qboot (for example). That time savings becomes pretty 
important when you are trying to use VMs as containers (for instance, as 
is the case with Kata containers) and trying to get the latency for 
launching such a container really low -- to come as close as possible to 
match the latency for launching more traditional containers that don't 
have the additional security/isolation of running within a separate VM.


Thanks,
-Maran



Thx.




___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable test] 131233: regressions - FAIL

2018-12-12 Thread osstest service owner
flight 131233 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131233/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-examine  4 memdisk-try-append   fail REGR. vs. 130985

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 130985

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 130985
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 130985
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 130985
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 130985
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 130985
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 130985
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 130985
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 130985
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 130985
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass

version targeted for testing:
 xen  76a68b902062a5ca9053f5cf6a3ab46148cb45f2
baseline version:
 xen  82855aba5bf91e50c81526167c11d4aeaf665e66

Last test of basis   130985  2018-12-03 17:11:24 Z9 days
Failing since131065  2018-12-05 19:14:18 Z7 days5 attempts
Testing same since   131233  2018-12-11 13:49:11 Z1 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Anthony PERARD 
  Doug Goldstein 
  Ian Jackson 
  Jan Beulich 
  Juergen Gross 
  Julien Grall 
  Kevin Tian 
  Oleksandr Tyshchenko 
  Paul Durrant 
  Roger Pau Monné 
  Wei Liu 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt 

[Xen-devel] [linux-3.18 bisection] complete test-amd64-amd64-pair

2018-12-12 Thread osstest service owner
branch xen-unstable
xenbranch xen-unstable
job test-amd64-amd64-pair
testid xen-boot/dst_host

Tree: linux 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  Bug introduced:  7b8052e19304865477e03a0047062d977309a22f
  Bug not present: d255d18a34a8d53ccc4a019dc07e17b6e8cf6bd1
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/131278/


  commit 7b8052e19304865477e03a0047062d977309a22f
  Author: Jan Beulich 
  Date:   Mon Oct 19 04:23:29 2015 -0600
  
  igb: fix NULL derefs due to skipped SR-IOV enabling
  
  [ Upstream commit be06998f96ecb93938ad2cce46c4289bf7cf45bc ]
  
  The combined effect of commits 6423fc3416 ("igb: do not re-init SR-IOV
  during probe") and ceee3450b3 ("igb: make sure SR-IOV init uses the
  right number of queues") causes VFs no longer getting set up, leading
  to NULL pointer dereferences due to the adapter's ->vf_data being NULL
  while ->vfs_allocated_count is non-zero. The first commit not only
  neglected the side effect of igb_sriov_reinit() that the second commit
  tried to account for, but also that of setting IGB_FLAG_HAS_MSIX,
  without which igb_enable_sriov() is effectively a no-op. Calling
  igb_{,re}set_interrupt_capability() as done here seems to address this,
  but I'm not sure whether this is better than sinply reverting the other
  two commits.
  
  Signed-off-by: Jan Beulich 
  Tested-by: Aaron Brown 
  Signed-off-by: Jeff Kirsher 
  Signed-off-by: Sasha Levin 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-3.18/test-amd64-amd64-pair.xen-boot--dst_host.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/linux-3.18/test-amd64-amd64-pair.xen-boot--dst_host
 --summary-out=tmp/131280.bisection-summary --basis-template=128858 
--blessings=real,real-bisect --flight=131280 linux-3.18 test-amd64-amd64-pair 
xen-boot/dst_host
Searching for failure / basis pass:
 131231 fail [dst_host=debina1,src_host=debina0] / 130367 
[dst_host=albana0,src_host=albana1] 130203 [dst_host=albana0,src_host=albana1] 
130067 [dst_host=pinot1,src_host=pinot0] 129845 
[dst_host=albana1,src_host=albana0] 129760 
[dst_host=godello1,src_host=godello0] 128858 
[dst_host=elbling0,src_host=elbling1] 128841 
[dst_host=elbling0,src_host=elbling1] 128807 
[dst_host=godello1,src_host=godello0] 128691 [dst_host=fiano1,src_host=fiano0] 
128258 [dst_host=fiano1,src_host=fiano0] 128232 
[dst_host=albana1,src_host=albana0] 128177 ok.
Failure / basis pass flights: 131231 / 128177
(tree with no url: minios)
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest 3879c163e8681939b1d93139521aee983623884f 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
d0d8ad39ecb51cd7497cd524484fe09f50876798 
de5b678ca4dcdfa83e322491d478d66df56c1986 
82855aba5bf91e50c81526167c11d4aeaf665e66
Basis pass 921b2fed6a79439ef1609ef4af0ada5cccb3555c 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
9c0eed618f37dd5b4a57c8b3fbc48ef8913e3149 
de5b678ca4dcdfa83e322491d478d66df56c1986 
940185b2f6f343251c2b83bd96e599398cea51ec
Generating revisions with ./adhoc-revtuple-generator  
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#921b2fed6a79439ef1609ef4af0ada5cccb3555c-3879c163e8681939b1d93139521aee983623884f
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 
git://xenbits.xen.org/qemu-xen-traditional.git#9c0eed618f37dd5b4a57c8b3fbc48ef8913e3149-d0d8ad39ecb51cd7497cd524484fe09f50876798
 
git://xenbits.xen.org/qemu-xen.git#de5b678ca4dcdfa83e322491d478d66df56c1986-de5b678ca4dcdfa83e322491d478d66df56c1986
 
git://xenbits.xen.org/xen.git#940185b2f6f343251c2b83bd96e599398cea51ec-82855aba5bf91e50c81526167c11d4aeaf665e66
Loaded 3004 nodes in revision graph
Searching for test results:
 128096 [dst_host=chardonnay1,src_host=chardonnay0]
 128177 pass 921b2fed6a79439ef1609ef4af0ada5cccb3555c 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
9c0eed618f37dd5b4a57c8b3fbc48ef8913e3149 
de5b678ca4dcdfa83e322491d478d66df56c1986 
940185b2f6f343251c2b83bd96e599398cea51ec
 128232 [dst_host=albana1,src_host=albana0]
 128258 

[Xen-devel] [linux-3.18 bisection] complete test-amd64-amd64-pair

2018-12-12 Thread osstest service owner
branch xen-unstable
xenbranch xen-unstable
job test-amd64-amd64-pair
testid xen-boot/src_host

Tree: linux 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  Bug introduced:  7b8052e19304865477e03a0047062d977309a22f
  Bug not present: d255d18a34a8d53ccc4a019dc07e17b6e8cf6bd1
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/131278/


  commit 7b8052e19304865477e03a0047062d977309a22f
  Author: Jan Beulich 
  Date:   Mon Oct 19 04:23:29 2015 -0600
  
  igb: fix NULL derefs due to skipped SR-IOV enabling
  
  [ Upstream commit be06998f96ecb93938ad2cce46c4289bf7cf45bc ]
  
  The combined effect of commits 6423fc3416 ("igb: do not re-init SR-IOV
  during probe") and ceee3450b3 ("igb: make sure SR-IOV init uses the
  right number of queues") causes VFs no longer getting set up, leading
  to NULL pointer dereferences due to the adapter's ->vf_data being NULL
  while ->vfs_allocated_count is non-zero. The first commit not only
  neglected the side effect of igb_sriov_reinit() that the second commit
  tried to account for, but also that of setting IGB_FLAG_HAS_MSIX,
  without which igb_enable_sriov() is effectively a no-op. Calling
  igb_{,re}set_interrupt_capability() as done here seems to address this,
  but I'm not sure whether this is better than sinply reverting the other
  two commits.
  
  Signed-off-by: Jan Beulich 
  Tested-by: Aaron Brown 
  Signed-off-by: Jeff Kirsher 
  Signed-off-by: Sasha Levin 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-3.18/test-amd64-amd64-pair.xen-boot--src_host.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/linux-3.18/test-amd64-amd64-pair.xen-boot--src_host
 --summary-out=tmp/131278.bisection-summary --basis-template=128858 
--blessings=real,real-bisect linux-3.18 test-amd64-amd64-pair xen-boot/src_host
Searching for failure / basis pass:
 131231 fail [dst_host=debina1,src_host=debina0] / 130367 
[dst_host=albana0,src_host=albana1] 130203 [dst_host=albana0,src_host=albana1] 
130067 [dst_host=pinot1,src_host=pinot0] 129845 
[dst_host=albana1,src_host=albana0] 129760 
[dst_host=godello1,src_host=godello0] 128858 
[dst_host=elbling0,src_host=elbling1] 128841 
[dst_host=elbling0,src_host=elbling1] 128807 
[dst_host=godello1,src_host=godello0] 128691 [dst_host=fiano1,src_host=fiano0] 
128258 [dst_host=fiano1,src_host=fiano0] 128232 
[dst_host=albana1,src_host=albana0] 128177 ok.
Failure / basis pass flights: 131231 / 128177
(tree with no url: minios)
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest 3879c163e8681939b1d93139521aee983623884f 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
d0d8ad39ecb51cd7497cd524484fe09f50876798 
de5b678ca4dcdfa83e322491d478d66df56c1986 
82855aba5bf91e50c81526167c11d4aeaf665e66
Basis pass 921b2fed6a79439ef1609ef4af0ada5cccb3555c 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
9c0eed618f37dd5b4a57c8b3fbc48ef8913e3149 
de5b678ca4dcdfa83e322491d478d66df56c1986 
940185b2f6f343251c2b83bd96e599398cea51ec
Generating revisions with ./adhoc-revtuple-generator  
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#921b2fed6a79439ef1609ef4af0ada5cccb3555c-3879c163e8681939b1d93139521aee983623884f
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 
git://xenbits.xen.org/qemu-xen-traditional.git#9c0eed618f37dd5b4a57c8b3fbc48ef8913e3149-d0d8ad39ecb51cd7497cd524484fe09f50876798
 
git://xenbits.xen.org/qemu-xen.git#de5b678ca4dcdfa83e322491d478d66df56c1986-de5b678ca4dcdfa83e322491d478d66df56c1986
 
git://xenbits.xen.org/xen.git#940185b2f6f343251c2b83bd96e599398cea51ec-82855aba5bf91e50c81526167c11d4aeaf665e66
Loaded 3004 nodes in revision graph
Searching for test results:
 128096 [dst_host=chardonnay1,src_host=chardonnay0]
 128177 pass 921b2fed6a79439ef1609ef4af0ada5cccb3555c 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
9c0eed618f37dd5b4a57c8b3fbc48ef8913e3149 
de5b678ca4dcdfa83e322491d478d66df56c1986 
940185b2f6f343251c2b83bd96e599398cea51ec
 128232 [dst_host=albana1,src_host=albana0]
 128258 

[Xen-devel] [linux-3.18 test] 131231: regressions - FAIL

2018-12-12 Thread osstest service owner
flight 131231 linux-3.18 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131231/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-examine   8 reboot   fail REGR. vs. 128858
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 
128858
 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 128858
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. 
vs. 128858
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. 
vs. 128858
 test-amd64-amd64-xl-multivcpu  7 xen-bootfail REGR. vs. 128858
 test-amd64-i386-xl7 xen-boot fail REGR. vs. 128858
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-boot fail REGR. vs. 128858
 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 128858
 test-amd64-i386-pair 11 xen-boot/dst_hostfail REGR. vs. 128858
 test-amd64-amd64-xl-pvshim7 xen-boot fail REGR. vs. 128858
 test-amd64-amd64-xl-xsm   7 xen-boot fail REGR. vs. 128858
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-boot  fail REGR. vs. 128858
 test-amd64-i386-libvirt-pair 10 xen-boot/src_hostfail REGR. vs. 128858
 test-amd64-i386-libvirt-pair 11 xen-boot/dst_hostfail REGR. vs. 128858
 test-amd64-amd64-rumprun-amd64  7 xen-boot   fail REGR. vs. 128858
 test-amd64-amd64-libvirt-pair 10 xen-boot/src_host   fail REGR. vs. 128858
 test-amd64-amd64-libvirt-pair 11 xen-boot/dst_host   fail REGR. vs. 128858
 test-amd64-amd64-xl-qemuu-ovmf-amd64  7 xen-boot fail REGR. vs. 128858
 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 128858
 test-amd64-amd64-pair11 xen-boot/dst_hostfail REGR. vs. 128858

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-multivcpu 17 guest-start.2   fail in 131095 pass in 131231
 test-amd64-i386-qemut-rhel6hvm-amd 12 guest-start/redhat.repeat fail in 131192 
pass in 131231
 test-amd64-i386-rumprun-i386  7 xen-boot   fail pass in 131095
 test-amd64-i386-freebsd10-amd64 14 guest-saverestore   fail pass in 131192

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-vhd  10 debian-di-install   fail in 131095 like 128841
 test-armhf-armhf-libvirt-raw 10 debian-di-install   fail in 131095 like 128841
 test-amd64-amd64-examine  4 memdisk-try-append  fail in 131192 like 128807
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 128858
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 128858
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 128858
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 128858
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 128858
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 128858
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install 
fail never pass
 test-amd64-amd64-xl-pvhv2-intel 12 guest-start fail never pass
 test-amd64-amd64-xl-pvhv2-amd 12 guest-start  fail  never pass
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-check  

Re: [Xen-devel] [PATCH v9 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point

2018-12-12 Thread Borislav Petkov
On Tue, Dec 11, 2018 at 11:29:21AM -0800, Maran Wilson wrote:
> Is your question about what options you need to provide to Qemu? Or is your
> question about the SW implementation choices?
> 
> Assuming the former...

Yeah, that's what I wanted to know. But looking at it, I'm booting
bzImage here just as quickly and as flexible so I don't see the
advantage of this new method for my use case here of booting kernels
in qemu.

But maybe there's a good use case where firmware is slow and one doesn't
really wanna noodle through it or when one does start a gazillion VMs
per second or whatever...

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable-smoke test] 131274: tolerable all pass - PUSHED

2018-12-12 Thread osstest service owner
flight 131274 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131274/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  9c35572f359cd6f71aa20b0991c74e032b8721d2
baseline version:
 xen  a9c904c5a827144eb722cfb46634c60b739e19eb

Last test of basis   131246  2018-12-11 21:00:46 Z0 days
Testing same since   131274  2018-12-12 17:00:36 Z0 days1 attempts


People who touched revisions under test:
  Julien Grall 
  Razvan Cojocaru 
  Stefano Stabellini 

jobs:
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   a9c904c5a8..9c35572f35  9c35572f359cd6f71aa20b0991c74e032b8721d2 -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] arm/irq: skip action avalability check for non-debug build

2018-12-12 Thread Stefano Stabellini
On Wed, 12 Dec 2018, Julien Grall wrote:
> Title: s/avalability/availability/
> 
> On 12/12/2018 16:55, Andrii Anisov wrote:
> > From: Andrii Anisov 
> > 
> > An IRQ with _IRQ_GUEST flag set always has an action.
> > An IRQ with _IRQ_DISABLED flag cleared always have an action.
> 
> s/have/has/
> 
> Those conditions are not sufficient to ensure desc->action is not NULL. You
> also need to take the spinlock.
> 
> While looking at the code, I noticed an interesting race with the release
> code. Guest IRQ are released using the function gic_remove_irq_to_guest. The
> sequence is roughly:
> 
> 1) spin_lock(desc->lock);
> 2) writel(desc->irq, ICENABLER);
> 3) set_bit(_IRQ_DISABLED, >status);
> 4) clear_bit(_IRQ_GUES, >status);
> 5) desc->handler = _irq_type;
> 6) spin_unlock(desc->lock);
> 
> Even if 2) will disable the interrupt in the hardware, the interrupt may have
> been received earlier on another CPU and waiting on the lock. As soon as the
> lock is taken, the code will notice the irq disabled (thanks to 3)) and will
> then end the interrupt. The callbak end for no_irq_type is a NOP, therefore
> the interrupt will stay active and the priority will not be dropped.
> 
> Because of that, the CPU will never be able to receive interrupt for guest
> anymore. AFAICT, this can only happen if an interrupt is received while
> destroying the assigned domain.
> 
> I think 5) should be replaced with
> 
> desc->handler = gic_hw_ops->gic_host_irq_type;
> 
> Or we potentially need to update no_irq_type and EOI "spurious interrupt".
> 
> I am not entirely sure which way is the best to address the race. Any
> opinions?

I think that changing the .end function of no_irq_type to be the same as
the end function of the host_irq_type controller is the safest option:
yes no_irq_type means no irqs but if we receive an interrupt we should
still EOI it no matter what.

 
> > Those flags checks cover all accesses to desc->action in do_IRQ, > so we can
> > skip desc->action check.
> 
> "in non-debug build".
> 
> > Still keep it in place for debug build.
> 
> "Keep in place for debug build to help diagnostics potential
> misconfiguration".
> 
> > 
> > Signed-off-by: Andrii Anisov 
> > Reviewed-by: Julien Grall 
> 
> Please don't add a reviewed-by tag until it was explicitly written by the
> reviewer.
> 
> > ---
> >   xen/arch/arm/irq.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> > index d6a0273..4a02cc1 100644
> > --- a/xen/arch/arm/irq.c
> > +++ b/xen/arch/arm/irq.c
> > @@ -209,12 +209,14 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int
> > irq, int is_fiq)
> >   spin_lock(>lock);
> >   desc->handler->ack(desc);
> >   +#ifndef NDEBUG
> >   if ( !desc->action )
> >   {
> >   printk("Unknown %s %#3.3x\n",
> >  is_fiq ? "FIQ" : "IRQ", irq);
> >   goto out;
> >   }
> > +#endif
> > if ( test_bit(_IRQ_GUEST, >status) )
> >   {
> > 
> 
> Cheers,
> 
> -- 
> Julien Grall
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [linux-linus test] 131224: regressions - FAIL

2018-12-12 Thread osstest service owner
flight 131224 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131224/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-pygrub   7 xen-boot fail REGR. vs. 125898
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 125898
 test-amd64-amd64-xl-qemuu-win7-amd64  7 xen-boot fail REGR. vs. 125898
 test-amd64-amd64-libvirt-xsm  7 xen-boot fail REGR. vs. 125898
 test-amd64-amd64-xl-pvhv2-intel  7 xen-boot  fail REGR. vs. 125898
 test-amd64-amd64-xl-qcow2 7 xen-boot fail REGR. vs. 125898
 test-amd64-amd64-rumprun-amd64  7 xen-boot   fail REGR. vs. 125898
 test-amd64-amd64-libvirt-vhd  7 xen-boot fail REGR. vs. 125898
 test-amd64-amd64-libvirt-pair 10 xen-boot/src_host   fail REGR. vs. 125898
 test-amd64-amd64-libvirt-pair 11 xen-boot/dst_host   fail REGR. vs. 125898
 test-amd64-amd64-xl-multivcpu  7 xen-bootfail REGR. vs. 125898
 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 125898
 test-amd64-amd64-pair11 xen-boot/dst_hostfail REGR. vs. 125898
 test-amd64-i386-examine   8 reboot   fail REGR. vs. 125898
 test-amd64-i386-xl-xsm7 xen-boot fail REGR. vs. 125898
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-boot fail REGR. vs. 125898
 test-amd64-i386-xl7 xen-boot fail REGR. vs. 125898
 test-amd64-i386-xl-qemut-win10-i386  7 xen-boot  fail REGR. vs. 125898
 test-amd64-i386-xl-shadow 7 xen-boot fail REGR. vs. 125898
 test-amd64-i386-rumprun-i386  7 xen-boot fail REGR. vs. 125898
 test-amd64-i386-libvirt   7 xen-boot fail REGR. vs. 125898
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-boot fail REGR. vs. 125898
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm  7 xen-boot fail REGR. vs. 125898
 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 125898
 test-amd64-i386-pair 11 xen-boot/dst_hostfail REGR. vs. 125898
 test-amd64-i386-libvirt-xsm   7 xen-boot fail REGR. vs. 125898
 test-amd64-i386-libvirt-pair 10 xen-boot/src_hostfail REGR. vs. 125898
 test-amd64-i386-libvirt-pair 11 xen-boot/dst_hostfail REGR. vs. 125898
 test-amd64-amd64-examine  8 reboot   fail REGR. vs. 125898
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 
125898

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds  7 xen-boot fail REGR. vs. 125898
 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 125898

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-credit1   7 xen-bootfail baseline untested
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 125898
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 125898
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 125898
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 125898
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 125898
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 125898
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 125898
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 

Re: [Xen-devel] [PATCH v2 05/10] libxl: Do root checks once in libxl__domain_get_device_model_uid

2018-12-12 Thread George Dunlap


> On Dec 12, 2018, at 3:45 PM, Ian Jackson  wrote:
> 
> George Dunlap writes ("[PATCH v2 05/10] libxl: Do root checks once in 
> libxl__domain_get_device_model_uid"):
>> At the moment, we check for equivalence to literal "root" before
>> deciding whether to add the `runas` command-line option to QEMU.  This
>> is unsatisfactory for several reasons.
> ...
>> v2:
>> - Refactor to use `out` rather than multiple labels
>> - Only check for root once
>> - Use 'out' rather than direct returns for errors (only use direct returns
>>  for early `succeed-without-setting-runas` paths)
>> - Use `rc` rather than `ret` to more closely align with CODING_STYLE
>> - Fill out comments about the cases we're handling
>> - Return ERROR_DEVICE_EXISTS rather than ERROR_FAIL if there's another
>>  username that maps to our calculated uid
>> - Report an error if the specified device_model_user doesn't exist
> 
> Thanks.  This is all much better now.

FWIW I agree. :-)

> Or you could treat the !user_base as an error block and write this:
> 
>> +if (!user_base) {
>> +LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s",
>> + user);
>> +rc = ERROR_INVAL;
>  +goto out;
>  +}
>  +intended_uid = user_base->pw_uid;
>  +rc = 0;
>  +goto out;
>> +}

This looks good.

> 
>> +/*
>> + * If dm_restrict isn't set, and we don't have a specified user, don't
>> + * bother setting a `-runas` parameter.
>> + */
>> if (!libxl_defbool_val(b_info->dm_restrict)) {
>> LOGD(DEBUG, guest_domid,
>>  "dm_restrict disabled, starting QEMU as root");
>> return 0;
>> }
> 
> Why `return 0' here but `goto out' earlier ?  IMO all the success
> returns should be the same.

Not exactly — we only want to do the root check if we’re running as an 
alternate user.  In this case, neither device_model_user nor dm_restrict is 
set, so we’re not running as an alternate user, so we don’t want to run the 
`if(intended_uid == 0)` check on the normal ‘out’ path.

I take it you’d prefer to always jump to out, but to have the conditional be, 
`if (!rc && user)`?

> 
>> if (user_clash) {
>> LOGD(ERROR, guest_domid,
>>  "wanted to use uid %ld (%s + %d) but that is user %s !",
>>  (long)intended_uid, LIBXL_QEMU_USER_RANGE_BASE,
>>  guest_domid, user_clash->pw_name);
>> -return ERROR_FAIL;
>> +rc = ERROR_DEVICE_EXISTS;
>> +goto out;
> 
> I appreciate your desire to specify particular error value, but I am
> far from convinced that ERROR_DEVICE_EXISTS is appropriate.  It would
> lead someone to ask which device was in the way.

Consider this patch the most concise way of asking the question, “What error 
should I use in this case?” :-)

> 
> We generally use EINVAL for bad configurations.  If you prefer, feel
> free to introduce a new error code.  32-bit signed integers are pretty
> cheap.

The integers may be cheap, but scanning through trying to comprehend them is 
not. :-)

I’ll think about whether to introduce a new one or just use ERROR_INVAL.

> 
>> +if (rc < 0)
>> +goto out;
>> if (user_base) {
>> LOGD(WARN, guest_domid, "Could not find user %s, falling back to %s",
>>  LIBXL_QEMU_USER_RANGE_BASE, LIBXL_QEMU_USER_SHARED);
>> -goto end_search;
>> +intended_uid = user_base->pw_uid;
>> +goto out;
> 
> Here we have this pattern again with a `goto out' without a preceding
> assignment to `rc'.  AFAICT the rules implied by your out block are:
> 
> * Every goto out must be preceded by an assignment to rc.
>   IMO there is no reason this should not immediately precede
>   the goto out.
> 
> * Additionally, if rc is 0 then the goto out must also be preceded
>   relatively recently by an assignment to intended_uid.

Those are rules that you’re implying, not me. :-)  My `goto out` invariant in 
this patch were:

1. rc may be an error code.  In this case, rc is returned.
2. rc may be zero; if rc is zero:
 2a. user must be non-NULL,
 2b. user must be verified to exist on the system, and
 2c. intended_uid must be set to the userid reported in the previous check

In order to accept your suggestion above to replace the `return` with a `goto 
out`, I have to make the invariant as follows:

1. rc may be an error code.  In this case, rc is returned.
2. rc may be zero, and user NULL.  In this case, rc is returned.
3. rc may be zero, and user non-NULL.  In this case:
  [2b and 2c from above]

In this case, we know that rc is 0 because we just checked the value 6 lines 
earlier.  If code is ever added in between such that rc becomes non-zero, 
*that* code should be calling `goto out` (or thinking carefully about why 
falling through to this code is OK).

I’ll write redundant statements everywhere if you want, but I thought that 
would count as the kind of code duplication you wanted 

[Xen-devel] [PATCH 0/2] gic-vgic optimizations

2018-12-12 Thread Andrii Anisov
From: Andrii Anisov 

Here are few patches from RFC series [1] currently approved to
be upstreamed with appropriate changes.

Andrii Anisov (2):
  gic-vgic: Drop an excessive clear_lrs
Is a patch #5 [2], with a change:
 - Keep LR clear for debug build
No changes in v2:


  arm/irq: skip action availability check for non-debug build
Is a patch #11 [3], with a change:
 - Completely remove the check for a non-debug build,
   but preserve for debug.
Changes in v2:
 - reworded the commit title and message
 - removed an RB prematurely inserted by mistake

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03285.html
[3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03291.html

 xen/arch/arm/gic-vgic.c | 2 ++
 xen/arch/arm/irq.c  | 2 ++
 2 files changed, 4 insertions(+)

-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 1/2] gic-vgic: Drop an excessive clear_lrs

2018-12-12 Thread Andrii Anisov
From: Andrii Anisov 

This action is excessive because for an invalid LR there is no need
to write another invalid value to a register. So we can skip it here,
saving a peripheral register write.
Keep clearing the LR for the DEBUG build. This would make dumped
invalid LRs be zero. That is more obvious than picking state bits
from a non-zero value.

Signed-off-by: Andrii Anisov 
Reviewed-by: Julien Grall 
---
 xen/arch/arm/gic-vgic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 990399c..48922f5 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -216,7 +216,9 @@ static void gic_update_one_lr(struct vcpu *v, int i)
 }
 else
 {
+#ifndef NDEBUG
 gic_hw_ops->clear_lr(i);
+#endif
 clear_bit(i, _cpu(lr_mask));
 
 if ( p->desc != NULL )
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 2/2] arm/irq: skip action availability check for non-debug build

2018-12-12 Thread Andrii Anisov
From: Andrii Anisov 

Under desc->lock taken:
An IRQ with _IRQ_GUEST flag set always has an action.
An IRQ with _IRQ_DISABLED flag cleared always has an action.
Those flags checks cover all accesses to desc->action in do_IRQ,
so we can skip desc->action check in non-debug build.
Keep in place for debug build to help diagnostics potential
misconfiguration.

Signed-off-by: Andrii Anisov 
---
 xen/arch/arm/irq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index d6a0273..4a02cc1 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -209,12 +209,14 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, 
int is_fiq)
 spin_lock(>lock);
 desc->handler->ack(desc);
 
+#ifndef NDEBUG
 if ( !desc->action )
 {
 printk("Unknown %s %#3.3x\n",
is_fiq ? "FIQ" : "IRQ", irq);
 goto out;
 }
+#endif
 
 if ( test_bit(_IRQ_GUEST, >status) )
 {
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] arm/irq: skip action avalability check for non-debug build

2018-12-12 Thread Julien Grall



On 12/12/2018 17:59, Andrii Anisov wrote:

On 12.12.18 19:49, Julien Grall wrote:
Those conditions are not sufficient to ensure desc->action is not NULL. You 
also need to take the spinlock.

Agree. Should I describe it as following?


Under desc->lock taken:
An IRQ with _IRQ_GUEST flag set always has an action.
An IRQ with _IRQ_DISABLED flag cleared always have an action.


This looks better.





While looking at the code, I noticed an interesting race with the release code.

As I understand the race in not directly linked to this patch. Is it correct?


That's correct. I actually noticed when checking whether the commit message was 
matching the behavior.




Guest IRQ are released using the function gic_remove_irq_to_guest. The 
sequence is roughly:


1) spin_lock(desc->lock);
2) writel(desc->irq, ICENABLER);
3) set_bit(_IRQ_DISABLED, >status);
4) clear_bit(_IRQ_GUES, >status);
5) desc->handler = _irq_type;
6) spin_unlock(desc->lock);

Even if 2) will disable the interrupt in the hardware, the interrupt may have 
been received earlier on another CPU and waiting on the lock. As soon as the 
lock is taken, the code will notice the irq disabled (thanks to 3)) and will 
then end the interrupt. The callbak end for no_irq_type is a NOP, therefore 
the interrupt will stay active and the priority will not be dropped.


Because of that, the CPU will never be able to receive interrupt for guest 
anymore. AFAICT, this can only happen if an interrupt is received while 
destroying the assigned domain.


I think 5) should be replaced with

desc->handler = gic_hw_ops->gic_host_irq_type;

Or we potentially need to update no_irq_type and EOI "spurious interrupt".

I am not entirely sure which way is the best to address the race. Any opinions?

Let me spend a bit more time to look into that

Other wording and grammatical nits will be addressed as suggested.



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Stefano Stabellini
On Wed, 12 Dec 2018, Andrii Anisov wrote:
> Hello Stefano,
> 
> On 12.12.18 19:39, Stefano Stabellini wrote:
> > Thanks for the good work, Andrii!
> > 
> > The WARM_MAX improvements for vwfi=native with your optimizations are
> > impressive.
> 
> I really hope you are not speaking about these numbers:
> 
> > > max=840 warm_max=120 min=120 avg=127
> 
> Those are TBM baremetal numbers in hyp mode.

I know, I was referring to your older results, sorry for the confusion.


> Did you try my RFC on your HW?


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Dario Faggioli
On Wed, 2018-12-12 at 19:32 +0200, Andrii Anisov wrote:
> On 12.12.18 19:10, Dario Faggioli wrote:
> > I think only bisection could shed some light on this. And it would
> > be
> > wonderful if you could do that, but I understand that it takes
> > time. :-
> > /
> Well, bisect might help. But I'm really confused why MemTotal may be
> reduced.
> 
Yeah, and although difficult to admit/see the reason why, I think this
looks like it is coming from something we do in Xen. And since you say
you have an old Xen version that works, I really see bisection as the
way to go...

> > Are you absolutely sure about that? That is, are you "just"
> > assuming
> > the scheduler won't move stuff, or have you put some debugging or
> > printing in place to verify that to be the case?
> Being honest, I did not check for exactly this setup. I verified it
> for 4.10.
>
Not sure I'm getting. Are you saying that you somehow verified that on
4.10 vcpus don't move? But on 4.10 you have pinning that works, don't
you?

Or are you saying you've verified that vcpus don't move, on 4.10, even
without doing the pinning? If yes, can I ask how?

As for staging, I really can't tell, as indeed there would be no need
for them to move, but they actually could, for a number of reasons.

So, unless you, like, put printk()-s (if you can) or ASSERTS() when v-
>processor changes, I wouldn't take that for granted. :-(

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] arm/irq: skip action avalability check for non-debug build

2018-12-12 Thread Andrii Anisov

On 12.12.18 19:49, Julien Grall wrote:

Those conditions are not sufficient to ensure desc->action is not NULL. You 
also need to take the spinlock.

Agree. Should I describe it as following?


Under desc->lock taken:
An IRQ with _IRQ_GUEST flag set always has an action.
An IRQ with _IRQ_DISABLED flag cleared always have an action.




While looking at the code, I noticed an interesting race with the release code.

As I understand the race in not directly linked to this patch. Is it correct?


Guest IRQ are released using the function gic_remove_irq_to_guest. The sequence 
is roughly:

1) spin_lock(desc->lock);
2) writel(desc->irq, ICENABLER);
3) set_bit(_IRQ_DISABLED, >status);
4) clear_bit(_IRQ_GUES, >status);
5) desc->handler = _irq_type;
6) spin_unlock(desc->lock);

Even if 2) will disable the interrupt in the hardware, the interrupt may have 
been received earlier on another CPU and waiting on the lock. As soon as the 
lock is taken, the code will notice the irq disabled (thanks to 3)) and will 
then end the interrupt. The callbak end for no_irq_type is a NOP, therefore the 
interrupt will stay active and the priority will not be dropped.

Because of that, the CPU will never be able to receive interrupt for guest 
anymore. AFAICT, this can only happen if an interrupt is received while 
destroying the assigned domain.

I think 5) should be replaced with

desc->handler = gic_hw_ops->gic_host_irq_type;

Or we potentially need to update no_irq_type and EOI "spurious interrupt".

I am not entirely sure which way is the best to address the race. Any opinions?

Let me spend a bit more time to look into that

Other wording and grammatical nits will be addressed as suggested.

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] arm/irq: skip action avalability check for non-debug build

2018-12-12 Thread Andrii Anisov



On 12.12.18 19:49, Julien Grall wrote:


Please don't add a reviewed-by tag until it was explicitly written by the 
reviewer.

My bad, I mixed it with #5.


--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] arm/irq: skip action avalability check for non-debug build

2018-12-12 Thread Julien Grall

Title: s/avalability/availability/

On 12/12/2018 16:55, Andrii Anisov wrote:

From: Andrii Anisov 

An IRQ with _IRQ_GUEST flag set always has an action.
An IRQ with _IRQ_DISABLED flag cleared always have an action.


s/have/has/

Those conditions are not sufficient to ensure desc->action is not NULL. You also 
need to take the spinlock.


While looking at the code, I noticed an interesting race with the release code. 
Guest IRQ are released using the function gic_remove_irq_to_guest. The sequence 
is roughly:


1) spin_lock(desc->lock);
2) writel(desc->irq, ICENABLER);
3) set_bit(_IRQ_DISABLED, >status);
4) clear_bit(_IRQ_GUES, >status);
5) desc->handler = _irq_type;
6) spin_unlock(desc->lock);

Even if 2) will disable the interrupt in the hardware, the interrupt may have 
been received earlier on another CPU and waiting on the lock. As soon as the 
lock is taken, the code will notice the irq disabled (thanks to 3)) and will 
then end the interrupt. The callbak end for no_irq_type is a NOP, therefore the 
interrupt will stay active and the priority will not be dropped.


Because of that, the CPU will never be able to receive interrupt for guest 
anymore. AFAICT, this can only happen if an interrupt is received while 
destroying the assigned domain.


I think 5) should be replaced with

desc->handler = gic_hw_ops->gic_host_irq_type;

Or we potentially need to update no_irq_type and EOI "spurious interrupt".

I am not entirely sure which way is the best to address the race. Any opinions?



Those flags checks cover all accesses to desc->action in do_IRQ, > so we can skip 
desc->action check.


"in non-debug build".


Still keep it in place for debug build.


"Keep in place for debug build to help diagnostics potential misconfiguration".



Signed-off-by: Andrii Anisov 
Reviewed-by: Julien Grall 


Please don't add a reviewed-by tag until it was explicitly written by the 
reviewer.


---
  xen/arch/arm/irq.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index d6a0273..4a02cc1 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -209,12 +209,14 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, 
int is_fiq)
  spin_lock(>lock);
  desc->handler->ack(desc);
  
+#ifndef NDEBUG

  if ( !desc->action )
  {
  printk("Unknown %s %#3.3x\n",
 is_fiq ? "FIQ" : "IRQ", irq);
  goto out;
  }
+#endif
  
  if ( test_bit(_IRQ_GUEST, >status) )

  {



Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12 v2 16/17] xen/arm: Implement Set/Way operations

2018-12-12 Thread Dario Faggioli
On Wed, 2018-12-12 at 09:25 -0800, Stefano Stabellini wrote:
> On Wed, 12 Dec 2018, Julien Grall wrote:
> > > For Dario: basically we have a long running operation to perform,
> we
> > > thought that the best place for it would be on the path returning
> to
> > > guest (leave_hypervisor_tail). The operation can interrupt itself
> > > checking sotfirq_pending() once in a while to avoid blocking the
> pcpu
> > > for too long.
> > > 
> > > The question is: is it better to check sotfirq_pending() even
> before
> > > starting? Or every so often during the operating is good enough?
> Does it
> > > even matter?
> > I am not sure to understand what is your concern here. Checking for
> > softirq_pending() often is not an issue. The issue is when we
> happen to not
> > check it. At the moment, I would prefer to be over cautious until
> we figure
> > out whether this is a real issue.
> > 
> > If you are concerned about the performance impact, this is only
> called when a
> > guest is using set/way.
> 
> Actually, I have no concerns, as I think it should make no
> difference,
> but I just wanted a second opinion.
>
Yeah, sorry. I saw the email on Monday, but then got distracted.

So, in this case, I personally don't think either solution is so much
better (or so much worse) of the other one.

In general, what's best may vary on a case-by-case basis (e.g., how
long have we been non-preemptable already, when we entering the long
running operation?).

Therefore, if I'd want to be on the safe side, I think I would check
before entering the loop (or whatever the long running op is
implemented).

The performance impact of just one more softirq_pending() check itself
should really be negligible (even considering cache effects).

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Andrii Anisov

Hello Stefano,

On 12.12.18 19:39, Stefano Stabellini wrote:

Thanks for the good work, Andrii!

The WARM_MAX improvements for vwfi=native with your optimizations are 
impressive.


I really hope you are not speaking about these numbers:


max=840 warm_max=120 min=120 avg=127


Those are TBM baremetal numbers in hyp mode.

Did you try my RFC on your HW?

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Stefano Stabellini
On Wed, 12 Dec 2018, Andrii Anisov wrote:
> On 12.12.18 11:46, Andrii Anisov wrote:
> > Digging into that now.
> I got it. My u-boot starts TBM in hyp mode. But them both miss setting
> HCR_EL2.IMO, so no interrupt exception was taken in hyp.
> OK, for my baremetal TBM in hyp, numbers are:
> 
> max=840 warm_max=120 min=120 avg=127
> 
> I guess, warm_max and min are one tick of the system timer. And it seems to me
> that one tick of the system timer is the lower limit of the irq latency by HW
> design.

Thanks for the good work, Andrii!

The WARM_MAX improvements for vwfi=native with your optimizations are 
impressive.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 03/10] libxl: Clean up userlookup_helper_getpw* helper

2018-12-12 Thread George Dunlap


> On Dec 12, 2018, at 3:26 PM, Ian Jackson  wrote:
> 
> George Dunlap writes ("[PATCH v2 03/10] libxl: Clean up 
> userlookup_helper_getpw* helper"):
>> Bring conventions more in line with libxl__xs_read_checked():
>> - If found, return 0 and set pointer to non-NULL
>> - If not found, return 0 and set pointer to NULL
>> - On error, return libxl-style error number.
>> 
>> Update documentation to match.
> ...
>> #define DEFINE_USERLOOKUP_HELPER(NAME,SPEC_TYPE,STRUCTNAME,SYSCONF) \
>> static int userlookup_helper_##NAME(libxl__gc *gc,  \
>> @@ -83,7 +89,7 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char 
>> *name)
>> struct STRUCTNAME *resultp = NULL;  \
> 
> git diff has excelled itself in choice of heading line, hasn't it ?
> 
>> @@ -142,14 +147,14 @@ static int 
>> libxl__domain_get_device_model_uid(libxl__gc *gc,
>>  _pwbuf, _base);
>> if (ret < 0)
>> return ret;
>> -if (ret > 0) {
>> +if (user_base) {
> 
> I would prefer to also:
> 
>  -if (ret < 0)
>  +if (ret)
>   return ret;
> 
> which is more conventional for rc and might reduce the impact of bugs
> where the function returned a positive.  (Twice.)

Looks like thrice, but sure.

> 
> With or without that change,
> 
> Acked-by: Ian Jackson 
> 
> Thanks for the cleanup!

No problem — it’s nice to have something tractable and satisfying to accomplish 
when my other active project is a seemingly never-ending ball of twine…

 -George


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI

2018-12-12 Thread Maran Wilson

On 12/12/2018 7:28 AM, Stefano Garzarella wrote:

On Tue, Dec 11, 2018 at 7:35 PM Maran Wilson  wrote:

On 12/11/2018 9:11 AM, Stefano Garzarella wrote:

Hi Liam,
in order to support PVH also with SeaBIOS, I'm going to work on a new
option rom (like linuxboot/multiboot) that can be used in this case.

That is awesome. Yes, please keep us posted when you have something working.

Yes, I'll keep you updated!


Just FYI, before switching over to using Qemu+qboot, we had been using a
Qemu only solution (but not using an option rom) internally that worked
very well using no FW at all. We had Qemu simply parse the ELF file and
jump to the PVH entry point if one is found. The only gotcha was that we
had to include a pair of patches that were originally written by folks
at Intel as part of the clear containers work. Specifically, in order to
be able to skip firmware entirely, we had to do 2 additional things: (1)
ACPI tables generated by Qemu are usually patched up by FW. Since we
were running no FW, we needed to do that patching up of the ACPI tables
in Qemu when it was detected that we were going to enter the OS via the
PVH entry point. (2) We also needed to add a patch to Qemu to enable a
few PM registers -- something typically done by FW.

I had a look of qemu-lite, are you referring to this?


Yes. More specifically, we were using a modified version of this patch:
   acpi: patch guest ACPI when loading firmware is skipped
But unlike qemu-lite, we were not using a -nofw flag, instead, just 
choosing PVH vs legacy boot based on which -kernel binary was provided 
and whether it contained the PVH ELF note.


So apply the above patch, you also need to pick up:
   acpi: expose acpi_checksum()

For a while, we had also been using patch:
   ich9: enable pm registers when there is no firmware
But that last patch can be avoided by simply selecting Hardware-Reduced 
ACPI mode when building the FADT in Qemu, when PVH boot is selected.


But you probably wont need those patches at all if you are actually 
running some version of minimized SeaBIOS.


Thanks,
-Maran



But if SeaBIOS is involved in the solution you are working on, I guess
you won't really need those extra patches. Just figured I'd mention it
so you have the full picture.

Thank you very much to share with me these details!

Cheers,
Stefano


Thanks,
-Maran


I'll keep you updated on it!

Cheers,
Stefano
On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick  wrote:

These changes (along with corresponding qboot and Linux kernel changes)
enable a guest to be booted using the x86/HVM direct boot ABI.

This commit adds a load_elfboot() routine to pass the size and
location of the kernel entry point to qboot (which will fill in
the start_info struct information needed to to boot the guest).
Having loaded the ELF binary, load_linux() will run qboot
which continues the boot.

The address for the kernel entry point has already been read
from an ELF Note in the uncompressed kernel binary earlier
in pc_memory_init().

Signed-off-by: George Kennedy 
Signed-off-by: Liam Merwick 
---
   hw/i386/pc.c | 72 

   1 file changed, 72 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 056aa46d99b9..d3012cbd8597 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -54,6 +54,7 @@
   #include "sysemu/qtest.h"
   #include "kvm_i386.h"
   #include "hw/xen/xen.h"
+#include "hw/xen/start_info.h"
   #include "ui/qemu-spice.h"
   #include "exec/memory.h"
   #include "exec/address-spaces.h"
@@ -1098,6 +1099,50 @@ done:
   return pvh_start_addr != 0;
   }

+static bool load_elfboot(const char *kernel_filename,
+   int kernel_file_size,
+   uint8_t *header,
+   size_t pvh_xen_start_addr,
+   FWCfgState *fw_cfg)
+{
+uint32_t flags = 0;
+uint32_t mh_load_addr = 0;
+uint32_t elf_kernel_size = 0;
+uint64_t elf_entry;
+uint64_t elf_low, elf_high;
+int kernel_size;
+
+if (ldl_p(header) != 0x464c457f) {
+return false; /* no elfboot */
+}
+
+bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
+flags = elf_is64 ?
+((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
+
+if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
+error_report("elfboot unsupported flags = %x", flags);
+exit(1);
+}
+
+kernel_size = load_elf(kernel_filename, NULL, NULL, _entry,
+   _low, _high, 0, I386_ELF_MACHINE,
+   0, 0);
+
+if (kernel_size < 0) {
+error_report("Error while loading elf kernel");
+exit(1);
+}
+mh_load_addr = elf_low;
+elf_kernel_size = elf_high - elf_low;
+
+fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
+fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
+fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
+
+return true;
+}
+
   static void 

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Andrii Anisov

Hello Dario,

On 12.12.18 19:10, Dario Faggioli wrote:

Ah, yes... I've seen the thread. I haven't commented, as it is really,
really weird, and I don't know what to think/say.

I think only bisection could shed some light on this. And it would be
wonderful if you could do that, but I understand that it takes time. :-
/

Well, bisect might help. But I'm really confused why MemTotal may be reduced.


Are you absolutely sure about that? That is, are you "just" assuming
the scheduler won't move stuff, or have you put some debugging or
printing in place to verify that to be the case?Being honest, I did not check 
for exactly this setup. I verified it for 4.10.



I'm asking because, yet, in theory that is what one would expect. But,
as I think you know very well, although in theory there is no
difference between theory and practice, in practice, there is. :-)

I know it very well :)

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Dario Faggioli
On Wed, 2018-12-12 at 11:39 +0200, Andrii Anisov wrote:
> Hello Dario,
> 
Hi,

> On 11.12.18 18:56, Dario Faggioli wrote:
> > Also, what about Xen numbers, sched=null.
> Didn't check, will put on the list.
> 
:-)

> > I don't expect much improvement, considering pinning is in-place
> > already.
> Actually, I faced a strange issue with explicit pinning of Dom0.
> Didn't sort out the cause yet. And Julien says it is not reproducible
> on his desk.
>
Ah, yes... I've seen the thread. I haven't commented, as it is really,
really weird, and I don't know what to think/say.

I think only bisection could shed some light on this. And it would be
wonderful if you could do that, but I understand that it takes time. :-
/

> But yes, with VCPU number less than PCPUs - there is no migration of
> Dom0 VCPUs.
> 
Are you absolutely sure about that? That is, are you "just" assuming
the scheduler won't move stuff, or have you put some debugging or
printing in place to verify that to be the case? 

I'm asking because, yet, in theory that is what one would expect. But,
as I think you know very well, although in theory there is no
difference between theory and practice, in practice, there is. :-)

Regards,
Dario

> [1]https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00435.html
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function

2018-12-12 Thread Roger Pau Monné
On Wed, Dec 12, 2018 at 09:15:09AM -0700, Jan Beulich wrote:
> >>> On 12.12.18 at 16:56,  wrote:
> > On Wed, Dec 12, 2018 at 03:32:53AM -0700, Jan Beulich wrote:
> >> >>> On 12.12.18 at 11:04,  wrote:
> >> > You mentioned there's some code (for PV?) to calculate the size of the
> >> > page tables but I'm having trouble finding it (mainly because I'm not
> >> > that familiar with PV), could you point me to it?
> >> 
> >> In dom0_construct_pv() you'll find a loop starting with
> >> "for ( nr_pt_pages = 2; ; nr_pt_pages++ )". It's not the neatest,
> >> but at least we've never had reports of failure.
> > 
> > That seems quite complicated, what about using the formula below:
> > 
> > /*
> >  * Approximate the memory required for the HAP/IOMMU page tables by
> >  * pessimistically assuming every guest page will use a p2m page table
> >  * entry.
> >  */
> > return DIV_ROUND_UP((
> > /* Account for one entry in the L1 per page. */
> > nr_pages +
> > /* Account for one entry in the L2 per 512 pages. */
> > DIV_ROUND_UP(nr_pages, 512) +
> > /* Account for one entry in the L3 per 512^2 pages. */
> > DIV_ROUND_UP(nr_pages, 512 * 512) +
> > /* Account for one entry in the L4 per 512^3 pages. */
> > DIV_ROUND_UP(nr_pages, 512 * 512 * 512) +
> > ) * 8, PAGE_SIZE << PAGE_ORDER_4K);
> > 
> > That takes into account higher level page table structures.
> 
> That's a fair approximation without 2M and 1G pages available. I'm
> unconvinced we want to over-estimate this heavily in the more
> common case of large page mappings being available. Otoh this
> provides enough resources to later also deal with shattering of
> large pages.
> 
> The MMIO side of things of course still remains unclear.

Right, for the MMIO and the handling of grant and foreign mappings it's
not clear how we want to proceed.

Maybe account for all host RAM (total_pages) plus MMIO BARs?

> What I don't understand in any case though is
> "PAGE_SIZE << PAGE_ORDER_4K". This is x86 code - why not
> just PAGE_SIZE?

Oh, I've done it like that because this is related to p2m code, which
uses this way to get the page size. IIRC you told me to use this for
things like pvh_setup_e820. I don't mind switching to just PAGE_SIZE.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 0/2] gic-vgic optimizations

2018-12-12 Thread Andrii Anisov
From: Andrii Anisov 

Here are few patches from RFC series [1] currently approved to
be upstreamed with appropriate changes.

Andrii Anisov (2):
  gic-vgic: Drop an excessive clear_lrs
Is a patch #5 [2], with a change:
 - Keep LR clear for debug build

  arm/irq: skip action availability check for non-debug build
Is a patch #11 [3], with a change:
 - Completely remove the check for a non-debug build,
   but preserve for debug.

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03285.html
[3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03291.html


 xen/arch/arm/gic-vgic.c | 2 ++
 xen/arch/arm/irq.c  | 2 ++
 2 files changed, 4 insertions(+)

-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/2] gic-vgic: Drop an excessive clear_lrs

2018-12-12 Thread Andrii Anisov
From: Andrii Anisov 

This action is excessive because for an invalid LR there is no need
to write another invalid value to a register. So we can skip it here,
saving a peripheral register write.
Keep clearing the LR for the DEBUG build. This would make dumped
invalid LRs be zero. That is more obvious than picking state bits
from a non-zero value.

Signed-off-by: Andrii Anisov 
Reviewed-by: Julien Grall 
---
 xen/arch/arm/gic-vgic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 990399c..48922f5 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -216,7 +216,9 @@ static void gic_update_one_lr(struct vcpu *v, int i)
 }
 else
 {
+#ifndef NDEBUG
 gic_hw_ops->clear_lr(i);
+#endif
 clear_bit(i, _cpu(lr_mask));
 
 if ( p->desc != NULL )
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/2] arm/irq: skip action avalability check for non-debug build

2018-12-12 Thread Andrii Anisov
From: Andrii Anisov 

An IRQ with _IRQ_GUEST flag set always has an action.
An IRQ with _IRQ_DISABLED flag cleared always have an action.
Those flags checks cover all accesses to desc->action in do_IRQ,
so we can skip desc->action check.
Still keep it in place for debug build.

Signed-off-by: Andrii Anisov 
Reviewed-by: Julien Grall 
---
 xen/arch/arm/irq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index d6a0273..4a02cc1 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -209,12 +209,14 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, 
int is_fiq)
 spin_lock(>lock);
 desc->handler->ack(desc);
 
+#ifndef NDEBUG
 if ( !desc->action )
 {
 printk("Unknown %s %#3.3x\n",
is_fiq ? "FIQ" : "IRQ", irq);
 goto out;
 }
+#endif
 
 if ( test_bit(_IRQ_GUEST, >status) )
 {
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-12 Thread Roger Pau Monné
On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> Used by a domain to register a region of memory for receiving messages from
> either a specified other domain, or, if specifying a wildcard, any domain.
> 
> This operation creates a mapping within Xen's private address space that
> will remain resident for the lifetime of the ring. In subsequent commits, the
> hypervisor will use this mapping to copy data from a sending domain into this
> registered ring, making it accessible to the domain that registered the ring 
> to
> receive data.
> 
> In this code, the p2m type of the memory supplied by the guest for the ring
> must be p2m_ram_rw, which is a conservative choice made to defer the need to
> reason about the other p2m types with this commit.
> 
> argo_pfn_t type is introduced here to create a pfn_t type that is 64-bit on
> all architectures, to assist with avoiding the need to add a compat ABI.
> 
> Signed-off-by: Christopher Clark 
> ---
>  xen/common/argo.c  | 498 
> +
>  xen/include/asm-arm/guest_access.h |   2 +
>  xen/include/asm-x86/guest_access.h |   2 +
>  xen/include/public/argo.h  |  64 +
>  4 files changed, 566 insertions(+)
> 
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 2a95e09..f4e82cf 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  
> +DEFINE_XEN_GUEST_HANDLE(argo_pfn_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
>  
> @@ -98,6 +99,25 @@ struct argo_domain
>  };
>  
>  /*
> + * Helper functions
> + */
> +
> +static inline uint16_t
> +argo_hash_fn(const struct argo_ring_id *id)

No need for the argo_ prefix for static functions, this is already an
argo specific file.

> +{
> +uint16_t ret;
> +
> +ret = (uint16_t)(id->addr.port >> 16);
> +ret ^= (uint16_t)id->addr.port;
> +ret ^= id->addr.domain_id;
> +ret ^= id->partner;
> +
> +ret &= (ARGO_HTABLE_SIZE - 1);

I'm having trouble figuring out what this is supposed to do, I think a
comment and the expected hash formula will help make sure the code is
correct.

Also doesn't this need to be documented in the public header?

> +return ret;
> +}
> +
> +/*
>   * locks
>   */
>  
> @@ -171,6 +191,74 @@ argo_ring_unmap(struct argo_ring_info *ring_info)
>  }
>  }
>  
> +/* caller must have L3 or W(L2) */
> +static int
> +argo_ring_map_page(struct argo_ring_info *ring_info, uint32_t i,
> +   uint8_t **page)
> +{
> +if ( i >= ring_info->nmfns )
> +{
> +printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map 
> page"

You likely want to use gprintk here and below, or XENLOG_G_ERR, so
that the guest cannot DoS the console.

> +   " %u of %u\n", ring_info->id.addr.domain_id,
> +   ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +   i, ring_info->nmfns);
> +return -EFAULT;
> +}
> +ASSERT(ring_info->mfns);
> +ASSERT(ring_info->mfn_mapping);

We are trying to move away from such assertions, and instead use
constructions that would prevent issues in non-debug builds. I would
write the above asserts as:

if ( !ring_info->mfns || !ring_info->mfn_mapping )
{
ASSERT_UNREACHABLE();
return -E;
}

That way non-debug builds won't trigger page faults if there's indeed
a way to get here with the wrong state, and debug builds will still
hit an assert.

> +
> +if ( !ring_info->mfn_mapping[i] )
> +{
> +/*
> + * TODO:
> + * The first page of the ring contains the ring indices, so both 
> read and
> + * write access to the page is required by the hypervisor, but 
> read-access
> + * is not needed for this mapping for the remainder of the ring.
> + * Since this mapping will remain resident in Xen's address space for
> + * the lifetime of the ring, and following the principle of least 
> privilege,
> + * it could be preferable to:
> + *  # add a XSM check to determine what policy is wanted here
> + *  # depending on the XSM query, optionally create this mapping as
> + *_write-only_ on platforms that can support it.
> + *(eg. Intel EPT/AMD NPT).
> + */
> +ring_info->mfn_mapping[i] = 
> map_domain_page_global(ring_info->mfns[i]);
> +
> +if ( !ring_info->mfn_mapping[i] )
> +{
> +printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map 
> page"
> +   " %u of %u\n", ring_info->id.addr.domain_id,
> +   ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +   i, ring_info->nmfns);
> +return -EFAULT;
> +}
> +argo_dprintk("mapping page %"PRI_mfn" to %p\n",
> +   mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]);
> +}
> +
> +if ( page )
> +*page = 

Re: [Xen-devel] [PATCH] libxl: Documentation about the domain configuration on disk

2018-12-12 Thread Anthony PERARD
On Fri, Dec 07, 2018 at 07:05:38PM +, Ian Jackson wrote:
> Wei Liu writes ("Re: [PATCH] libxl: Documentation about the domain 
> configuration on disk"):
> > On Thu, Dec 06, 2018 at 02:57:33PM +, Anthony PERARD wrote:
> > > Anyway, that comment block isn't very helpful because it basically says
> > > that we can't depriv QEMU, I mean do hotplug with a deprived QEMU. It
> > > assumes that we can keep a lock on the userdata while updating the
> > > guest, but we can't keep the lock while talking with QEMU (or more
> > > generaly: we can't keep the lock while doing any async operation).
> > > 
> > > But there is one useful piece of information:
> > > Here we maintain one invariant: every device in xenstore must have
> > > an entry in JSON file.
> > > (xenstore is describe as "primary reference" just before that sentence).
> > 
> > Yes. That.
> ...
> > When removing a CD, you only need to update the primary source -- QEMU
> > in this case, you can leave libxl-json untouched. It is allowed to have
> > stale entries in libxl-json. This is implied in "We may not even need
> > this ..." further above.
> 
> If you leave a state entry in libxl-json then you would reject
> attempts to insert a new cd, because the libxl-json would tell you one
> is already present.
> 
> I think the lock difficulty Anthony identifies is real.  We need to
> either develop a new set of locking rules, or make all acquisitions of
> the libxl-json lock slow.
> 
> 
> The invariant that we want to maintain is:
> 
>   * Nothing may exist in the primary config without
> a corresponding entry in libxl-json.
> 
> The rules that implement that are:
> 
>   * No-one may edit the libxl-json without holding the lock.
> 
>   * You may not cause a thing to be added to the primary config
> unless you have held the libxl-json lock continuously
> since ensuring that the libxl-json config describes it.
> 
>   * Conversely you may not cause a thing to be removed from
> the libxl-json unless you have held the libxl-json lock
> continueously since ensuring the thing is absent
> from the primary config.
> 
> And unfortunately much code acquiring the libxl-json lock expects it
> to be fast.
> 
> 
> How about the following scheme.  We split the libxl-json lock into
> two.  I'm going to call them the fast lock and the slow lock.
> 
>   * The fast lock is the existing libxl-json lock.
> 
>   * The slow lock is outside the libxl-json lock in the lock
> hierarchy.  It is also outside the libxl_ctx lock.  It is
> to be acquired by an ao event callback.
> 
>   * No-one may read or edit the libxl-json without holding the fast
> lock across their read operation, or their read/modify/write
> cycle.
> 
>   * However, there are special rules for thing removal/addition, for
> things added/removed via qmp.  Call these `qmp things'.  It is
> permissible to add or remove a qmp thing across two separate
> acquisitions of the fast lock, one to read the old state of the
> thing, and one to read/modify/write to update (only) the new state
> of the thing.  This is subject to the thing add/removal rule, from
> before, which becomes:
> 
>   * You may not cause a thing to be added to the primary config
> unless you have held the relevant thing lock continuously
> since ensuring that the libxl-json config describes it.
> 
>   * Conversely you may not cause a thing to be removed from the
> libxl-json unless you have held the relevant thing lock
> continuously since ensuring the thing is absent from the primary
> config.
> 
>   * The `relevant thing lock' is the slow lock for qmp things, and the
> fast lock for other things.
> 
>   * Acquiring the fast lock fails for a destroyed domain, as at
> present.

I found out that this isn't exactly true. There is a short window where
a thread can aquire the fast lock for a domain that is about to be
destroyed. But this is fine as long as the thread tries to read the
userdata before doing anything else. (The window is between
userdata_destroyall and xc_domain_destroy.)

> 
> I think this maintains the invariant.
> 
> I haven't figured out domain destruction.  Ideally domain destruction
> could happen without taking the slow lock.
> 
> I think this is probably possible if we make sure that qemu is always
> killed before the libxl-json is removed.  The result is that if a qmp
> thing operation races with domain destruction, and in its 1st read
> gets an existing libxl-json from before destruction, the qmp thing
> will, when it acquires the fast lock again, necessarily the libxl-json
> will not exist, and the qmp operation will bomb out.

I've check the domain destruction functions in libxl. And that appear to
be true from my understanding. Relevent actions taken in libxl, in that
order:
- kill QEMU
- userdata_destroyall
- xc_domain_destroy

> But I don't exactly know how this relates to domain creation.  In
> general I haven't 

Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation

2018-12-12 Thread Jan Beulich
>>> On 12.12.18 at 16:20,  wrote:
> Improve decision when vTSC emulation will be activated for a domU with
> tsc_mode=default. The current approach is to compare the cpu_khz value
> from two physical hosts. Since this value is not accurate, it can not be
> used verbatim to decide if vTSC emulation needs to be enabled. Without
> this change each TSC access from domU will be emulated after migration,
> which causes a significant perfomance drop for workloads that make use
> of rdtsc.
> 
> If a domU uses TSC as clocksoure it also must run NTP in some way to
> avoid the potential drift what will most likely happen, independent of
> any migration.

Which drift? While anyone's well advised to run NTP, a completely
isolated set of systems may have no need to, if their interactions don't
depend on exactly matching time.

> The calculation of the drift is based on the time
> returned by remote servers versus how fast the local clock advances. NTP
> can handle a drift up to 500PPM. This means the local clocksource can
> run up to 500us slower or faster. This calculation is based on the TSC
> frequency of the host where the domU was started. Once a domU is
> migrated to a host of a different class, like from "2.3GHz" to "2.4GHz",
> the TSC frequency changes, but the domU kernel may not recalibrate
> itself.

That's why we switch to emulated (or hardware scaling) mode in that
case. It's anyway not really clear to me what this entire ...

> As a result, the drift will be larger and might be outside of
> the 500 PPM range. In addition, the kernel may notice the change of
> speed in which the TSC advances and could change the clocksource. All
> this depends of course on the type of OS that is running in the domU.

... (up to here) paragraph is supposed to tell the reader.

> @@ -1885,6 +1888,16 @@ void __init early_time_init(void)
>  printk("Detected %lu.%03lu MHz processor.\n", 
> cpu_khz / 1000, cpu_khz % 1000);
>  
> +tmp = 1000 * 1000;
> +tmp += VTSC_NTP_PPM_TOLERANCE;
> +tmp *= cpu_khz;
> +tmp /= 1000 * 1000;
> +tmp -= cpu_khz;
> +if (tmp >= VTSC_JITTER_RANGE_KHZ)
> +tmp -= VTSC_JITTER_RANGE_KHZ;

Besides the style issue in the if() - how can this be correct? This
clearly introduces a discontinuity (just consider the case where
tmp is exactly VTSC_JITTER_RANGE_KHZ before the if()). And
I also can't see how it guarantees the resulting value to be
below VTSC_JITTER_RANGE_KHZ. Did you perhaps mean to
cap the value (i.e. = instead of -= )?

> +vtsc_tolerance_khz = (unsigned int)tmp;

Stray cast.

> +printk("Tolerating vtsc jitter for domUs: %u kHz.\n", 
> vtsc_tolerance_khz);

Please omit the full stop; the printk() in context above shouldn't
have one either.

> @@ -2223,8 +2237,25 @@ void tsc_set_info(struct domain *d,
>   * When a guest is created, gtsc_khz is passed in as zero, making
>   * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
>   */
> +disable_vtsc = d->arch.tsc_khz == cpu_khz;
> +
> +if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz && vtsc_tolerance_khz )
> +{
> +long khz_diff;
> +
> +khz_diff = ABS((long)(cpu_khz - gtsc_khz));

I think

khz_diff = ABS((long)cpu_khz - gtsc_khz);

or some such would be less fragile, if e.g. we decided to make
cpu_khz an unsigned int (which is an option, as I don't think
we'll see above 4THz systems any time soon).

> +disable_vtsc = khz_diff <= vtsc_tolerance_khz;
> +
> +printk(XENLOG_G_INFO "d%d: host has %lu kHz,"
> +   " domU expects %u kHz,"
> +   " difference of %ld is %s tolerance of %u\n",
> +   d->domain_id, cpu_khz, gtsc_khz, khz_diff,
> +   disable_vtsc ? "within" : "outside",
> +   vtsc_tolerance_khz);
> +}
> +
>  if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
> - (d->arch.tsc_khz == cpu_khz ||
> + (disable_vtsc ||
>(is_hvm_domain(d) &&
> hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
>  {

In any event I don't follow why all of the sudden this becomes
an always-on mode, with not even a boot command line override.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 09/10] libxl: Kill QEMU with "reaper" ruid

2018-12-12 Thread Ian Jackson
George Dunlap writes ("[PATCH v2 09/10] libxl: Kill QEMU with "reaper" ruid"):
> Using kill(-1) to killing an untrusted dm process with the real uid
> equal to the dm_uid isn't guaranteed to succeed: the process in
> question may be able to kill the reaper process after the setresuid()
> and before the kill().
> 
> Instead, set the real uid to the QEMU user for domain 0
> (QEMU_USER_RANGE_BASE + 0).  The reaper process will still be able to
> kill the dm process, but not vice versa.
> 
> This, in turn, requires locking to make sure that only one reaper
> process is using that uid at a time; otherwise one reaper process may
> kill the other reaper process.
> 
> Create a lockfile in RUNDIR/dm-reaper-lock, and grab the lock before
> executing kill.
> 
> In the event that we can't get the lock for some reason, go ahead with
> the kill using dm_uid for both real and effective UIDs.  This isn't
> guaranteed to work, but it's no worse than not trying to kill the
> process at all.

Thanks.  Only minor comments here.

> +/*
> + * Look up "reaper UID".  If present and non-root, returns 0 and sets
> + * reaper_uid.  If not present, returns 0 and leaves reaper_uid unset;
> + * otherwise returns libxl-style error.
> + */

`leaves reaper_uid unset' is ambiguous.  It might mean `leaves it
unchanged' or `leaves it set to a sentinel value'.  The implementation
seems to be the former, which means that all callers must set it to a
sentinel value.

I think it would be better if it explicitly set it to (uid_t)-1
in that case.

...

I looked further and saw that actually you meant `leaves it unchanged'
and the caller is expected to set it to the value that it will want to
use if there is no dedicated reaper uid.  This is rather too subtle
for me.

Can you please put the logic that falls back to
reaper_uid=dm_dedicated_uid closer to the point of use ?

> +if(user_base->pw_uid == 0) {
 ^
missing space.

.oO{ I wonder how that checkpatch project is coming on... }

> +static int get_reaper_lock_and_uid(libxl__destroy_devicemodel_state *ddms,
> +   uid_t *reaper_uid)
> +{
...
> +fd = open(lockfile, O_RDWR|O_CREAT, 0666);

I think this mode is wrong.  If the process is running with a g+w
umask the file is g+w which is not desirable.  I suggest 0644.

> +if (fd < 0) {
> +/* All other errno: EBADF, EINVAL, ENOLCK, EWOULDBLOCK */
> +LOGED(ERROR, domid,
> +  "unexpected error while trying to open lockfile %s, errno=%d",
> +  lockfile, errno);
> +return ERROR_FAIL;
;4~
`All other errno' - other to what ?  I think this comment has become
detached from its code.

> +/* Try to lock the file, retrying on EINTR */
> +for (;;) {
> +r = flock(fd, LOCK_EX);
> +if (!r)
> +break;
> +if (errno != EINTR) {
> +/* All other errno: EBADF, EINVAL, ENOLCK, EWOULDBLOCK */
> +LOGED(ERROR, domid,
> +  "unexpected error while trying to lock %s, fd=%d, 
> errno=%d",
> +  lockfile, fd, errno);
> +return ERROR_FAIL;
> +}
> +}

LGTM.

> +/*
> + * Get reaper_uid.  If we can't find such a uid, return an error.
> + *
> + * FIXME: This means that domain destruction will fail if
> + * device_model_user is set but QEMU_USER_RANGE_BASE doesn't exist.
> + */
> +return libxl__get_reaper_uid(gc, reaper_uid);
> +}

Did you mean to introduce this new FIXME ?  I think device_model_user
was there before your series (even though it probably won't work very
well...)  Forgive me if I am missing something.

If you did mean to, then you need to explain in the commit message why
it is OK.

>  /*
>   * Destroy all processes of the given uid by setresuid to the
>   * specified uid and kill(-1).  NB this MUST BE CALLED FROM A SEPARATE
> - * PROCESS from the normal libxl process.  Returns a libxl-style error
> - * code.
> + * PROCESS from the normal libxl process, and should exit immediately
> + * after return.  Returns a libxl-style error code.

I think this rule that you must _exit immediately after return is not
new so can you move that comment into the patch that introduces the
setresuid ?  Also it would be marginally better to explicitly say
_exit rather than exit.

>  /*
...
> + * NB: Even if we don't have a separate reaper_uid, the parent can
> + * know whether we won the race by looking at the status variable;
> + * so we don't strictly need to return failure in this case.  But
> + * if there's a misconfiguration, it's better to alert the
> + * administator sooner rather than later; so if we fail to get a
> + * reaper uid, report an error even if the kill succeeds.

I approve of this reasoning and also of it being explained in a
comment.  Thanks.

> @@ -2853,7 +2952,7 @@ static int 
> kill_device_model_uid_child(libxl__destroy_devicemodel_state *ddms,
>  r = kill(-1, 9);
>  if (r && 

Re: [Xen-devel] [PATCH v2 10/10] dm_depriv: Mark `UID cleanup` as completed

2018-12-12 Thread Ian Jackson
George Dunlap writes ("[PATCH v2 10/10] dm_depriv: Mark `UID cleanup` as 
completed"):
> Signed-off-by: George Dunlap 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12] dt: fix integer check in dt_device_is_compatible

2018-12-12 Thread Julien Grall

Hi Stefano,

On 11/12/2018 18:46, Stefano Stabellini wrote:

cplen is unsigned, thus, it can never be < 0. Use a signed integer local
variable instead.


The current code checks > 0. Looking at the code I don't think it can ever be 
negative. So can you details what is the problem you are trying to resolve?


Cheers,



Signed-off-by: Stefano Stabellini 
---
  xen/common/device_tree.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 8fc401d..df274cc 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -213,17 +213,20 @@ bool_t dt_device_is_compatible(const struct 
dt_device_node *device,
  {
  const char* cp;
  u32 cplen, l;
+s64 ilen;
  
  cp = dt_get_property(device, "compatible", );

  if ( cp == NULL )
  return 0;
-while ( cplen > 0 )
+
+ilen = cplen;
+while ( ilen > 0 )
  {
  if ( dt_compat_cmp(cp, compat) == 0 )
  return 1;
  l = strlen(cp) + 1;
  cp += l;
-cplen -= l;
+ilen -= l;
  }
  
  return 0;




--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 08/10] libxl: Kill QEMU by uid when possible

2018-12-12 Thread Ian Jackson
George Dunlap writes ("[PATCH v2 08/10] libxl: Kill QEMU by uid when possible"):
> The privcmd fd that a dm_restrict'ed QEMU has gives it permission to
> one specific domain ID.  This domain ID will probably eventually be
> used again.  It is therefore necessary to make absolutely sure that a
> rogue QEMU process cannot hang around after its domain has exited.

Thanks.  Detailed comments mostly on error handling follow...

> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index f9e0bf6578..cd3208f4b8 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1135,7 +1135,7 @@ typedef struct {
>  const char *shim_cmdline;
>  const char *pv_cmdline;
>  
> -char *dm_runas;
> +char *dm_runas, *dm_uid;

I think `dm_uid' is misnamed.  It is only set if the dm has a
*dedicated* uid.  If the dm is run as uid 0 or as a shared qemu uid,
it is set to NULL.

> @@ -3706,6 +3706,8 @@ struct libxl__destroy_devicemodel_state {
>  uint32_t domid;
>  libxl__devicemodel_destroy_cb *callback; /* May be called re-entrantly */
>  /* private to implementation */
> +libxl__ev_child destroyer;
> +int rc; /* Accumulated return value for the destroy operation */

Excellent.

> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 7f9c6a62fe..53fdf8daf7 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -129,6 +129,8 @@ static int libxl__domain_get_device_model_uid(libxl__gc 
> *gc,
>  int rc;
>  char *user;
>  uid_t intended_uid;
> +bool kill_by_uid;
> +

I guess this new blank line is deliberate.  I think it is probably a
good idea.

> @@ -148,8 +150,10 @@ static int libxl__domain_get_device_model_uid(libxl__gc 
> *gc,
>  LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s",
>   user);
>  rc = ERROR_INVAL;
> -} else
> +} else {
>  intended_uid = user_base->pw_uid;
> +kill_by_uid = true;
> +}
>  
>  goto out;
>  }

I think your changes to the out block newly imply that all `goto out'
with `rc=0' must also set kill_by_uid.

I have not (in this revision) applied this patch and searched for that
because I think there will have to be a respin to consistently apply
the rules I previously identified.

(I mention this here partly so that when I review v3 I know that my
former self wanted to double check that.)

> @@ -226,6 +234,8 @@ out:
>  }
>  
>  state->dm_runas = user;
> +if (kill_by_uid)
> +state->dm_uid = GCSPRINTF("%ld", (long)intended_uid);
>  }

More stuff is accumulating in this post-0-check success path.
Perhaps this means my suggestion of
   if (!rc) { ... }  if (!rc) { ... }
will be most convenient.

> +/*
> + * If we're starting the dm with a non-root UID, save the UID so
> + * that we can reliably kill it and any subprocesses
> + */
> +if (state->dm_uid)
> +libxl__xs_printf(gc, XBT_NULL,
> + GCSPRINTF("%s/image/device-model-uid", dom_path),

My comment about the misnamed libxl variable applies to
device-model-uid too I think.

> +#define PROPAGATE_RC if(!ddms->rc) ddms->rc = rc
  ^

I like this macro.  But (i) it needs proper macro hygiene - either a
do{ }while(0) block, or refactoring into an expression (and then
putting in parens).  And (ii) you missed out a space.

(The references to ddms and rc are anaphoric, not macro parameters, so
do not need parens.)

> +path = DEVICE_MODEL_XS_PATH(gc, LIBXL_TOOLSTACK_DOMID, domid, "");
> +rc = libxl__xs_rm_checked(gc, XBT_NULL, path);
> +if (rc) {
> +PROPAGATE_RC;
>  LOGD(ERROR, domid, "xs_rm failed for %s", path);
> +}

Your new macro makes this a nicely transparent pasttern.

> +/*
> + * See if we should try to kill by uid
> + */
> +path = GCSPRINTF("/local/domain/%d/image/device-model-uid", domid);
> +rc = libxl__xs_read_checked(gc, XBT_NULL, path, _uid_str);
> +
> +/*
> + * If there was an error here, accumulate the error and fall back
> + * to killing by pid.
> + */
> +if (rc) {
> +PROPAGATE_RC;
> +LOGD(ERROR, domid, "Reading dm UID path failed for %s", path);
> +}

From the comment for libxl__xs_read_checked:
 | * On error, *result_out is undefined.
Arguably this is a bear trap.  Maybe you would like to fix it there
rather than by setting dm_uid_str to 0 here.

> +reaper_pid = libxl__ev_child_fork(gc, >destroyer,
> +  kill_device_model_uid_cb);
> +if (reaper_pid < 0) {
> +rc = ERROR_FAIL;
> +PROPAGATE_RC;
> +/*
> + * Note that if this fails, we still don't kill by pid, to
> + * make sure that an untrusted DM has not "maliciously"
> + * exited (potentially causing us to kill an unrelated
> +

Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function

2018-12-12 Thread Jan Beulich
>>> On 12.12.18 at 16:56,  wrote:
> On Wed, Dec 12, 2018 at 03:32:53AM -0700, Jan Beulich wrote:
>> >>> On 12.12.18 at 11:04,  wrote:
>> > You mentioned there's some code (for PV?) to calculate the size of the
>> > page tables but I'm having trouble finding it (mainly because I'm not
>> > that familiar with PV), could you point me to it?
>> 
>> In dom0_construct_pv() you'll find a loop starting with
>> "for ( nr_pt_pages = 2; ; nr_pt_pages++ )". It's not the neatest,
>> but at least we've never had reports of failure.
> 
> That seems quite complicated, what about using the formula below:
> 
> /*
>  * Approximate the memory required for the HAP/IOMMU page tables by
>  * pessimistically assuming every guest page will use a p2m page table
>  * entry.
>  */
> return DIV_ROUND_UP((
> /* Account for one entry in the L1 per page. */
> nr_pages +
> /* Account for one entry in the L2 per 512 pages. */
> DIV_ROUND_UP(nr_pages, 512) +
> /* Account for one entry in the L3 per 512^2 pages. */
> DIV_ROUND_UP(nr_pages, 512 * 512) +
> /* Account for one entry in the L4 per 512^3 pages. */
> DIV_ROUND_UP(nr_pages, 512 * 512 * 512) +
> ) * 8, PAGE_SIZE << PAGE_ORDER_4K);
> 
> That takes into account higher level page table structures.

That's a fair approximation without 2M and 1G pages available. I'm
unconvinced we want to over-estimate this heavily in the more
common case of large page mappings being available. Otoh this
provides enough resources to later also deal with shattering of
large pages.

The MMIO side of things of course still remains unclear.

What I don't understand in any case though is
"PAGE_SIZE << PAGE_ORDER_4K". This is x86 code - why not
just PAGE_SIZE?

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-12 Thread Jan Beulich
>>> On 12.12.18 at 15:54,  wrote:
> Fix this by releasing the target paging lock before attempting to
> perform the copy of the dirty bitmap, and then forcing a restart of
> the whole process in case there have been changes to the dirty bitmap
> tables.

I'm afraid it's not that simple: The writer side (paging_mark_pfn_dirty())
uses the same lock, and I think the assumption is that while the copying
takes place no updates to the bitmap can occur. Then again the subject
domain gets paused anyway, so updates probably can't happen (the
"probably" of course needs to be eliminated). But at the very least I'd
expect a more thorough discussion here as to why dropping the lock
intermediately is fine. The preemption mechanism can't be used as sole
reference, because that one doesn't drop the lock in the middle of an
iteration.

> --- a/xen/arch/x86/mm/paging.c
> +++ b/xen/arch/x86/mm/paging.c
> @@ -408,6 +408,7 @@ static int paging_log_dirty_op(struct domain *d,
>  unsigned long *l1 = NULL;
>  int i4, i3, i2;
>  
> + again:
>  if ( !resuming )
>  {

Considering that you set "resuming" to 1 before jumping here,
why don't you put the label after this if()? I'd even consider pulling
the label even further down (perhaps to immediately before the
last if() ahead of the main loop), and have the path branching there
re-acquire the lock (and drop some of the other state setting that
then becomes unnecessary).

> @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d,
>  bytes = (unsigned int)((sc->pages - pages + 7) >> 3);
>  if ( likely(peek) )
>  {
> +if ( paging_mode_enabled(current->domain) )
> +/*
> + * Drop the target p2m lock, or else Xen will panic
> + * when trying to acquire the p2m lock of the caller
> + * due to invalid lock order. Note that there are no
> + * lock ordering issues here, and the panic is due to
> + * the fact that the lock level tracking doesn't 
> record
> + * the domain the lock belongs to.
> + */
> +paging_unlock(d);

This makes it sound as if tracking the domain would help. It doesn't,
at least not as long as there is not also some sort of ordering or
hierarchy between domains. IOW I'd prefer if the "doesn't" became
"can't".

Now before we go this route I think we need to consider whether
this really is the only place where the two locks get into one
another's way. That's because I don't think we want to introduce
more of such, well, hackery, and hence we'd otherwise need a
better solution. For example the locking model could be adjusted
to allow such nesting in the general case: Dom0 and any domain
whose ->target matches the subject domain here could be given
a slightly different "weight" in the lock order violation check logic.

I'm also uncertain about the overall effect this change has, in
that now the lock gets dropped for _every_ page to be copied.

> @@ -505,6 +517,23 @@ static int paging_log_dirty_op(struct domain *d,
>  clear_page(l1);
>  unmap_domain_page(l1);
>  }
> +if ( paging_mode_enabled(current->domain) && peek )
> +{
> +d->arch.paging.preempt.log_dirty.i4 = i4;
> +d->arch.paging.preempt.log_dirty.i3 = i3;
> +d->arch.paging.preempt.log_dirty.i2 = i2 + 1;

If i2 + 1 == LOGDIRTY_NODE_ENTRIES I think it would be better

> +d->arch.paging.preempt.log_dirty.done = pages;
> +d->arch.paging.preempt.dom = current->domain;
> +d->arch.paging.preempt.op = sc->op;
> +resuming = 1;

true?

> +if ( l2 )
> +unmap_domain_page(l2);
> +if ( l3 )
> +unmap_domain_page(l3);
> +if ( l4 )
> +unmap_domain_page(l4);
> +goto again;

At the very least for safety reasons you should set l2, l3, and l4
back to NULL here.

> @@ -513,6 +542,7 @@ static int paging_log_dirty_op(struct domain *d,
>  {
>  d->arch.paging.preempt.log_dirty.i4 = i4;
>  d->arch.paging.preempt.log_dirty.i3 = i3 + 1;
> +d->arch.paging.preempt.log_dirty.i2 = 0;
>  rv = -ERESTART;
>  break;
>  }
> @@ -525,6 +555,7 @@ static int paging_log_dirty_op(struct domain *d,
>  {
>  d->arch.paging.preempt.log_dirty.i4 = i4 + 1;
>  d->arch.paging.preempt.log_dirty.i3 = 0;
> +d->arch.paging.preempt.log_dirty.i2 = 0;
>  rv = -ERESTART;
>  }

With these I don't see why you need storage in the 

Re: [Xen-devel] [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy

2018-12-12 Thread Roger Pau Monné
On Fri, Nov 30, 2018 at 05:32:46PM -0800, Christopher Clark wrote:
> Applied to both x86 and ARM headers.
> 
> Signed-off-by: Christopher Clark 
> ---
>  xen/include/asm-arm/guest_access.h | 25 +
>  xen/include/asm-x86/guest_access.h | 29 +
>  xen/include/xen/guest_access.h |  3 +++
>  3 files changed, 57 insertions(+)
> 
> diff --git a/xen/include/asm-arm/guest_access.h 
> b/xen/include/asm-arm/guest_access.h
> index 224d2a0..7b6f89c 100644
> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -24,6 +24,11 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t 
> ipa, void *buf,
>  #define __raw_copy_from_guest raw_copy_from_guest
>  #define __raw_clear_guest raw_clear_guest
>  
> +#define raw_copy_from_guest_errno(dst, src, len) \
> +(raw_copy_from_guest((dst), (src), (len)) ? -EFAULT : 0)
> +#define raw_copy_to_guest_errno(dst, src, len)   \
> +(raw_copy_to_guest((dst), (src), (len)) ? -EFAULT : 0)

Since the only error that you return is EFAULT, I don't really see the
point in adding all those helpers. You achieve exactly the same by
returning a boolean and doing the translation to EFAULT in the caller
if required.

It might have been nice to have the copy to/from set of functions
return an error value, but adding a new set of helpers that have the
same functionality but just differ in the return value look
redundant.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ

2018-12-12 Thread Andrii Anisov


On 12.12.18 17:08, Julien Grall wrote:

Is it just because it adds a couple more instruction in the guest case?

And add unlikely for XEN IRQ case. That was the idea.


The check is mainly there to catch error in debug build.

I supposed a race with `release_irq()`, but found out that we are safe with 
`_IRQ_DISABLED` flag when `action` is not set.
So I'll keep the code in its current place (as of mainline) and wrap it with 
`#ifndef NDEBUG`.

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function

2018-12-12 Thread Roger Pau Monné
On Wed, Dec 12, 2018 at 03:32:53AM -0700, Jan Beulich wrote:
> >>> On 12.12.18 at 11:04,  wrote:
> > On Wed, Dec 12, 2018 at 02:53:26AM -0700, Jan Beulich wrote:
> >> >>> On 12.12.18 at 10:14,  wrote:
> >> > On Tue, Dec 11, 2018 at 08:33:08AM -0700, Jan Beulich wrote:
> >> >> >>> On 11.12.18 at 16:19,  wrote:
> >> >> > On Tue, Dec 11, 2018 at 08:08:51AM -0700, Jan Beulich wrote:
> >> >> >> >>> On 05.12.18 at 15:54,  wrote:
> >> >> >> > To note it's calculating the approximate amount of memory required 
> >> >> >> > by
> >> >> >> > shadow paging.
> >> >> >> 
> >> >> >> I don't understand this logic, and ...
> >> >> >> 
> >> >> >> > @@ -325,7 +325,7 @@ unsigned long __init dom0_compute_nr_pages(
> >> >> >> >  break;
> >> >> >> >  
> >> >> >> >  /* Reserve memory for shadow or HAP. */
> >> >> >> > -avail -= dom0_paging_pages(d, nr_pages);
> >> >> >> > +avail -= dom0_shadow_pages(d, nr_pages);
> >> >> >> >  }
> >> >> >> 
> >> >> >> ... the comment here (and lack of conditional restricting the
> >> >> >> code to shadow mode) appear to support me: Have you
> >> >> >> been mislead by the function having a comment referring
> >> >> >> to libxl_get_required_shadow_memory()? I think if anything
> >> >> >> that libxl function would want to be renamed (to replace
> >> >> >> "shadow" by something more generic in its name).
> >> >> > 
> >> >> > But the logic in dom0_shadow_pages to calculate the size of the paging
> >> >> > memory pool is specifically for shadow AFAICT, I don't think HAP needs
> >> >> > to take the number of vCPUs into account, since there's only a
> >> >> > single p2m for the whole domain. OTOH shadow needs to take the number
> >> >> > of vCPUs into account because each one will have a different shadow.
> >> >> 
> >> >> Yes, the vCPU count aspect is indeed shadow specific. However,
> >> >> as said in reply to the other patch, the calculation here was at
> >> >> least supposed to also take into account the P2M part of the
> >> >> needed allocations. Yet the P2M part ought to be similar between
> >> >> both modes.
> >> >> 
> >> >> > Note that patch 2 in this series adds a function to calculate the size
> >> >> > of the paging memory pool for HAP, and a conditional is added to the
> >> >> > expression above that takes into account whether shadow or HAP is in
> >> >> > use when subtracting from the amount of available memory.
> >> >> 
> >> >> Well, assuming we can settle on what shape patch 2 should take
> >> >> I can see the point in doing the rename here, but then with an
> >> >> adjusted description: Especially in light of the code comment still
> >> >> visible above you'll want to point out that the rename is in
> >> >> preparation of splitting the calculations. Since I question the split,
> >> >> though, the rename (in a separate patch) is questionable to me
> >> >> too. If we used uniform P2M calculations and added just shadow's
> >> >> per-vCPU extra on top, no rename in a separate patch would
> >> >> seem warranted.
> >> > 
> >> > The current calculations in dom0_paging_pages assume 1 page is needed
> >> > for each 1MB of guest memory for the p2m, do you think this is OK?
> >> > (and suitable to be used for HAP/IOMMU page tables also)
> >> 
> >> Well, 1 page per 1Mb means the same as your current 8 bytes
> >> per page times 2 (for separate P2M and IOMMU tables), afaict.
> > 
> > I was planning to use 1 page per 1Mb for the p2m, and then 1 page per
> > 1Mb for the IOMMU, so 16 bytes per page.
> 
> Well, that's (as said for patch 2) quite a bit of an over-estimate,
> but then again reserving a little too much is perhaps better than
> reserving too little.
> 
> > You mentioned there's some code (for PV?) to calculate the size of the
> > page tables but I'm having trouble finding it (mainly because I'm not
> > that familiar with PV), could you point me to it?
> 
> In dom0_construct_pv() you'll find a loop starting with
> "for ( nr_pt_pages = 2; ; nr_pt_pages++ )". It's not the neatest,
> but at least we've never had reports of failure.

That seems quite complicated, what about using the formula below:

/*
 * Approximate the memory required for the HAP/IOMMU page tables by
 * pessimistically assuming every guest page will use a p2m page table
 * entry.
 */
return DIV_ROUND_UP((
/* Account for one entry in the L1 per page. */
nr_pages +
/* Account for one entry in the L2 per 512 pages. */
DIV_ROUND_UP(nr_pages, 512) +
/* Account for one entry in the L3 per 512^2 pages. */
DIV_ROUND_UP(nr_pages, 512 * 512) +
/* Account for one entry in the L4 per 512^3 pages. */
DIV_ROUND_UP(nr_pages, 512 * 512 * 512) +
) * 8, PAGE_SIZE << PAGE_ORDER_4K);

That takes into account higher level page table structures.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 07/10] libxl: Make killing of device model asynchronous

2018-12-12 Thread Ian Jackson
George Dunlap writes ("[PATCH v2 07/10] libxl: Make killing of device model 
asynchronous"):
> Or at least, give it an asynchronous interface so that we can make it
> actually asynchronous in subsequent patches.
> 
> Create state structures and callback function signatures.  Add the
> state structure to libxl__destroy_domid_state.  Break
> libxl__destroy_domid down into two functions.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 05/10] libxl: Do root checks once in libxl__domain_get_device_model_uid

2018-12-12 Thread Ian Jackson
George Dunlap writes ("[PATCH v2 05/10] libxl: Do root checks once in 
libxl__domain_get_device_model_uid"):
> At the moment, we check for equivalence to literal "root" before
> deciding whether to add the `runas` command-line option to QEMU.  This
> is unsatisfactory for several reasons.
...
> v2:
> - Refactor to use `out` rather than multiple labels
> - Only check for root once
> - Use 'out' rather than direct returns for errors (only use direct returns
>   for early `succeed-without-setting-runas` paths)
> - Use `rc` rather than `ret` to more closely align with CODING_STYLE
> - Fill out comments about the cases we're handling
> - Return ERROR_DEVICE_EXISTS rather than ERROR_FAIL if there's another
>   username that maps to our calculated uid
> - Report an error if the specified device_model_user doesn't exist

Thanks.  This is all much better now.  I still have a few style nits,
and one comment on your reuse of a libxl error code, but AFAICT the
algorithm is right.


> +if (!user_base) {
> +LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s",
> + user);
> +rc = ERROR_INVAL;
> +} else
> +intended_uid = user_base->pw_uid;
>  
> +goto out;

1. Would you mind adding the missing { } ?
  | ... To avoid confusion, either all
  | the blocks in an if...else chain have braces, or none of them do.

2. Can you please set rc in the other path ?  I think this is supposed
to be a success path.  Relying on the previous value of rc seems very
action at a distance.  (Later: yes, I see this this is a success path.
See my comments about the rules implied by your `out:' block.)

TBH I would prefer two goto outs rather than one, like this:

> +if (!user_base) {
> +LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s",
> + user);
> +rc = ERROR_INVAL;
  +goto out;
  +} else {
> +intended_uid = user_base->pw_uid;
  +rc = 0;
  +goto out;
  +}
> +}

which seems easier to read since it is not necessary to untangle the
whole if, and look at the context, to see the correctness of the
individual bits.

Or you could treat the !user_base as an error block and write this:

> +if (!user_base) {
> +LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s",
> + user);
> +rc = ERROR_INVAL;
  +goto out;
  +}
  +intended_uid = user_base->pw_uid;
  +rc = 0;
  +goto out;
> +}



> +/*
> + * If dm_restrict isn't set, and we don't have a specified user, don't
> + * bother setting a `-runas` parameter.
> + */
>  if (!libxl_defbool_val(b_info->dm_restrict)) {
>  LOGD(DEBUG, guest_domid,
>   "dm_restrict disabled, starting QEMU as root");
>  return 0;
>  }

Why `return 0' here but `goto out' earlier ?  IMO all the success
returns should be the same.

> +rc = userlookup_helper_getpwuid(gc, intended_uid,
>   _clash_pwbuf, _clash);
> -if (ret < 0)
> -return ret;
> +if (rc < 0)
> +goto out;

In my last mail I suggested this should be `ret < 0' but of course now
it should be `if (rc)' ...

>  if (user_clash) {
>  LOGD(ERROR, guest_domid,
>   "wanted to use uid %ld (%s + %d) but that is user %s !",
>   (long)intended_uid, LIBXL_QEMU_USER_RANGE_BASE,
>   guest_domid, user_clash->pw_name);
> -return ERROR_FAIL;
> +rc = ERROR_DEVICE_EXISTS;
> +goto out;

I appreciate your desire to specify particular error value, but I am
far from convinced that ERROR_DEVICE_EXISTS is appropriate.  It would
lead someone to ask which device was in the way.

We generally use EINVAL for bad configurations.  If you prefer, feel
free to introduce a new error code.  32-bit signed integers are pretty
cheap.

> +if (rc < 0)
> +goto out;
>  if (user_base) {
>  LOGD(WARN, guest_domid, "Could not find user %s, falling back to %s",
>   LIBXL_QEMU_USER_RANGE_BASE, LIBXL_QEMU_USER_SHARED);
> -goto end_search;
> +intended_uid = user_base->pw_uid;
> +goto out;

Here we have this pattern again with a `goto out' without a preceding
assignment to `rc'.  AFAICT the rules implied by your out block are:

 * Every goto out must be preceded by an assignment to rc.
   IMO there is no reason this should not immediately precede
   the goto out.

 * Additionally, if rc is 0 then the goto out must also be preceded
   relatively recently by an assignment to intended_uid.

> +out:
> +if (!rc) {
> +if (intended_uid == 0) {
> +LOGD(ERROR, guest_domid, "intended_uid is 0 (root)!");
> +return ERROR_INVAL;
> +}
> +
> +state->dm_runas = user;
> +}
> +
> 

Re: [Xen-devel] [PATCH for-4.12 v2 16/17] xen/arm: Implement Set/Way operations

2018-12-12 Thread Julien Grall



On 07/12/2018 21:29, Stefano Stabellini wrote:

CC'ing Dario

Dario, please give a look at the preemption question below.


On Fri, 7 Dec 2018, Julien Grall wrote:

On 06/12/2018 23:32, Stefano Stabellini wrote:

On Tue, 4 Dec 2018, Julien Grall wrote:

So you may not execute them before returning to the guest introducing
long delay. That's why we execute the rest of the code with interrupts
masked. If sotfirq_pending() returns 0 then you know there were no
more softirq pending to handle. All the new one will be signaled via
an interrupt than can only come up when irq are unmasked.

The one before executing vCPU work can potentially be avoided. The reason I
added it is it can take some times before p2m_flush_vm() will call softirq. As
we do this on return to guest we may have already been executed for some time
in the hypervisor. So this give us a chance to preempt if the vCPU consumed
his sliced.


This one is difficult to tell whether it is important or if it would be
best avoided.

For Dario: basically we have a long running operation to perform, we
thought that the best place for it would be on the path returning to
guest (leave_hypervisor_tail). The operation can interrupt itself
checking sotfirq_pending() once in a while to avoid blocking the pcpu
for too long.

The question is: is it better to check sotfirq_pending() even before
starting? Or every so often during the operating is good enough? Does it
even matter?
I am not sure to understand what is your concern here. Checking for 
softirq_pending() often is not an issue. The issue is when we happen to not 
check it. At the moment, I would prefer to be over cautious until we figure out 
whether this is a real issue.


If you are concerned about the performance impact, this is only called when a 
guest is using set/way.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 03/10] libxl: Clean up userlookup_helper_getpw* helper

2018-12-12 Thread Ian Jackson
George Dunlap writes ("[PATCH v2 03/10] libxl: Clean up 
userlookup_helper_getpw* helper"):
> Bring conventions more in line with libxl__xs_read_checked():
> - If found, return 0 and set pointer to non-NULL
> - If not found, return 0 and set pointer to NULL
> - On error, return libxl-style error number.
> 
> Update documentation to match.
...
>  #define DEFINE_USERLOOKUP_HELPER(NAME,SPEC_TYPE,STRUCTNAME,SYSCONF) \
>  static int userlookup_helper_##NAME(libxl__gc *gc,  \
> @@ -83,7 +89,7 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char 
> *name)
>  struct STRUCTNAME *resultp = NULL;  \

git diff has excelled itself in choice of heading line, hasn't it ?

> @@ -142,14 +147,14 @@ static int libxl__domain_get_device_model_uid(libxl__gc 
> *gc,
>   _pwbuf, _base);
>  if (ret < 0)
>  return ret;
> -if (ret > 0) {
> +if (user_base) {

I would prefer to also:

  -if (ret < 0)
  +if (ret)
   return ret;

which is more conventional for rc and might reduce the impact of bugs
where the function returned a positive.  (Twice.)

With or without that change,

Acked-by: Ian Jackson 

Thanks for the cleanup!

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2018-12-12 Thread Jan Beulich
>>> On 12.12.18 at 16:18,  wrote:
> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
> On 12.12.18 at 08:06,  wrote:
>>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
On 12/5/18 4:32 AM, Roger Pau Monné wrote:
> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
>> I find some pass-thru devices don't work any more across guest reboot.
>> Assigning it to another guest also meets the same issue. And the only
>> way to make it work again is un-binding and binding it to pciback.
>> Someone reported this issue one year ago [1]. More detail also can be
>> found in [2].
>>
>> The root-cause is Xen's internal MSI-X state isn't reset properly
>> during reboot or re-assignment. In the above case, Xen set maskall bit
>> to mask all MSI interrupts after it detected a potential security
>> issue. Even after device reset, Xen didn't reset its internal maskall
>> bit. As a result, maskall bit would be set again in next write to
>> MSI-X message control register.
>>
>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
>> internal state of a device, we employ it to fix this issue rather than
>> introducing another dedicated sub-hypercall.
>>
>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
>> the device's msix and pirq has been created. This limitation prevents
>> us calling this function when detaching a device from a guest during
>> guest shutdown. Thus it is called right before calling
>> PHYSDEVOPS_prepare_msix().
> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
> () at the end of the hypercall name since it's not a function.
>
> I'm also wondering why the release can't be done when the device is
> detached from the guest (or the guest has been shut down). This makes
> me worry about the raciness of the attach/detach procedure: if there's
> a state where pciback assumes the device has been detached from the
> guest, but there are still pirqs bound, an attempt to attach to
> another guest in such state will fail.

I wonder whether this additional reset functionality could be done out
of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
and then do the extra things that are not properly done there.
>>> 
>>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
>>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
>>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
>>> without error. But ATM, xen expects that no msi is bound to pirq when
>>> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY.
>>> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove().
>>> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen
>>> at last minute, which happens after device reset in 
>>> xen_pcibk_xenbus_remove().
>>
>>But that may need taking care of: I don't think it is a good idea to have
>>anything left from the prior owning domain when the device gets reset.
>>I.e. left over IRQ bindings should perhaps be forcibly cleared before
>>invoking the reset;
> 
> Agree. How about pciback to track the established IRQ bindings? Then
> pciback can clear irq binding before invoking the reset.

How would pciback even know of those mappings, when it's qemu
who establishes (and manages) them?

>>in fact I'd expect this to happen in the course of
>>domain destruction, and I'd expect the device reset to come after the
>>domain was cleaned up. Perhaps simply an ordering issue in the tool
>>stack?
> 
> I don't think reversing the sequences of device reset and domain
> destruction would be simple. Furthermore, during device hot-unplug,
> device reset is done when the owner is alive. So if we use domain
> destruction to enforce all irq binding cleared, in theory, it won't be
> applicable to hot-unplug case (if qemu's hot-unplug logic is
> compromised).

Even in the hot-unplug case the tool stack could issue unbind
requests, behind the back of the possibly compromised qemu,
once neither the guest nor qemu have access to the device
anymore.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation

2018-12-12 Thread Olaf Hering
Improve decision when vTSC emulation will be activated for a domU with
tsc_mode=default. The current approach is to compare the cpu_khz value
from two physical hosts. Since this value is not accurate, it can not be
used verbatim to decide if vTSC emulation needs to be enabled. Without
this change each TSC access from domU will be emulated after migration,
which causes a significant perfomance drop for workloads that make use
of rdtsc.

If a domU uses TSC as clocksoure it also must run NTP in some way to
avoid the potential drift what will most likely happen, independent of
any migration. The calculation of the drift is based on the time
returned by remote servers versus how fast the local clock advances. NTP
can handle a drift up to 500PPM. This means the local clocksource can
run up to 500us slower or faster. This calculation is based on the TSC
frequency of the host where the domU was started. Once a domU is
migrated to a host of a different class, like from "2.3GHz" to "2.4GHz",
the TSC frequency changes, but the domU kernel may not recalibrate
itself. As a result, the drift will be larger and might be outside of
the 500 PPM range. In addition, the kernel may notice the change of
speed in which the TSC advances and could change the clocksource. All
this depends of course on the type of OS that is running in the domU.

If the domU is migrated to another host of the same class, both hosts
may have a slightly different TSC frequency. The difference is small
enough and most likely within the drift range that NTP can handle.

The formula to set the tolerance for this host calculates the ticks
within a timespan of 500 PPM, which is 500us. From this number the
assumed jitter in the TSC frequency measurement must be substracted
because Xen itself can not know if the estimated value in cpu_khz is at
the edge or in the middle of the range of possible freqencies. Data
collected during the incident which triggered this change showed a
jitter of up to 200 KHz across systems of the same class. The resulting
tolerance is larger than needed, and it is expected to still cover the
possible drift that NTP can handle.

Signed-off-by: Olaf Hering 
--

v11:
 - trim patch and use calculated tolerance value, no admin interaction
   required
v10:
 - rebase to ae01a8e315
 - remove changes for libxl and save/restore protocol, the feature has
   to be per host instead of per guest
 - add newline to tsc_set_info (Andrew)
 - add pointer to xen-tscmode(7) in xl.cfg(5)/vtsc_tolerance_khz (Andrew)
 - mention potential clock drift in the domU (Andrew)
 - reword the newly added paragraph in xen-tscmode(7) (Andrew),
   and also mention that it is about the measured/estimated TSC value
   rather than the real value. The latter is simply unknown.
 - use uint32 for internal representation of 
xen_domctl_tsc_info.vtsc_tolerance_khz
   and remove padding field
 - add math for real TSC frequency to xen-tscmode
v9:
 - extend commit msg, mention potential issues with xc_sr_rec_tsc_info._res1
v8:
 - adjust also python stream checker for added tolerance member
v7:
 - use uint16 in libxl_types.idl to match type used elsewhere in the patch
v6:
 - mention default value in xl.cfg
 - tsc_set_info: remove usage of __func__, use %d for domid
 - tsc_set_info: use ABS to calculate khz_diff
v5:
 - reduce functionality to allow setting of the tolerance value
   only at initial domU startup
v4:
 - add missing copyback in XEN_DOMCTL_set_vtsc_tolerance_khz
v3:
 - rename vtsc_khz_tolerance to vtsc_tolerance_khz
 - separate domctls to adjust values
 - more docs
 - update libxl.h
 - update python tests
 - flask check bound to tsc permissions
 - not runtime tested due to dlsym() build errors in staging
---
 xen/arch/x86/time.c | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 24d4c2794b..2ffdc2ea8f 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -43,6 +43,9 @@ static char __initdata opt_clocksource[10];
 string_param("clocksource", opt_clocksource);
 
 unsigned long __read_mostly cpu_khz;  /* CPU clock frequency in kHz. */
+#define VTSC_NTP_PPM_TOLERANCE 500UL  /* Amount of drift NTP will handle */
+#define VTSC_JITTER_RANGE_KHZ 200UL   /* Assumed jitter in cpu_khz */
+static unsigned int __read_mostly vtsc_tolerance_khz;
 DEFINE_SPINLOCK(rtc_lock);
 unsigned long pit0_ticks;
 
@@ -1885,6 +1888,16 @@ void __init early_time_init(void)
 printk("Detected %lu.%03lu MHz processor.\n", 
cpu_khz / 1000, cpu_khz % 1000);
 
+tmp = 1000 * 1000;
+tmp += VTSC_NTP_PPM_TOLERANCE;
+tmp *= cpu_khz;
+tmp /= 1000 * 1000;
+tmp -= cpu_khz;
+if (tmp >= VTSC_JITTER_RANGE_KHZ)
+tmp -= VTSC_JITTER_RANGE_KHZ;
+vtsc_tolerance_khz = (unsigned int)tmp;
+printk("Tolerating vtsc jitter for domUs: %u kHz.\n", vtsc_tolerance_khz);
+
 setup_irq(0, 0, );
 }
 
@@ -2208,6 +2221,7 @@ void tsc_set_info(struct domain *d,
 
 

Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2018-12-12 Thread Chao Gao
On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
 On 12.12.18 at 08:06,  wrote:
>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
 On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
> I find some pass-thru devices don't work any more across guest reboot.
> Assigning it to another guest also meets the same issue. And the only
> way to make it work again is un-binding and binding it to pciback.
> Someone reported this issue one year ago [1]. More detail also can be
> found in [2].
>
> The root-cause is Xen's internal MSI-X state isn't reset properly
> during reboot or re-assignment. In the above case, Xen set maskall bit
> to mask all MSI interrupts after it detected a potential security
> issue. Even after device reset, Xen didn't reset its internal maskall
> bit. As a result, maskall bit would be set again in next write to
> MSI-X message control register.
>
> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
> internal state of a device, we employ it to fix this issue rather than
> introducing another dedicated sub-hypercall.
>
> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
> the device's msix and pirq has been created. This limitation prevents
> us calling this function when detaching a device from a guest during
> guest shutdown. Thus it is called right before calling
> PHYSDEVOPS_prepare_msix().
 s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
 () at the end of the hypercall name since it's not a function.

 I'm also wondering why the release can't be done when the device is
 detached from the guest (or the guest has been shut down). This makes
 me worry about the raciness of the attach/detach procedure: if there's
 a state where pciback assumes the device has been detached from the
 guest, but there are still pirqs bound, an attempt to attach to
 another guest in such state will fail.
>>>
>>>I wonder whether this additional reset functionality could be done out
>>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>>>and then do the extra things that are not properly done there.
>> 
>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
>> without error. But ATM, xen expects that no msi is bound to pirq when
>> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY.
>> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove().
>> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen
>> at last minute, which happens after device reset in 
>> xen_pcibk_xenbus_remove().
>
>But that may need taking care of: I don't think it is a good idea to have
>anything left from the prior owning domain when the device gets reset.
>I.e. left over IRQ bindings should perhaps be forcibly cleared before
>invoking the reset;

Agree. How about pciback to track the established IRQ bindings? Then
pciback can clear irq binding before invoking the reset.

>in fact I'd expect this to happen in the course of
>domain destruction, and I'd expect the device reset to come after the
>domain was cleaned up. Perhaps simply an ordering issue in the tool
>stack?

I don't think reversing the sequences of device reset and domain
destruction would be simple. Furthermore, during device hot-unplug,
device reset is done when the owner is alive. So if we use domain
destruction to enforce all irq binding cleared, in theory, it won't be
applicable to hot-unplug case (if qemu's hot-unplug logic is
compromised).

Thanks
Chao

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ

2018-12-12 Thread Julien Grall



On 12/12/2018 14:58, Andrii Anisov wrote:



On 12.12.18 16:49, Julien Grall wrote:
ASSERT only tells you that desc->action was NULL. It does not tell you which 
IRQ has the desc->action == NULL.

Ah, yes.

It is a bit a shame we don't have way to provide another message with ASSERT 
to help you debugging.

We might have implemented an assert with a message.
And guys on stackoverflow suggest for workaround `ASSERT(condition && "error 
message")`.


That's not going to help for showing the IRQ number.



No, I am suggesting to have the current if ( test_bit(...) ) protecred with 
#ifndef NDEBUG. No code duplication.

I would not like to leave even non-debug build without the check.


The check is mainly there to catch error in debug build. So I am quite confused 
what is your goal by moving the check. Is it just because it adds a couple more 
instruction in the guest case?


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ

2018-12-12 Thread Andrii Anisov



On 12.12.18 16:49, Julien Grall wrote:

ASSERT only tells you that desc->action was NULL. It does not tell you which IRQ 
has the desc->action == NULL.

Ah, yes.


It is a bit a shame we don't have way to provide another message with ASSERT to 
help you debugging.

We might have implemented an assert with a message.
And guys on stackoverflow suggest for workaround `ASSERT(condition && "error 
message")`.


No, I am suggesting to have the current if ( test_bit(...) ) protecred with 
#ifndef NDEBUG. No code duplication.

I would not like to leave even non-debug build without the check.

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-12 Thread Roger Pau Monne
When the caller of paging_log_dirty_op is a paging mode guest Xen
would choke when trying to copy the dirty bitmap to the guest provided
buffer because the paging lock of the target is already locked, and
trying to lock the paging lock of the caller will cause the mm lock
order checks to trigger:

(XEN) mm locking order violation: 64 > 16
(XEN) Xen BUG at ./mm-locks.h:143
(XEN) [ Xen-4.12-unstable  x86_64  debug=y   Tainted:  C   ]
(XEN) CPU:4
(XEN) RIP:e008:[] p2m.c#_mm_read_lock+0x41/0x50
(XEN) RFLAGS: 00010286   CONTEXT: hypervisor (d0v3)
[...]
(XEN) Xen call trace:
(XEN)[] p2m.c#_mm_read_lock+0x41/0x50
(XEN)[] vmac.c#p2m_get_page_from_gfn+0x46/0x200
(XEN)[] vmac.c#hap_p2m_ga_to_gfn_4_levels+0x4a/0x270
(XEN)[] vmac.c#hvm_translate_get_page+0x33/0x1c0
(XEN)[] hvm.c#__hvm_copy+0x8d/0x230
(XEN)[] vmac.c#hvm_copy_to_guest_linear+0x46/0x60
(XEN)[] vmac.c#copy_to_user_hvm+0x79/0x90
(XEN)[] paging.c#paging_log_dirty_op+0x2c9/0x6d0
(XEN)[] vmac.c#arch_do_domctl+0x63e/0x1c00
(XEN)[] vmac.c#do_domctl+0x450/0x1020
(XEN)[] vmac.c#hvm_hypercall+0x1c9/0x4b0
(XEN)[] vmac.c#vmx_vmexit_handler+0x6c1/0xea0
(XEN)[] vmac.c#vmx_asm_vmexit_handler+0xfa/0x270
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 4:
(XEN) Xen BUG at ./mm-locks.h:143
(XEN) 

Fix this by releasing the target paging lock before attempting to
perform the copy of the dirty bitmap, and then forcing a restart of
the whole process in case there have been changes to the dirty bitmap
tables.

Note that the path for non-paging guests remains the same, and the
paging lock is not dropped in that case.

Signed-off-by: Roger Pau Monné 
---
Cc: George Dunlap 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
---
 xen/arch/x86/mm/paging.c | 37 +---
 xen/include/asm-x86/domain.h |  1 +
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index d5836eb688..752f827f38 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -408,6 +408,7 @@ static int paging_log_dirty_op(struct domain *d,
 unsigned long *l1 = NULL;
 int i4, i3, i2;
 
+ again:
 if ( !resuming )
 {
 /*
@@ -468,17 +469,18 @@ static int paging_log_dirty_op(struct domain *d,
 l4 = paging_map_log_dirty_bitmap(d);
 i4 = d->arch.paging.preempt.log_dirty.i4;
 i3 = d->arch.paging.preempt.log_dirty.i3;
+i2 = d->arch.paging.preempt.log_dirty.i2;
 pages = d->arch.paging.preempt.log_dirty.done;
 
 for ( ; (pages < sc->pages) && (i4 < LOGDIRTY_NODE_ENTRIES); i4++, i3 = 0 )
 {
 l3 = (l4 && mfn_valid(l4[i4])) ? map_domain_page(l4[i4]) : NULL;
-for ( ; (pages < sc->pages) && (i3 < LOGDIRTY_NODE_ENTRIES); i3++ )
+for ( ; (pages < sc->pages) && (i3 < LOGDIRTY_NODE_ENTRIES);
+  i3++, i2 = 0 )
 {
 l2 = ((l3 && mfn_valid(l3[i3])) ?
   map_domain_page(l3[i3]) : NULL);
-for ( i2 = 0;
-  (pages < sc->pages) && (i2 < LOGDIRTY_NODE_ENTRIES);
+for ( ; (pages < sc->pages) && (i2 < LOGDIRTY_NODE_ENTRIES);
   i2++ )
 {
 unsigned int bytes = PAGE_SIZE;
@@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d,
 bytes = (unsigned int)((sc->pages - pages + 7) >> 3);
 if ( likely(peek) )
 {
+if ( paging_mode_enabled(current->domain) )
+/*
+ * Drop the target p2m lock, or else Xen will panic
+ * when trying to acquire the p2m lock of the caller
+ * due to invalid lock order. Note that there are no
+ * lock ordering issues here, and the panic is due to
+ * the fact that the lock level tracking doesn't record
+ * the domain the lock belongs to.
+ */
+paging_unlock(d);
 if ( (l1 ? copy_to_guest_offset(sc->dirty_bitmap,
 pages >> 3, (uint8_t *)l1,
 bytes)
@@ -505,6 +517,23 @@ static int paging_log_dirty_op(struct domain *d,
 clear_page(l1);
 unmap_domain_page(l1);
 }
+if ( paging_mode_enabled(current->domain) && peek )
+{
+d->arch.paging.preempt.log_dirty.i4 = i4;
+d->arch.paging.preempt.log_dirty.i3 = i3;
+d->arch.paging.preempt.log_dirty.i2 = i2 + 1;
+d->arch.paging.preempt.log_dirty.done = pages;
+d->arch.paging.preempt.dom = current->domain;
+

[Xen-devel] [PATCH] libxl_create: Re-order callbacks of initiate_domain_create

2018-12-12 Thread Anthony PERARD
Callbacks should be in the order that there are going to be executed.
This patch fix the initiate_domain_create callbacks, and also reorder
the callbacks prototytes. That way, it's easier to follow the flow.

This patch:
- move libxl__colo_restore_setup_done after domcreate_bootloader_done.
- move domcreate_attach_devices after domcreate_devmodel_started.

No functional change.

Signed-off-by: Anthony PERARD 
---
 tools/libxl/libxl_create.c | 123 +++--
 1 file changed, 63 insertions(+), 60 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index fa573344bc..891175b15b 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -771,28 +771,31 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t 
domid,
  */
 
 /* Event callbacks, in this order: */
-static void domcreate_devmodel_started(libxl__egc *egc,
-   libxl__dm_spawn_state *dmss,
-   int rc);
 static void domcreate_bootloader_console_available(libxl__egc *egc,
libxl__bootloader_state 
*bl);
-static void domcreate_bootloader_done(libxl__egc *egc,
-  libxl__bootloader_state *bl,
-  int rc);
-
-static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *aodevs,
-int ret);
-
 static void domcreate_console_available(libxl__egc *egc,
 libxl__domain_create_state *dcs);
 
+static void domcreate_bootloader_done(libxl__egc *egc,
+  libxl__bootloader_state *bl,
+  int rc);
+static void libxl__colo_restore_setup_done(libxl__egc *egc,
+   libxl__colo_restore_state *crs,
+   int rc);
 static void domcreate_stream_done(libxl__egc *egc,
   libxl__stream_read_state *srs,
   int ret);
-
 static void domcreate_rebuild_done(libxl__egc *egc,
libxl__domain_create_state *dcs,
int ret);
+static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *aodevs,
+int ret);
+static void domcreate_devmodel_started(libxl__egc *egc,
+   libxl__dm_spawn_state *dmss,
+   int rc);
+static void domcreate_attach_devices(libxl__egc *egc,
+ libxl__multidev *multidev,
+ int ret);
 
 /* Our own function to clean up and call the user's callback.
  * The final call in the sequence. */
@@ -1031,23 +1034,6 @@ static void domcreate_console_available(libxl__egc *egc,
 dcs->aop_console_how.for_event));
 }
 
-static void libxl__colo_restore_setup_done(libxl__egc *egc,
-   libxl__colo_restore_state *crs,
-   int rc)
-{
-libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
-
-EGC_GC;
-
-if (rc) {
-LOGD(ERROR, dcs->guest_domid, "colo restore setup fails: %d", rc);
-domcreate_stream_done(egc, >srs, rc);
-return;
-}
-
-libxl__stream_read_start(egc, >srs);
-}
-
 static void domcreate_bootloader_done(libxl__egc *egc,
   libxl__bootloader_state *bl,
   int rc)
@@ -1145,6 +1131,23 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 domcreate_stream_done(egc, >srs, rc);
 }
 
+static void libxl__colo_restore_setup_done(libxl__egc *egc,
+   libxl__colo_restore_state *crs,
+   int rc)
+{
+libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+EGC_GC;
+
+if (rc) {
+LOGD(ERROR, dcs->guest_domid, "colo restore setup fails: %d", rc);
+domcreate_stream_done(egc, >srs, rc);
+return;
+}
+
+libxl__stream_read_start(egc, >srs);
+}
+
 void libxl__srm_callout_callback_restore_results(xen_pfn_t store_mfn,
   xen_pfn_t console_mfn, void *user)
 {
@@ -1509,6 +1512,38 @@ const struct libxl_device_type *device_type_tbl[] = {
 NULL
 };
 
+static void domcreate_devmodel_started(libxl__egc *egc,
+   libxl__dm_spawn_state *dmss,
+   int ret)
+{
+libxl__domain_create_state *dcs = CONTAINER_OF(dmss, *dcs, sdss.dm);
+STATE_AO_GC(dmss->spawn.ao);
+int domid = dcs->guest_domid;
+
+/* convenience aliases */
+libxl_domain_config *const d_config = dcs->guest_config;
+
+if (ret) {
+LOGD(ERROR, domid, "device model did not start: 

Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ

2018-12-12 Thread Julien Grall

Hi,

On 12/12/2018 13:51, Andrii Anisov wrote:

Julien,


On 12.12.18 13:59, Julien Grall wrote:

An ASSERT(...) is already embed in irq_get_guest_info(desc).

Yep.

I thought about the ASSERT(...) over the current printk yesterday. But I 
discarded it because it does not give you more information if something went 
really wrong as the stack trace would not really be helpful in that context.

But ASSERT also prints a failed condition text.


ASSERT only tells you that desc->action was NULL. It does not tell you which IRQ 
has the desc->action == NULL. It is a bit a shame we don't have way to provide 
another message with ASSERT to help you debugging.





So I would prefer the #ifdef NDEBUG solution.

I'm a bit confused about your suggestion.
Do you want to see two identical pieces of code before and after the `if ( 
test_bit(_IRQ_GUEST, >status) )`? One with `#ifndef NDEBUG`, other without.

Is my understanding correct?


No, I am suggesting to have the current if ( test_bit(...) ) protecred with 
#ifndef NDEBUG. No code duplication.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-4.10-testing test] 131151: regressions - FAIL

2018-12-12 Thread Ian Jackson
Jan Beulich writes ("Re: [Xen-devel] [xen-4.10-testing test] 131151: 
regressions - FAIL"):
> So this can, by now, be called a reliably recurring failure.

I concur with the assertions made on irc that 129676 got an (un)lucky
pass and that it should be force pushed.  So I have done that, with
  b6e203bc80e9d3e1dc7eb579d9665a77700d78cc

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ

2018-12-12 Thread Andrii Anisov

Julien,


On 12.12.18 13:59, Julien Grall wrote:

An ASSERT(...) is already embed in irq_get_guest_info(desc).

Yep.


I thought about the ASSERT(...) over the current printk yesterday. But I 
discarded it because it does not give you more information if something went 
really wrong as the stack trace would not really be helpful in that context.

But ASSERT also prints a failed condition text.


So I would prefer the #ifdef NDEBUG solution.

I'm a bit confused about your suggestion.
Do you want to see two identical pieces of code before and after the `if ( 
test_bit(_IRQ_GUEST, >status) )`? One with `#ifndef NDEBUG`, other 
without.
Is my understanding correct?


--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 08/16] gic: separate ppi processing

2018-12-12 Thread Julien Grall

Hi,

On 12/12/2018 12:52, Andrii Anisov wrote:



On 12.12.18 14:46, Julien Grall wrote:

I haven't answered because I am waiting the numbers.

I got it.
But TBM will not show effect of this patch.


If TBM does not show effect, then you need a different benchmark. Probably by 
looking at how the PPI latency in Xen.


Even the patch breaks TBM's 
functionality because TBM relies on a phys timer interrupt to be assigned to the 
domain (go through the spi path).


Well, the problem is your patch assumes PPI cannot be assigned to the guest. 
While this is the case today, there are plan for handing over PPI (such as 
timer) to guest. This will help to remove the hack we have for the timer in both 
Xen code base and Guest code base.


I would actually prefer if we try to optimize do_IRQ. Actually, this patch has 
some good ideas that can be re-used in the current code.


For instance, I think you can drop the logic that use _IRQ_PENDING because the 
interrupt can never be received to another PE while active (this will be cleared 
by desc->handler->end()).


Similarly the test on _IRQ_INPROGRESS could be dropped. Although, I would still 
like to keep set/clear _IRQ_INPROGRESS.


With those changes, do_IRQ becomes a lot more like the function do_ppi. The only 
difference is the spinlock. But I don't believe this will bring that much 
difference in performance impact. After all the lock should unlikely be contended.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 08/16] gic: separate ppi processing

2018-12-12 Thread Andrii Anisov



On 12.12.18 14:46, Julien Grall wrote:

I haven't answered because I am waiting the numbers.

I got it.
But TBM will not show effect of this patch. Even the patch breaks TBM's 
functionality because TBM relies on a phys timer interrupt to be assigned to 
the domain (go through the spi path).

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 13/16] gic-vgic: skip irqs locking

2018-12-12 Thread Andrii Anisov



On 12.12.18 14:44, Julien Grall wrote:

At the moment, I am happy with the second chunk of this patch to go. I am still 
unconvinced #1 is the right thing to go.

I got it. So keep the patch still for now.


I plan to send separately patches you have reviewed first. Then evaluate the 
rest with TBM. Does it fit your vision?

Can you send separately patches I said I am happy with for Xen 4.12? We can 
then continue to discuss the others with numbers and see if we can merge them 
in time.

Yep.

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 08/16] gic: separate ppi processing

2018-12-12 Thread Julien Grall

Hi,

On 12/12/2018 12:37, Andrii Anisov wrote:

Julien,

Sorry for bothering you too much.
What about this patch?


I haven't answered because I am waiting the numbers.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 13/16] gic-vgic: skip irqs locking

2018-12-12 Thread Julien Grall

Hi Andrii,

On 12/12/2018 12:35, Andrii Anisov wrote:



On 12.12.18 14:07, Julien Grall wrote:

This chunk relies on patch #1, am I correct?

For sure, it is.

 If so, this should be written in the commit message that this was introduced 
recently.

 This helps to figure out whether the patch can be merged before the rest.

Do you mean I can prepare it for 4.12 together with #1?


At the moment, I am happy with the second chunk of this patch to go. I am still 
unconvinced #1 is the right thing to go.




I plan to send separately patches you have reviewed first. Then evaluate the 
rest with TBM. Does it fit your vision?
Can you send separately patches I said I am happy with for Xen 4.12? We can then 
continue to discuss the others with numbers and see if we can merge them in time.


Cheers,


--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 08/16] gic: separate ppi processing

2018-12-12 Thread Andrii Anisov

Julien,

Sorry for bothering you too much.
What about this patch?

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 13/16] gic-vgic: skip irqs locking

2018-12-12 Thread Andrii Anisov



On 12.12.18 14:07, Julien Grall wrote:

This chunk relies on patch #1, am I correct?

For sure, it is.


 If so, this should be written in the commit message that this was introduced 
recently.
 This helps to figure out whether the patch can be merged before the rest.

Do you mean I can prepare it for 4.12 together with #1?

I plan to send separately patches you have reviewed first. Then evaluate the 
rest with TBM. Does it fit your vision?


I would add an ASSERT(!local_is_irq_enabled()) on top to show that this should 
be called with interrupt disabled.

OK.

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware

2018-12-12 Thread Roger Pau Monné
On Wed, Dec 12, 2018 at 11:48:52AM +, Tian, Kevin wrote:
> > From: Roger Pau Monné [mailto:roger@citrix.com]
> > Sent: Wednesday, December 12, 2018 7:25 PM
> > 
> > On Wed, Dec 12, 2018 at 10:36:44AM +, Tian, Kevin wrote:
> > > > From: Roger Pau Monné [mailto:roger@citrix.com]
> > > > Sent: Monday, October 15, 2018 6:30 PM
> > > > (XEN)   [22642] POWERTYPE 4
> > > > (XEN)   [22643] IDLE PPR 0x0020
> > > > (XEN)IRR
> > > >
> > 00
> > > > 00
> > > > (XEN)ISR
> > > >
> > 02
> > > > 00
> > > > (XEN)   [22644] WAKE PPR 0x0020
> > > > (XEN)IRR
> > > >
> > 02
> > > > 00
> > > > (XEN)ISR
> > > >
> > 02
> > > > 00
> > >
> > > looks pending IRR (0x21) doesn't always trigger a spurious interrupt?
> > 
> > Yes, that's correct. Having a pending IRR and going idle doesn't
> > always trigger the spurious interrupt re-injection.
> > 
> > > is it a fixed pattern after how many rounds of Cstate enter/exit with
> > > pending IRR(0x21) then you see assertion happened (in this example
> > > it happens at 3rd time)?
> > 
> > It's not a fixed pattern, here's another trace with IRR(0x21) being
> > pending just once during the Cstate transitions:
> 
> did you observe a case where such asset may occur when IRR(0x21)
> is cleared but ISR (0x21) is set?

No, I've always seen both ISR and IRR set when the interrupt injection
happens. This of course doesn't mean it's not possible, but I have not
seen any trace with ISR(0x21) set and IRR(0x21) clear.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 06/16] gic: drop interrupts enabling on interrupts processing

2018-12-12 Thread Julien Grall

Hi,

On 28/11/2018 22:06, Julien Grall wrote:

On 28/11/2018 21:32, Andrii Anisov wrote:

From: Andrii Anisov 

This reduces the number of context switches in case we have an
interrupt storm. We will read out all of those interrupt in the
loop anyway.


This needs a better explanation. You might want to have a look at the details I 
provided in another discussion.


In general, I would like any changes in the vGIC to be very detailed. As they 
are usually subbtle implication on the rest of the code base.


This patch is suitable for Xen 4.12 with more comments in the commit message.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 13/16] gic-vgic: skip irqs locking

2018-12-12 Thread Julien Grall

Hi,

On 28/11/2018 21:32, Andrii Anisov wrote:

From: Andrii Anisov 

Those fucntions are called under IRQs disabled already, so avoid


s/fucntions/


additional flags saving and restore.

Signed-off-by: Andrii Anisov 
---
  xen/arch/arm/gic-vgic.c | 10 --
  1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 5b73bbd..e511f91 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -268,7 +268,6 @@ static void gic_update_one_lr(struct vcpu *v, int i)
  void vgic_sync_from_lrs(struct vcpu *v)
  {
  int i = 0;
-unsigned long flags;
  unsigned int nr_lrs = gic_get_nr_lrs();
  
  /* The idle domain has no LRs to be cleared. Since gic_restore_state

@@ -279,7 +278,7 @@ void vgic_sync_from_lrs(struct vcpu *v)
  
  gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
  
-spin_lock_irqsave(>arch.vgic.lock, flags);

+spin_lock(>arch.vgic.lock);


This chunk relies on patch #1, am I correct? If so, this should be written in 
the commit message that this was introduced recently. This helps to figure out 
whether the patch can be merged before the rest.


  
  while ((i = find_next_bit((const unsigned long *) _cpu(lr_mask),

nr_lrs, i)) < nr_lrs ) {
@@ -287,7 +286,7 @@ void vgic_sync_from_lrs(struct vcpu *v)
  i++;
  }
  
-spin_unlock_irqrestore(>arch.vgic.lock, flags);

+spin_unlock(>arch.vgic.lock);
  }
  
  static void gic_restore_pending_irqs(struct vcpu *v)

@@ -295,11 +294,10 @@ static void gic_restore_pending_irqs(struct vcpu *v)
  int lr = 0;
  struct pending_irq *p, *t, *p_r;
  struct list_head *inflight_r;
-unsigned long flags;
  unsigned int nr_lrs = gic_get_nr_lrs();
  int lrs = nr_lrs;
  
-spin_lock_irqsave(>arch.vgic.lock, flags);

+spin_lock(>arch.vgic.lock);


I would add an ASSERT(!local_is_irq_enabled()) on top to show that this should 
be called with interrupt disabled.


  
  if ( list_empty(>arch.vgic.lr_pending) )

  goto out;
@@ -343,7 +341,7 @@ found:
  }
  
  out:

-spin_unlock_irqrestore(>arch.vgic.lock, flags);
+spin_unlock(>arch.vgic.lock);
  }
  
  void gic_clear_pending_irqs(struct vcpu *v)




Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [Qemu-devel] xen_disk qdevification

2018-12-12 Thread Markus Armbruster
Olaf Hering  writes:

> On Fri, Nov 02, Kevin Wolf wrote:
>
>> A while ago, a downstream patch review found out that there are some QMP
>> commands that would immediately crash if a xen_disk device were present
>> because of the lacking qdevification. This is not the code quality
>> standard I envision for QEMU. It's time for non-qdev devices to go.
>
> Do you have that backwards by any chance? IMO the presence of assert()
> contributes to bad code quality, not the drivers that trigger those
> asserts. It is bad enough that two QEMU releases went out while being in
> bad shape.

Converting block devices to the qdev infrastructure (introduced in 2009)
has been a longwinded affair.  We've repeatedly reminded people in
charge of the stragglers to update their code.

Neglecting to update code to current infrastructure creates a burden and
a risk.  The burden is on the maintainers of the infrastructure: they
get to drag along outmoded infrastructure.  The risk is on the users of
the code: it becomes a special case, and eventually acquires its special
bugs.

Putting the blame for them entirely on the maintainers of the
infrastructure is not fair.

> Anyway, hopefully Paul or whoever will find the time and energy to
> convert the code at some point.

Yes, Paul's taking care of it.  Much appreciated.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] xen_disk qdevification (was: [PATCH 0/3] Performance improvements for xen_disk v2)

2018-12-12 Thread Kevin Wolf
Am 12.12.2018 um 09:59 hat Olaf Hering geschrieben:
> On Fri, Nov 02, Kevin Wolf wrote:
> 
> > A while ago, a downstream patch review found out that there are some QMP
> > commands that would immediately crash if a xen_disk device were present
> > because of the lacking qdevification. This is not the code quality
> > standard I envision for QEMU. It's time for non-qdev devices to go.
> 
> Do you have that backwards by any chance? IMO the presence of assert()
> contributes to bad code quality, not the drivers that trigger those
> asserts.

You like shooting the messenger, it seems? Bugs aren't bad, only
catching them is?

But anyway, in this case, I seem to remember it was a plain old
segfault, not a failed assertion.

Kevin


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ

2018-12-12 Thread Julien Grall



On 12/12/2018 11:30, Andrii Anisov wrote:



On 11.12.18 16:48, Julien Grall wrote:

And you can't see any potential race in that code happening in the future?
It is protected with `desc->lock` so far. If one decided to get it from under 
the lock, the race is possible with `release_irq()`.



Also getting an unknown
interrupt is very unlikely on a non-debug platform.


I am tempted to keep the code at the same place but protect with an #ifndef 
NDEBUG. What do you think?

Well, I think about a correspondent ASSERT for the guest IRQ. Like following:


  if ( test_bit(_IRQ_GUEST, >status) )
  {>>   struct irq_guest *info = irq_get_guest_info(desc);

+    ASSERT( desc->action != NULL );

What would you prefer?


An ASSERT(...) is already embed in irq_get_guest_info(desc).

I thought about the ASSERT(...) over the current printk yesterday. But I 
discarded it because it does not give you more information if something went 
really wrong as the stack trace would not really be helpful in that context.


So I would prefer the #ifdef NDEBUG solution.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [Qemu-devel] [PATCH for-3.2 v5 07/19] hw: apply accel compat properties without touching globals

2018-12-12 Thread Marc-André Lureau
Hi
On Mon, Dec 10, 2018 at 8:55 PM Igor Mammedov  wrote:
>
> On Mon, 10 Dec 2018 17:45:22 +0100
> Igor Mammedov  wrote:
>
> > On Tue,  4 Dec 2018 18:20:11 +0400
> > Marc-André Lureau  wrote:
> >
> > > Instead of registering compat properties as globals, let's keep them
> > > in their own array, to avoid mixing with user globals.
> > >
> > > Introduce object_apply_global_props() function, to apply compatibility
> > > properties from a GPtrArray.
> > >
> > > Signed-off-by: Marc-André Lureau 
> > > ---
> > >  include/hw/qdev-core.h | 10 ++
> > >  include/qom/object.h   |  3 +++
> > >  include/sysemu/accel.h |  4 +---
> > >  accel/accel.c  | 12 
> > >  hw/core/qdev.c |  9 +
> > >  hw/xen/xen-common.c|  9 ++---
> > >  qom/object.c   | 25 +
> > >  vl.c   |  1 -
> > >  8 files changed, 54 insertions(+), 19 deletions(-)
> > >
> > [...]
> > > diff --git a/hw/xen/xen-common.c b/hw/xen/xen-common.c
> > > index 6ec14c73ca..4532aa8632 100644
> > > --- a/hw/xen/xen-common.c
> > > +++ b/hw/xen/xen-common.c
> > > @@ -174,18 +174,21 @@ static GlobalProperty xen_compat_props[] = {
> > >  .driver = "migration",
> > >  .property = "send-section-footer",
> > >  .value = "off",
> > > -},
> > > -{ /* end of list */ },
> > > +}
> > >  };
> > >
> > >  static void xen_accel_class_init(ObjectClass *oc, void *data)
> > >  {
> > >  AccelClass *ac = ACCEL_CLASS(oc);
> > > +
> > >  ac->name = "Xen";
> > >  ac->init_machine = xen_init;
> > >  ac->setup_post = xen_setup_post;
> > >  ac->allowed = _allowed;
> > > -ac->global_props = xen_compat_props;
> > > +ac->compat_props = g_ptr_array_new();
> > where is matching free for that?
> can we at least annotate it somehow so that valgrind won't complain about 
> this leak?

If you check my commits on qemu, you should see that I do care (too
much?) about leaks :)

In this case though, I don't see valgrind or asan complaining, I guess
it's still a reachable reference.
Do you think a FIXME comment would be helpful?

(/me wish we had a proper object system, GObject, but that ship as
long sailed..)

>
> >
> > > +
> > > +compat_props_add(ac->compat_props,
> > > + xen_compat_props, G_N_ELEMENTS(xen_compat_props));
> > >  }
> > >
> > >  #define TYPE_XEN_ACCEL ACCEL_CLASS_NAME("xen")
> > > diff --git a/qom/object.c b/qom/object.c
> > > index 17921c0a71..dbdab0aead 100644
> > > --- a/qom/object.c
> > > +++ b/qom/object.c
> > > @@ -370,6 +370,31 @@ static void object_post_init_with_type(Object *obj, 
> > > TypeImpl *ti)
> > >  }
> > >  }
> > >
> > > +void object_apply_global_props(Object *obj, const GPtrArray *props, 
> > > Error **errp)
> > > +{
> > > +Error *err = NULL;
> > > +int i;
> > > +
> > > +if (!props) {
> > > +return;
> > > +}
> > > +
> > > +for (i = 0; i < props->len; i++) {
> > > +GlobalProperty *p = g_ptr_array_index(props, i);
> > > +
> > > +if (object_dynamic_cast(obj, p->driver) == NULL) {
> > > +continue;
> > > +}
> > > +p->used = true;
> > > +object_property_parse(obj, p->value, p->property, );
> > > +if (err != NULL) {
> > > +error_prepend(, "can't apply global %s.%s=%s: ",
> > > +  p->driver, p->property, p->value);
> > > +error_propagate(errp, err);
> > > +}
> > > +}
> > > +}
> > > +
> > >  static void object_initialize_with_type(void *data, size_t size, 
> > > TypeImpl *type)
> > >  {
> > >  Object *obj = data;
> > > diff --git a/vl.c b/vl.c
> > > index a5ae5f23d2..88ba658572 100644
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -2968,7 +2968,6 @@ static void user_register_global_props(void)
> > >   */
> > >  static void register_global_properties(MachineState *ms)
> > >  {
> > > -accel_register_compat_props(ms->accelerator);
> > >  machine_register_compat_props(ms);
> > >  user_register_global_props();
> > >  }
> >
> >
>
>


-- 
Marc-André Lureau

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 05/16] gic-vgic: Drop an excessive clear_lrs

2018-12-12 Thread Julien Grall



On 12/12/2018 11:01, Andrii Anisov wrote:

Hello Julien,

On 11.12.18 16:33, Julien Grall wrote:


With #ifndef NDEBUG and the appropriate comment:

Will do.


Reviewed-by: Julien Grall 

Thank you.


Feel free to resent it alone so it can be merged to Xen 4.12.
What about getting it together with "[RFC 11/16] irq: skip action avalability 
check for guest's IRQ"?


I am fine with that :).

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 15/25] argo: implement the sendv op

2018-12-12 Thread Jan Beulich
>>> On 01.12.18 at 02:32,  wrote:
> +static void
> +argo_signal_domain(struct domain *d)
> +{
> +argo_dprintk("signalling domid:%d\n", d->domain_id);
> +
> +if ( !d->argo ) /* This can happen if the domain is being destroyed */
> +return;

If such a precaution is necessary, how is it guaranteed that
the pointer doesn't change to NULL between the check above
and ...

> +evtchn_send(d, d->argo->evtchn_port);

... the use here?

> +static int
> +argo_iov_count(XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint8_t niov,
> +   uint32_t *count)
> +{
> +argo_iov_t iov;
> +uint32_t sum_iov_lens = 0;
> +int ret;
> +
> +if ( niov > ARGO_MAXIOV )
> +return -EINVAL;
> +
> +while ( niov-- )
> +{
> +ret = copy_from_guest_errno(, iovs, 1);
> +if ( ret )
> +return ret;
> +
> +/* check each to protect sum against integer overflow */
> +if ( iov.iov_len > ARGO_MAX_RING_SIZE )
> +return -EINVAL;
> +
> +sum_iov_lens += iov.iov_len;
> +
> +/*
> + * Again protect sum from integer overflow
> + * and ensure total msg size will be within bounds.
> + */
> +if ( sum_iov_lens > ARGO_MAX_MSG_SIZE )
> +return -EINVAL;

So you do overflow checks here. But how does this help when ...

> +guest_handle_add_offset(iovs, 1);
> +}
> +
> +*count = sum_iov_lens;
> +return 0;
> +}
> +
> +static int
> +argo_ringbuf_insert(struct domain *d,
> +struct argo_ring_info *ring_info,
> +const struct argo_ring_id *src_id,
> +XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint8_t niov,
> +uint32_t message_type, unsigned long *out_len)
> +{
> +argo_ring_t ring;
> +struct argo_ring_message_header mh = { 0 };
> +int32_t sp;
> +int32_t ret = 0;
> +uint32_t len;
> +uint32_t iov_len;
> +uint32_t sum_iov_len = 0;
> +
> +ASSERT(spin_is_locked(_info->lock));
> +
> +if ( (ret = argo_iov_count(iovs, niov, )) )
> +return ret;
> +
> +if ( ((ARGO_ROUNDUP(len) + sizeof (struct argo_ring_message_header) ) >=
> +  ring_info->len)
> + || (len > ARGO_MAX_MSG_SIZE) )
> +return -EMSGSIZE;
> +
> +do {
> +ret =  argo_ringbuf_get_rx_ptr(ring_info, _ptr);
> +if ( ret )
> +break;
> +
> +argo_sanitize_ring(, ring_info);
> +
> +argo_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring.len=%d"
> + " ring_info->tx_ptr=%d\n",
> + ring.tx_ptr, ring.rx_ptr, ring.len, ring_info->tx_ptr);
> +
> +if ( ring.rx_ptr == ring.tx_ptr )
> +sp = ring_info->len;
> +else
> +{
> +sp = ring.rx_ptr - ring.tx_ptr;
> +if ( sp < 0 )
> +sp += ring.len;
> +}
> +
> +if ( (ARGO_ROUNDUP(len) + sizeof(struct argo_ring_message_header)) 
> >= sp )
> +{
> +argo_dprintk("EAGAIN\n");
> +ret = -EAGAIN;
> +break;
> +}
> +
> +mh.len = len + sizeof(struct argo_ring_message_header);
> +mh.source.port = src_id->addr.port;
> +mh.source.domain_id = src_id->addr.domain_id;
> +mh.message_type = message_type;
> +
> +/*
> + * For this copy to the guest ring, tx_ptr is always 16-byte aligned
> + * and the message header is 16 bytes long.
> + */
> +BUILD_BUG_ON(sizeof(struct argo_ring_message_header) != 
> ARGO_ROUNDUP(1));
> +
> +if ( (ret = argo_memcpy_to_guest_ring(ring_info,
> +  ring.tx_ptr + 
> sizeof(argo_ring_t),
> +  ,
> +  XEN_GUEST_HANDLE_NULL(uint8_t),
> +  sizeof(mh))) )
> +break;
> +
> +ring.tx_ptr += sizeof(mh);
> +if ( ring.tx_ptr == ring_info->len )
> +ring.tx_ptr = 0;
> +
> +while ( niov-- )
> +{
> +XEN_GUEST_HANDLE_PARAM(uint8_t) bufp_hnd;
> +XEN_GUEST_HANDLE(uint8_t) buf_hnd;
> +argo_iov_t iov;
> +
> +ret = copy_from_guest_errno(, iovs, 1);

... here you copy the structure again from guest memory, at
which point it may have changed? I see you do some checks
further down, but the question then is - is the checking in
argo_iov_count() redundant and hence unnecessary? Are
you really safe here against inconsistencies between the
first and second reads? If so, a thorough explanation in a
comment is needed here.

> +if ( ret )
> +break;
> +
> +bufp_hnd = guest_handle_from_ptr((uintptr_t)iov.iov_base, 
> uint8_t);

Please use a handle in the public interface instead of such a
cast.

> +buf_hnd = guest_handle_from_param(bufp_hnd, uint8_t);
> +iov_len 

Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware

2018-12-12 Thread Tian, Kevin
> From: Roger Pau Monné [mailto:roger@citrix.com]
> Sent: Wednesday, December 12, 2018 7:25 PM
> 
> On Wed, Dec 12, 2018 at 10:36:44AM +, Tian, Kevin wrote:
> > > From: Roger Pau Monné [mailto:roger@citrix.com]
> > > Sent: Monday, October 15, 2018 6:30 PM
> > > (XEN)   [22642] POWERTYPE 4
> > > (XEN)   [22643] IDLE PPR 0x0020
> > > (XEN)IRR
> > >
> 00
> > > 00
> > > (XEN)ISR
> > >
> 02
> > > 00
> > > (XEN)   [22644] WAKE PPR 0x0020
> > > (XEN)IRR
> > >
> 02
> > > 00
> > > (XEN)ISR
> > >
> 02
> > > 00
> >
> > looks pending IRR (0x21) doesn't always trigger a spurious interrupt?
> 
> Yes, that's correct. Having a pending IRR and going idle doesn't
> always trigger the spurious interrupt re-injection.
> 
> > is it a fixed pattern after how many rounds of Cstate enter/exit with
> > pending IRR(0x21) then you see assertion happened (in this example
> > it happens at 3rd time)?
> 
> It's not a fixed pattern, here's another trace with IRR(0x21) being
> pending just once during the Cstate transitions:

did you observe a case where such asset may occur when IRR(0x21)
is cleared but ISR (0x21) is set?

I want to understand whether pending ISR alone triggers such 
problem or another pending IRR together is necessary... 

> 
> (XEN) *** Pending EOI error ***
> (XEN)   cpu #1, irq 30, vector 0x21, sp 1
> (XEN) Peoi stack: sp 1
> (XEN)   [ 0] irq  30, vec 0x21, ready 0, ISR 1, TMR 0, IRR 0
> (XEN) Peoi stack trace records:
> (XEN)   [ 7886] ACK_POST PPR 0x0010
> (XEN)IRR
> 00
> 00
> (XEN)ISR
> 00
> 00
> (XEN)   [ 7887] POWERTYPE 5
> (XEN)   [ 7888] IDLE PPR 0x0010
> (XEN)IRR
> 00
> 00
> (XEN)ISR
> 00
> 00
> (XEN)   [ 7889] WAKE PPR 0x0010
> (XEN)IRR
> 00
> 04
> (XEN)ISR
> 00
> 00
> (XEN)   [ 7890] ACK_PRE  PPR 0x00f0
> (XEN)IRR
> 00
> 00
> (XEN)ISR
> 00
> 04
> (XEN)   [ 7891] ACK_POST PPR 0x0010
> (XEN)IRR
> 00
> 00
> (XEN)ISR
> 00
> 00
> (XEN)   [ 7892] POWERTYPE 5
> (XEN)   [ 7893] IDLE PPR 0x0010
> (XEN)IRR
> 00
> 00
> (XEN)ISR
> 00
> 00
> (XEN)   [ 7894] WAKE PPR 0x0010
> (XEN)IRR
> 00
> 04
> (XEN)ISR
> 00
> 00
> (XEN)   [ 7895] ACK_PRE  PPR 0x00f0
> (XEN)IRR
> 00
> 00
> (XEN)ISR
> 00
> 04
> (XEN)   [ 7896] ACK_POST PPR 0x0010
> (XEN)IRR
> 00
> 00
> (XEN)ISR
> 00
> 00
> (XEN)   [ 7897] POWERTYPE 5
> (XEN)   [ 7898] IDLE PPR 0x0010
> (XEN)IRR
> 00
> 00
> (XEN)ISR
> 00
> 00
> (XEN)   [ 7899] WAKE PPR 0x0010
> (XEN)IRR
> 00
> 04
> (XEN)ISR
> 00
> 00
> (XEN)   [ 7900] ACK_PRE  PPR 0x00f0
> (XEN)IRR
> 00
> 00
> (XEN)ISR
> 00
> 04
> (XEN)   [ 7901] ACK_POST PPR 0x0010
> (XEN)IRR
> 00
> 00
> (XEN)ISR
> 

Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ

2018-12-12 Thread Andrii Anisov



On 11.12.18 16:48, Julien Grall wrote:

And you can't see any potential race in that code happening in the future?

It is protected with `desc->lock` so far. If one decided to get it from under 
the lock, the race is possible with `release_irq()`.


Also getting an unknown
interrupt is very unlikely on a non-debug platform.


I am tempted to keep the code at the same place but protect with an #ifndef 
NDEBUG. What do you think?

Well, I think about a correspondent ASSERT for the guest IRQ. Like following:


  if ( test_bit(_IRQ_GUEST, >status) )
  {>>   struct irq_guest *info = irq_get_guest_info(desc);

+ASSERT( desc->action != NULL );

What would you prefer?

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware

2018-12-12 Thread Roger Pau Monné
On Wed, Dec 12, 2018 at 10:36:44AM +, Tian, Kevin wrote:
> > From: Roger Pau Monné [mailto:roger@citrix.com]
> > Sent: Monday, October 15, 2018 6:30 PM
> > (XEN)   [22642] POWERTYPE 4
> > (XEN)   [22643] IDLE PPR 0x0020
> > (XEN)IRR
> > 00
> > 00
> > (XEN)ISR
> > 02
> > 00
> > (XEN)   [22644] WAKE PPR 0x0020
> > (XEN)IRR
> > 02
> > 00
> > (XEN)ISR
> > 02
> > 00
> 
> looks pending IRR (0x21) doesn't always trigger a spurious interrupt?

Yes, that's correct. Having a pending IRR and going idle doesn't
always trigger the spurious interrupt re-injection.

> is it a fixed pattern after how many rounds of Cstate enter/exit with
> pending IRR(0x21) then you see assertion happened (in this example
> it happens at 3rd time)?

It's not a fixed pattern, here's another trace with IRR(0x21) being
pending just once during the Cstate transitions:

(XEN) *** Pending EOI error ***
(XEN)   cpu #1, irq 30, vector 0x21, sp 1
(XEN) Peoi stack: sp 1
(XEN)   [ 0] irq  30, vec 0x21, ready 0, ISR 1, TMR 0, IRR 0
(XEN) Peoi stack trace records:
(XEN)   [ 7886] ACK_POST PPR 0x0010
(XEN)IRR 

(XEN)ISR 

(XEN)   [ 7887] POWERTYPE 5
(XEN)   [ 7888] IDLE PPR 0x0010
(XEN)IRR 

(XEN)ISR 

(XEN)   [ 7889] WAKE PPR 0x0010
(XEN)IRR 
0004
(XEN)ISR 

(XEN)   [ 7890] ACK_PRE  PPR 0x00f0
(XEN)IRR 

(XEN)ISR 
0004
(XEN)   [ 7891] ACK_POST PPR 0x0010
(XEN)IRR 

(XEN)ISR 

(XEN)   [ 7892] POWERTYPE 5
(XEN)   [ 7893] IDLE PPR 0x0010
(XEN)IRR 

(XEN)ISR 

(XEN)   [ 7894] WAKE PPR 0x0010
(XEN)IRR 
0004
(XEN)ISR 

(XEN)   [ 7895] ACK_PRE  PPR 0x00f0
(XEN)IRR 

(XEN)ISR 
0004
(XEN)   [ 7896] ACK_POST PPR 0x0010
(XEN)IRR 

(XEN)ISR 

(XEN)   [ 7897] POWERTYPE 5
(XEN)   [ 7898] IDLE PPR 0x0010
(XEN)IRR 

(XEN)ISR 

(XEN)   [ 7899] WAKE PPR 0x0010
(XEN)IRR 
0004
(XEN)ISR 

(XEN)   [ 7900] ACK_PRE  PPR 0x00f0
(XEN)IRR 

(XEN)ISR 
0004
(XEN)   [ 7901] ACK_POST PPR 0x0010
(XEN)IRR 

(XEN)ISR 

(XEN)   [ 7902] POWERTYPE 5
(XEN)   [ 7903] IDLE PPR 0x0010
(XEN)IRR 

(XEN)ISR 

(XEN)   [ 7904] WAKE PPR 0x0010
(XEN)IRR 

(XEN)ISR 

(XEN)   [ 7905] PUSH {sp  0, irq  

Re: [Xen-devel] [PATCH v2 2/2] x86/dom0: improve paging memory usage calculations

2018-12-12 Thread Roger Pau Monné
On Wed, Dec 12, 2018 at 12:14:53PM +0100, Roger Pau Monné wrote:
> On Wed, Dec 12, 2018 at 03:57:41AM -0700, Jan Beulich wrote:
> > >>> On 12.12.18 at 11:16,  wrote:
> > > There are also further issues that I wanted to discuss in a separate
> > > thread, what about foreign mappings? Dom0 will likely map a non
> > > trivial amount of grants and foreign mappings, which will also require
> > > p2m/IOMMU page table entries.
> > 
> > Hmm, good point. Then again this is a runtime requirement,
> > whereas here we want to get the boot time estimate right. At
> > runtime lack of memory for P2M tables will simply result in
> > -ENOMEM.
> 
> But Xen runtime memory is also tied to the boot estimates if there's
> no dom0_mem parameter specified on the command line. I would expect
 ^ wouldn't
> Dom0 to balloon down memory when it attempts to map BARs, even at
> runtime.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  1   2   >