Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device
>>> On 13.12.18 at 04:46, wrote: > On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote: > On 12.12.18 at 16:18, wrote: >>> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote: >>> On 12.12.18 at 08:06, wrote: > On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote: >>On 12/5/18 4:32 AM, Roger Pau Monné wrote: >>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote: I find some pass-thru devices don't work any more across guest reboot. Assigning it to another guest also meets the same issue. And the only way to make it work again is un-binding and binding it to pciback. Someone reported this issue one year ago [1]. More detail also can be found in [2]. The root-cause is Xen's internal MSI-X state isn't reset properly during reboot or re-assignment. In the above case, Xen set maskall bit to mask all MSI interrupts after it detected a potential security issue. Even after device reset, Xen didn't reset its internal maskall bit. As a result, maskall bit would be set again in next write to MSI-X message control register. Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X internal state of a device, we employ it to fix this issue rather than introducing another dedicated sub-hypercall. Note that PHYSDEVOPS_release_msix() will fail if the mapping between the device's msix and pirq has been created. This limitation prevents us calling this function when detaching a device from a guest during guest shutdown. Thus it is called right before calling PHYSDEVOPS_prepare_msix(). >>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the >>> () at the end of the hypercall name since it's not a function. >>> >>> I'm also wondering why the release can't be done when the device is >>> detached from the guest (or the guest has been shut down). This makes >>> me worry about the raciness of the attach/detach procedure: if there's >>> a state where pciback assumes the device has been detached from the >>> guest, but there are still pirqs bound, an attempt to attach to >>> another guest in such state will fail. >> >>I wonder whether this additional reset functionality could be done out >>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset >>and then do the extra things that are not properly done there. > > No. It cannot be done in xen_pcibk_xenbus_remove() without modifying > the handler of PHYSDEVOP_release_msix. To do a successful Xen internal > MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished > without error. But ATM, xen expects that no msi is bound to pirq when > doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY. > However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove(). > In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen > at last minute, which happens after device reset in > xen_pcibk_xenbus_remove(). But that may need taking care of: I don't think it is a good idea to have anything left from the prior owning domain when the device gets reset. I.e. left over IRQ bindings should perhaps be forcibly cleared before invoking the reset; >>> >>> Agree. How about pciback to track the established IRQ bindings? Then >>> pciback can clear irq binding before invoking the reset. >> >>How would pciback even know of those mappings, when it's qemu >>who establishes (and manages) them? > > I meant to expose some interfaces from pciback. And pciback serves > as the proxy of IRQ (un)binding APIs. If at all possible we should avoid having to change more parties (qemu, libxc, kernel, hypervisor) than really necessary. Remember that such a bug fix may want backporting, and making sure affected people have all relevant components updated is increasingly difficult with their number growing. in fact I'd expect this to happen in the course of domain destruction, and I'd expect the device reset to come after the domain was cleaned up. Perhaps simply an ordering issue in the tool stack? >>> >>> I don't think reversing the sequences of device reset and domain >>> destruction would be simple. Furthermore, during device hot-unplug, >>> device reset is done when the owner is alive. So if we use domain >>> destruction to enforce all irq binding cleared, in theory, it won't be >>> applicable to hot-unplug case (if qemu's hot-unplug logic is >>> compromised). >> >>Even in the hot-unplug case the tool stack could issue unbind >>requests, behind the back of the possibly compromised qemu, >>once neither the guest nor qemu have access to the device >>anymore. > > But currently, tool stack doesn't know the remaining IRQ bindings. > If tool stack can
Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function
>>> On 12.12.18 at 18:05, wrote: > On Wed, Dec 12, 2018 at 09:15:09AM -0700, Jan Beulich wrote: >> The MMIO side of things of course still remains unclear. > > Right, for the MMIO and the handling of grant and foreign mappings it's > not clear how we want to proceed. > > Maybe account for all host RAM (total_pages) plus MMIO BARs? Well, I thought we've already settled on it being impossible to account for all MMIO BARs at this point. >> What I don't understand in any case though is >> "PAGE_SIZE << PAGE_ORDER_4K". This is x86 code - why not >> just PAGE_SIZE? > > Oh, I've done it like that because this is related to p2m code, which > uses this way to get the page size. IIRC you told me to use this for > things like pvh_setup_e820. I don't mind switching to just PAGE_SIZE. Oh, I see. It's fine either way then. My general way of thinking here is that outside of x86 code we better use these PAGE_ORDER_* values, while in x86 specific code I don't see the point. But indeed the p2m code is littered with them. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [freebsd-master test] 131265: all pass - PUSHED
flight 131265 freebsd-master real [real] http://logs.test-lab.xenproject.org/osstest/logs/131265/ Perfect :-) All tests in this flight passed as required version targeted for testing: freebsd 25cda747b021a1be758d3d1b128a670a23841f7f baseline version: freebsd f71d2bdf0c00378411ab60ec1ab76196b920e666 Last test of basis 131205 2018-12-10 09:21:49 Z2 days Testing same since 131265 2018-12-12 09:19:37 Z0 days1 attempts People who touched revisions under test: ae andrew arybchik avos bz cem cy dab delphij dim emaste eugen hselasky imp jhb jhibbits kib kp luporl markj mav mckusick mjg shurd yuripv jobs: build-amd64-freebsd-againpass build-amd64-freebsd pass build-amd64-xen-freebsd pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/freebsd.git f71d2bdf0c0..25cda747b02 25cda747b021a1be758d3d1b128a670a23841f7f -> tested/master ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/VT-x: Don't activate VMCS Shadowing outside of nested vmx mode
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > Sent: Saturday, December 8, 2018 4:07 AM > nested vmx mode > > By default on capable hardware, > SECONDARY_EXEC_ENABLE_VMCS_SHADOWING is > activated unilaterally. The VMCS Link pointer is initialised to ~0, but the > VMREAD/VMWRITE bitmap pointers are not. > > This causes the 16bit IVT and Bios Data Area get interpreted as the > read/write > permission bitmap for guests which blindly execute VMREAD/VMWRITE > instructions. > > This is not a security issue because the VMCS Link pointer being ~0 causes > VMREAD/VMWRITE to complete with VMFailInvalid (rather than modifying > a > potential shadow VMCS), and the contents of MFN 0 has already been > determined > not to contain any interesting data because of L1TF's ability to read that 4k > frame. > > Leave VMCS Shadowing disabled by default, and toggle it in > nvmx_{set,clear}_vmcs_pointer(). This isn't the most efficient course of > action, but it is the most simple way of leaving nested-virt working as it did > before. > > While editing construct_vmcs(), collect all default secondary_exec_control > modifications together. The disabling of PML is latently buggy because it > happens after secondary_exec_control are written into the VMCS, although > there > is an unconditional update later which writes the correct value into > hardware. > > Signed-off-by: Andrew Cooper Acked-by: Kevin Tian ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 3/4] iommu: elide flushing for higher order map/unmap operations
> From: Paul Durrant [mailto:paul.durr...@citrix.com] > Sent: Thursday, December 6, 2018 11:34 PM > > This patch removes any implicit flushing that occurs in the implementation > of map and unmap operations and adds new iommu_map/unmap() > wrapper > functions. To maintain sematics of the iommu_legacy_map/unmap() > wrapper > functions, these are modified to call the new wrapper functions and then > perform an explicit flush operation. > > Because VT-d currently performs two different types of flush dependent > upon > whether a PTE is being modified versus merely added (i.e. replacing a non- > present PTE) 'iommu flush flags' are defined by this patch and the > iommu_ops map_page() and unmap_page() methods are modified to OR > the type > of flush necessary for the PTE that has been populated or depopulated into > an accumulated flags value. The accumulated value can then be passed into > the explicit flush operation. > > The ARM SMMU implementations of map_page() and unmap_page() > currently > perform no implicit flushing and therefore the modified methods do not > adjust the flush flags. > > NOTE: The per-cpu 'iommu_dont_flush_iotlb' is respected by the > iommu_legacy_map/unmap() wrapper functions and therefore this now > applies to all IOMMU implementations rather than just VT-d. > > Signed-off-by: Paul Durrant Reviewed-by: Kevin Tian ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 2/4] iommu: rename wrapper functions
> From: Paul Durrant [mailto:paul.durr...@citrix.com] > Sent: Thursday, December 6, 2018 11:34 PM > > A subsequent patch will add semantically different versions of > iommu_map/unmap() so, in advance of that change, this patch renames > the > existing functions to iommu_legacy_map/unmap() and modifies all call- > sites. > It also adjusts a comment that refers to iommu_map_page(), which was re- > named by a previous patch. > > This patch is purely cosmetic. No functional change. > > Signed-off-by: Paul Durrant > Acked-by: Jan Beulich A Reviewed-by: Kevin Tian ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/2] x86/hvm: Corrections to RDTSCP intercept handling
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > Sent: Saturday, December 1, 2018 1:07 AM > > For both VT-x and SVM, the RDTSCP intercept will trigger if the pipeline > supports the instruction, but the guest may have not have rdtscp in its > featureset. Bring the vmexit handlers in line with the main emulator > behaviour by optionally handing back #UD. > > Next on the AMD side, if RDTSCP actually ends up being intercepted on a > debug > build, we first update regs->rcx, then call __get_instruction_length() asking > for RDTSC. As the two instructions are different (and indeed, different > lengths!), __get_instruction_length_from_list() fails and hands back a #GP > fault. > > This can demonstrated by putting a guest into tsc_mode="always emulate" > and > executing an rdtscp instruction: > > (d1) --- Xen Test Framework --- > (d1) Environment: HVM 64bit (Long mode 4 levels) > (d1) Test rdtscp > (d1) TSC mode 1 > (XEN) emulate.c:159:d1v0 __get_instruction_length_from_list: Mismatch > between expected and actual instruction: > (XEN) emulate.c:163:d1v0 list[0] val 8, { opc 0xf0031, modrm 0 }, list > entries: 1 > (XEN) emulate.c:165:d1v0 rip 0x10475f, nextrip 0x104762, len 3 > (XEN) Insn_len emulation failed (1): d1v0 64bit @ 0008:0010475f -> 0f 01 > f9 5b 31 ff 31 c0 e9 c4 db ff ff 00 00 00 > (d1) ** > (d1) PANIC: Unhandled exception at 0008:0010475f > (d1) Vec 13 #GP[] > (d1) ** > > First, teach __get_instruction_length() to cope with RDTSCP, and improve > svm_vmexit_do_rdtsc() to ask for the correct instruction. Move the regs- > >rcx > adjustment into this function to ensure it gets done after we are done > potentially raising faults. > > Reported-by: Paul Durrant > Signed-off-by: Andrew Cooper Reviewed-by: Kevin Tian ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [linux-4.19 test] 131255: regressions - FAIL
flight 131255 linux-4.19 real [real] http://logs.test-lab.xenproject.org/osstest/logs/131255/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 129313 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 129313 test-amd64-amd64-amd64-pvgrub 7 xen-bootfail REGR. vs. 129313 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. vs. 129313 test-amd64-amd64-i386-pvgrub 7 xen-boot fail REGR. vs. 129313 test-amd64-amd64-libvirt-pair 10 xen-boot/src_host fail REGR. vs. 129313 test-amd64-amd64-libvirt-pair 11 xen-boot/dst_host fail REGR. vs. 129313 test-amd64-i386-examine 8 reboot fail REGR. vs. 129313 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 129313 test-amd64-amd64-pair11 xen-boot/dst_hostfail REGR. vs. 129313 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 129313 test-amd64-i386-pair 11 xen-boot/dst_hostfail REGR. vs. 129313 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 129313 test-amd64-i386-qemut-rhel6hvm-intel 7 xen-boot fail REGR. vs. 129313 test-amd64-amd64-examine 8 reboot fail REGR. vs. 129313 test-amd64-i386-qemut-rhel6hvm-amd 12 guest-start/redhat.repeat fail REGR. vs. 129313 Tests which are failing intermittently (not blocking): test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 7 xen-boot fail pass in 131208 test-amd64-i386-libvirt-pair 10 xen-boot/src_host fail pass in 131208 test-amd64-i386-libvirt-pair 11 xen-boot/dst_host fail pass in 131208 test-amd64-amd64-libvirt-vhd 17 guest-start/debian.repeat fail pass in 131208 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-rumprun-amd64 17 rumprun-demo-xenstorels/xenstorels.repeat fail REGR. vs. 129313 Tests which did not succeed, but are not blocking: test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 7 xen-boot fail never pass test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never
[Xen-devel] [libvirt test] 131256: regressions - FAIL
flight 131256 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/131256/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 131219 test-armhf-armhf-libvirt16 guest-start/debian.repeat fail REGR. vs. 131219 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 131219 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 131219 test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass version targeted for testing: libvirt daf4e2abca914fd8340d2243e2cb070a43dd7149 baseline version: libvirt fa30ee04a2a7205c3d664c67b88dd8df9cb1fb40 Last test of basis 131219 2018-12-11 04:19:05 Z2 days Testing same since 131256 2018-12-12 04:19:19 Z1 days1 attempts People who touched revisions under test: Daniel P. Berrangé Erik Skultety Julio Faracco Michal Privoznik Nikolay Shirokovskiy jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt fail build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm blocked test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass test-amd64-amd64-libvirt-xsm blocked test-amd64-i386-libvirt-xsm pass test-amd64-amd64-libvirt blocked test-armhf-armhf-libvirt fail test-amd64-i386-libvirt pass test-amd64-amd64-libvirt-pairblocked test-amd64-i386-libvirt-pair pass test-armhf-armhf-libvirt-raw pass test-amd64-amd64-libvirt-vhd blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. commit daf4e2abca914fd8340d2243e2cb070a43dd7149 Author: Julio Faracco Date: Fri Nov 30 20:43:37 2018 +0800 tests: Adding test case to include multiple network definitions. This commit includes a test case for multiple network definitions. It is useful right now, but it will be more useful when the index used by LXC version 3.X is implemented to support this new settings. The version 3.X is using indexes to specify each network settings. Signed-off-by: Julio Faracco ACKed-by: Michal Privoznik commit 53762677a84c7bf124714d9e607da8c115084775 Author: Julio Faracco Date: Fri Nov 30 20:43:36 2018 +0800 lxc: Initializing IPv6 and IPv4 gateway to overwrite old settings. This commit fixes a bug when you have multiple network settings defined. Basically, if you set an IPv6 or IPv4 gateway, it carries on next network settings. It is
Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device
On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote: On 12.12.18 at 16:18, wrote: >> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote: >> On 12.12.18 at 08:06, wrote: On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote: >On 12/5/18 4:32 AM, Roger Pau Monné wrote: >> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote: >>> I find some pass-thru devices don't work any more across guest reboot. >>> Assigning it to another guest also meets the same issue. And the only >>> way to make it work again is un-binding and binding it to pciback. >>> Someone reported this issue one year ago [1]. More detail also can be >>> found in [2]. >>> >>> The root-cause is Xen's internal MSI-X state isn't reset properly >>> during reboot or re-assignment. In the above case, Xen set maskall bit >>> to mask all MSI interrupts after it detected a potential security >>> issue. Even after device reset, Xen didn't reset its internal maskall >>> bit. As a result, maskall bit would be set again in next write to >>> MSI-X message control register. >>> >>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X >>> internal state of a device, we employ it to fix this issue rather than >>> introducing another dedicated sub-hypercall. >>> >>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between >>> the device's msix and pirq has been created. This limitation prevents >>> us calling this function when detaching a device from a guest during >>> guest shutdown. Thus it is called right before calling >>> PHYSDEVOPS_prepare_msix(). >> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the >> () at the end of the hypercall name since it's not a function. >> >> I'm also wondering why the release can't be done when the device is >> detached from the guest (or the guest has been shut down). This makes >> me worry about the raciness of the attach/detach procedure: if there's >> a state where pciback assumes the device has been detached from the >> guest, but there are still pirqs bound, an attempt to attach to >> another guest in such state will fail. > >I wonder whether this additional reset functionality could be done out >of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset >and then do the extra things that are not properly done there. No. It cannot be done in xen_pcibk_xenbus_remove() without modifying the handler of PHYSDEVOP_release_msix. To do a successful Xen internal MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished without error. But ATM, xen expects that no msi is bound to pirq when doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY. However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove(). In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen at last minute, which happens after device reset in xen_pcibk_xenbus_remove(). >>> >>>But that may need taking care of: I don't think it is a good idea to have >>>anything left from the prior owning domain when the device gets reset. >>>I.e. left over IRQ bindings should perhaps be forcibly cleared before >>>invoking the reset; >> >> Agree. How about pciback to track the established IRQ bindings? Then >> pciback can clear irq binding before invoking the reset. > >How would pciback even know of those mappings, when it's qemu >who establishes (and manages) them? I meant to expose some interfaces from pciback. And pciback serves as the proxy of IRQ (un)binding APIs. > >>>in fact I'd expect this to happen in the course of >>>domain destruction, and I'd expect the device reset to come after the >>>domain was cleaned up. Perhaps simply an ordering issue in the tool >>>stack? >> >> I don't think reversing the sequences of device reset and domain >> destruction would be simple. Furthermore, during device hot-unplug, >> device reset is done when the owner is alive. So if we use domain >> destruction to enforce all irq binding cleared, in theory, it won't be >> applicable to hot-unplug case (if qemu's hot-unplug logic is >> compromised). > >Even in the hot-unplug case the tool stack could issue unbind >requests, behind the back of the possibly compromised qemu, >once neither the guest nor qemu have access to the device >anymore. But currently, tool stack doesn't know the remaining IRQ bindings. If tool stack can maintaine IRQ binding information of a pass-thru device (stored in Xenstore?), we can come up with a clean solution without modifying linux kernel and Xen. Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware
btw can you also capture ISR/IRR/PPR right before local_irq_enable()? though I didn't see a reason why code in-between may impact those bits, it doesn't hurt to capture the context right before interrupt is raised. :-) > -Original Message- > From: Tian, Kevin > Sent: Thursday, December 13, 2018 9:28 AM > To: Roger Pau Monné > Cc: xen-devel@lists.xenproject.org; Wei Liu ; Jan > Beulich ; Andrew Cooper > ; Nakajima, Jun > Subject: RE: Interrupt injection with ISR set on Intel hardware > > > From: Roger Pau Monné [mailto:roger@citrix.com] > > Sent: Wednesday, December 12, 2018 8:18 PM > > > > On Wed, Dec 12, 2018 at 11:48:52AM +, Tian, Kevin wrote: > > > > From: Roger Pau Monné [mailto:roger@citrix.com] > > > > Sent: Wednesday, December 12, 2018 7:25 PM > > > > > > > > On Wed, Dec 12, 2018 at 10:36:44AM +, Tian, Kevin wrote: > > > > > > From: Roger Pau Monné [mailto:roger@citrix.com] > > > > > > Sent: Monday, October 15, 2018 6:30 PM > > > > > > (XEN) [22642] POWERTYPE 4 > > > > > > (XEN) [22643] IDLE PPR 0x0020 > > > > > > (XEN)IRR > > > > > > > > > > > > > 00 > > > > > > 00 > > > > > > (XEN)ISR > > > > > > > > > > > > > 02 > > > > > > 00 > > > > > > (XEN) [22644] WAKE PPR 0x0020 > > > > > > (XEN)IRR > > > > > > > > > > > > > 02 > > > > > > 00 > > > > > > (XEN)ISR > > > > > > > > > > > > > 02 > > > > > > 00 > > > > > > > > > > looks pending IRR (0x21) doesn't always trigger a spurious interrupt? > > > > > > > > Yes, that's correct. Having a pending IRR and going idle doesn't > > > > always trigger the spurious interrupt re-injection. > > > > > > > > > is it a fixed pattern after how many rounds of Cstate enter/exit with > > > > > pending IRR(0x21) then you see assertion happened (in this example > > > > > it happens at 3rd time)? > > > > > > > > It's not a fixed pattern, here's another trace with IRR(0x21) being > > > > pending just once during the Cstate transitions: > > > > > > did you observe a case where such asset may occur when IRR(0x21) > > > is cleared but ISR (0x21) is set? > > > > No, I've always seen both ISR and IRR set when the interrupt injection > > happens. This of course doesn't mean it's not possible, but I have not > > seen any trace with ISR(0x21) set and IRR(0x21) clear. > > > > sorry but let me double confirm. You always see ISR[21]/IRR[21] being > set "before and after entering C3" to hit the problem, right? When > interrupt injection happens later, ISR[21] is set but IRR[21] is cleared (as > expected for normal interrupt delivery process). > > btw I checked your original mail: > > (XEN)[] mwait-idle.c#mwait_idle+0x2a5/0x381 > xen/arch/x86/cpu/mwait-idle.c:802 > >788if (cpu_is_haltable(cpu)) >789mwait_idle_with_hints(eax, > MWAIT_ECX_INTERRUPT_BREAK); >790 >791after = cpuidle_get_tick(); >792 >793cstate_restore_tsc(); >794trace_exit_reason(irq_traced); >795TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after, >796irq_traced[0], irq_traced[1], irq_traced[2], > irq_traced[3]); >797 >798/* Now back in C0. */ >799update_idle_stats(power, cx, before, after); >800local_irq_enable(); >801 > -> 802if (!(lapic_timer_reliable_states & (1 << cstate))) >803lapic_timer_on(); >804 >805sched_tick_resume(); >806cpufreq_dbs_timer_resume(); > > Looks above code is different from staging: > > acpi_processor_idle: > acpi_idle_do_entry: > acpi_processor_ffh_cstate_enter: > mwait_idle_with_hints > > there is no mwait_idle alone. and even with compiler optimization I didn't > find code sequence like above... > > Thanks > Kevin ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 6/6] x86/microcode: Synchronize late microcode loading
On 12/12/18 2:26 AM, Jan Beulich wrote: On 11.12.18 at 19:16, wrote: >> BTW: Apart from the fact its ugly and take a lng time to complete, do you >> have any practical isssues you want to highlight? maybe that can >> help upstream as well. > The situation for a kernel and a hypervisor might be different here > (but in the Linux case may then be more hypervisor-like when > considering KVM): The hypervisor needs to make sure in particular > time management within guests won't break. Stopping the > machine for an extended period of time may not be helpful there. > For processes in an OS the constraints might not be as tight, but > I could imaging problems even there in some less common cases. > > That said, I'm not convinced at all it is a good idea to load new > ucode while _any_ guests are running, but we're talking about a > last resort approach here anyway. > BTW, one thing I meant to mention about this patch earlier: we observed that updating microcode from only one sibling resulted in guest reading stale version value from another thread. This causes at least some Windows versions to crash. We ended up executing "wrmsr(0x8b, 0); cpuid_eax(1);" on all threads after the update. (Another alternative could be to intercept the reads) -boris ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable-smoke test] 131284: tolerable all pass - PUSHED
flight 131284 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/131284/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 00c96d77422a4b84247bec5dadf434363d312cac baseline version: xen 9c35572f359cd6f71aa20b0991c74e032b8721d2 Last test of basis 131274 2018-12-12 17:00:36 Z0 days Testing same since 131284 2018-12-13 00:00:41 Z0 days1 attempts People who touched revisions under test: Julien Grall Matthew Daley Shameer Kolothum jobs: build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-amd64-amd64-xl-qemuu-debianhvm-i386 pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 9c35572f35..00c96d7742 00c96d77422a4b84247bec5dadf434363d312cac -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware
> From: Roger Pau Monné [mailto:roger@citrix.com] > Sent: Wednesday, December 12, 2018 8:18 PM > > On Wed, Dec 12, 2018 at 11:48:52AM +, Tian, Kevin wrote: > > > From: Roger Pau Monné [mailto:roger@citrix.com] > > > Sent: Wednesday, December 12, 2018 7:25 PM > > > > > > On Wed, Dec 12, 2018 at 10:36:44AM +, Tian, Kevin wrote: > > > > > From: Roger Pau Monné [mailto:roger@citrix.com] > > > > > Sent: Monday, October 15, 2018 6:30 PM > > > > > (XEN) [22642] POWERTYPE 4 > > > > > (XEN) [22643] IDLE PPR 0x0020 > > > > > (XEN)IRR > > > > > > > > > 00 > > > > > 00 > > > > > (XEN)ISR > > > > > > > > > 02 > > > > > 00 > > > > > (XEN) [22644] WAKE PPR 0x0020 > > > > > (XEN)IRR > > > > > > > > > 02 > > > > > 00 > > > > > (XEN)ISR > > > > > > > > > 02 > > > > > 00 > > > > > > > > looks pending IRR (0x21) doesn't always trigger a spurious interrupt? > > > > > > Yes, that's correct. Having a pending IRR and going idle doesn't > > > always trigger the spurious interrupt re-injection. > > > > > > > is it a fixed pattern after how many rounds of Cstate enter/exit with > > > > pending IRR(0x21) then you see assertion happened (in this example > > > > it happens at 3rd time)? > > > > > > It's not a fixed pattern, here's another trace with IRR(0x21) being > > > pending just once during the Cstate transitions: > > > > did you observe a case where such asset may occur when IRR(0x21) > > is cleared but ISR (0x21) is set? > > No, I've always seen both ISR and IRR set when the interrupt injection > happens. This of course doesn't mean it's not possible, but I have not > seen any trace with ISR(0x21) set and IRR(0x21) clear. > sorry but let me double confirm. You always see ISR[21]/IRR[21] being set "before and after entering C3" to hit the problem, right? When interrupt injection happens later, ISR[21] is set but IRR[21] is cleared (as expected for normal interrupt delivery process). btw I checked your original mail: (XEN)[] mwait-idle.c#mwait_idle+0x2a5/0x381 xen/arch/x86/cpu/mwait-idle.c:802 788 if (cpu_is_haltable(cpu)) 789 mwait_idle_with_hints(eax, MWAIT_ECX_INTERRUPT_BREAK); 790 791 after = cpuidle_get_tick(); 792 793 cstate_restore_tsc(); 794 trace_exit_reason(irq_traced); 795 TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after, 796 irq_traced[0], irq_traced[1], irq_traced[2], irq_traced[3]); 797 798 /* Now back in C0. */ 799 update_idle_stats(power, cx, before, after); 800 local_irq_enable(); 801 -> 802 if (!(lapic_timer_reliable_states & (1 << cstate))) 803 lapic_timer_on(); 804 805 sched_tick_resume(); 806 cpufreq_dbs_timer_resume(); Looks above code is different from staging: acpi_processor_idle: acpi_idle_do_entry: acpi_processor_ffh_cstate_enter: mwait_idle_with_hints there is no mwait_idle alone. and even with compiler optimization I didn't find code sequence like above... Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [qemu-mainline test] 131240: regressions - FAIL
flight 131240 qemu-mainline real [real] http://logs.test-lab.xenproject.org/osstest/logs/131240/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-armhf-armhf-xl-cubietruck 16 guest-start/debian.repeat fail REGR. vs. 131172 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 131172 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 131172 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 131172 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 131172 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 131172 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass version targeted for testing: qemuu32a1a94dd324d33578dca1dc96d7896a0244d768 baseline version: qemuu4f818e7b7f8ecb5c166d093b8859fec2ddeca2ef Last test of basis 131172 2018-12-09 12:08:42 Z3 days Testing same since 131240 2018-12-11 17:37:01 Z1 days1 attempts People who touched revisions under test: Peter Maydell jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-xl pass test-armhf-armhf-xl pass test-amd64-i386-xl pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass test-amd64-amd64-libvirt-xsm pass test-amd64-i386-libvirt-xsm
[Xen-devel] [linux-4.14 test] 131238: tolerable FAIL - PUSHED
flight 131238 linux-4.14 real [real] http://logs.test-lab.xenproject.org/osstest/logs/131238/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass version targeted for testing: linuxca48e5e30b75a28c12c43c7428c95735e4885e6b baseline version: linux2e390c487815669fb9bb35d7ea11883cc10a9b50 Last test of basis 130155 2018-11-15 23:53:54 Z 26 days Failing since130644 2018-11-21 08:41:02 Z 21 days 12 attempts Testing same since 131156 2018-12-09 02:01:10 Z3 days3 attempts 442 people touched revisions under test, not listing them all jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass build-amd64-rumprun pass build-i386-rumprun pass test-amd64-amd64-xl pass
Re: [Xen-devel] [PATCH v5 5/7] xen/arm: zynqmp: eemi access control
On Tue, 11 Dec 2018, Julien Grall wrote: > Hi, > > On 03/12/2018 21:03, Stefano Stabellini wrote: > > From: "Edgar E. Iglesias" > > > > From: Edgar E. Iglesias > > > > Introduce data structs to implement basic access controls. > > Introduce the following three functions: > > > > domain_has_node_access: check access to the node > > domain_has_reset_access: check access to the reset line > > domain_has_mmio_access: check access to the register > > > > Signed-off-by: Edgar E. Iglesias > > Signed-off-by: Stefano Stabellini > > > > --- > > Statically defines: > > > > - pm_node_access > > It encodes the relationship between a node id and the start of the MMIO > > region of a device in the corresponding power domain. It is used for > > permission checking. Although the MMIO region start address is available > > on device tree and could be derived from there (we plan to improve that > > in the future), the relationship between a node id and corresponding > > devices is not described and needs to be hardcoded. > > > > - pm_reset_access > > Same as pm_node_access for reset lines. > > > > --- > > Changes in v5: > > - improve in-code comments > > - use mfn_t in struct pm_access > > - remove mmio_access table > > > > Changes in v4: > > - add #include as needed > > - add #if 0 for bisectability > > - use mfn_t in pm_check_access > > - add wrap-around ASSERT in domain_has_mmio_access > > - use GENMASK in domain_has_mmio_access > > - proper bound checks (== ARRAY_SIZE is out of bound) > > --- > > xen/arch/arm/platforms/xilinx-zynqmp-eemi.c | 348 > > > > 1 file changed, 348 insertions(+) > > > > diff --git a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c > > b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c > > index 369bb3f..92a02df 100644 > > --- a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c > > +++ b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c > > @@ -16,9 +16,357 @@ > >* GNU General Public License for more details. > >*/ > > +/* > > + * EEMI Power Management API access > > + * > > + * Refs: > > + * > > https://www.xilinx.com/support/documentation/user_guides/ug1200-eemi-api.pdf > > + * > > + * Background: > > + * The ZynqMP has a subsystem named the PMU with a CPU and special devices > > + * dedicated to running Power Management Firmware. Other masters in the > > + * system need to send requests to the PMU in order to for example: > > + * * Manage power state > > + * * Configure clocks > > + * * Program bitstreams for the programmable logic > > + * * etc > > + * > > + * Although the details of the setup are configurable, in the common case > > + * the PMU lives in the Secure world. NS World cannot directly communicate > > + * with it and must use proxy services from ARM Trusted Firmware to reach > > + * the PMU. > > + * > > + * Power Management on the ZynqMP is implemented in a layered manner. > > + * The PMU knows about various masters and will enforce access controls > > + * based on a pre-configured partitioning. This configuration dictates > > + * which devices are owned by the various masters and the PMU FW makes sure > > + * that a given master cannot turn off a device that it does not own or > > that > > + * is in use by other masters. > > + * > > + * The PMU is not aware of multiple execution states in masters. > > + * For example, it treats the ARMv8 cores as single units and does not > > + * distinguish between Secure vs NS OS's nor does it know about Hypervisors > > + * and multiple guests. It is up to software on the ARMv8 cores to present > > + * a unified view of its power requirements. > > + * > > + * To implement this unified view, ARM Trusted Firmware at EL3 provides > > + * access to the PM API via SMC calls. ARM Trusted Firmware is responsible > > + * for mediating between the Secure and the NS world, rejecting SMC calls > > + * that request changes that are not allowed. > > + * > > + * Xen running above ATF owns the NS world and is responsible for > > presenting > > + * unified PM requests taking all guests and the hypervisor into account. > > + * > > + * Implementation: > > + * The PM API contains different classes of calls. > > + * Certain calls are harmless to expose to any guest. > > + * These include calls to get the PM API Version, or to read out the > > version > > + * of the chip we're running on. > > + * > > + * In order to correctly virtualize these calls, we need to know if > > + * guests issuing these calls have ownership of the given device. > > + * The approach taken here is to map PM API Nodes identifying > > + * a device into base addresses for registers that belong to that > > + * same device. > > + * > > + * If the guest has access to devices registers, we give the guest > > + * access to PM API calls that affect that device. This is implemented > > + * by pm_node_access and domain_has_node_access(). > > + */ > > + > > +#include > > +#include > > #include > > #include > > +#if 0 > > +struct pm_access > > +{ > > +
Re: [Xen-devel] [PATCH v5 4/7] xen: introduce mfn_init macro
On Wed, 5 Dec 2018, Jan Beulich wrote: > >>> On 04.12.18 at 20:38, wrote: > > On Tue, 4 Dec 2018, Jan Beulich wrote: > >> >>> On 03.12.18 at 22:03, wrote: > >> > To be used in constant initializations of mfn_t variables, such as: > >> > > >> > static mfn_t node = mfn_init(MM_ADDR); > >> > > >> > It is necessary because static inline functions cannot be used as static > >> > initializers. > >> > >> We had been at this point once (quite some time ago), and got > >> away without such an addition. Did you try to find that old > >> discussion? Are there any new reasons to have such a construct? > >> Do you need this for other than setting a value to INVALID_MFN, > >> in which case INVALID_MFN_INITIALIZER ought to be suitable? > >> > >> This is not to say I'm entirely opposed. > >> > >> If we were to have such a construct, I wonder though whether > >> mfn_init() is suitable as a name. Simply MFN() perhaps, and then > >> also consistently have GFN() and DFN()? > > > > Hi Jan, > > > > I am happy with any name, and MFN() together with GFN() and DFN() look > > like a good option. > > > > The reason why it is needed is that without it I cannot introduce a > > statically initialized array of mfn_t type like the one in the following > > patch in the series: > > > > +static const struct pm_access pm_node_access[] = { > > +/* MM_RPU grants access to all RPU Nodes. */ > > +[NODE_RPU] = { mfn_init(MM_RPU) }, > > +[NODE_RPU_0] = { mfn_init(MM_RPU) }, > > +[NODE_RPU_1] = { mfn_init(MM_RPU) }, > > +[NODE_IPI_RPU_0] = { mfn_init(MM_RPU) }, > > > > [...] > > > > Where MM_RPU is a mfn, and the NODE_* are IDs defined as enum: > > > > #define MM_RPU 0xff9a0 > > > > enum pm_node_id { > > NODE_RPU = 6, > > NODE_RPU_0, > > NODE_RPU_1, > > > > [...] > > > > > > Originally I had: > > > > [NODE_RPU] = { MM_RPU }, > > > > but I changed the type to be mfn_t to address one of Julien's comments. > > You might get a better idea of the issue if you give a look at this > > branch: > > > > http://xenbits.xenproject.org/git-http/people/sstabellini/xen-unstable.git > > zynqmp-v5 > > Well, I have to admit that I'd rather not see ways to embed hard-coded > MFNs into code made available generically. May I suggest that you use > a macro with a name to your liking just locally to that one source file? OK, no problem > As a side note, I'm also puzzled by there being entries in the table which > don't have their MFNs specified. Oddly enough it looks as if > .hwdom_access was true if and only if no MFN is specified. Yes, the hwdom_access check could be turned into a check for MFN(INVALID_MFN). I'll do that it will actually make the array size smaller. > The term "node" of course is confusing too, considering its NUMA > meaning elsewhere in the hypervisor. That comes from the EEMI firmware specification: they use the term "node" to address a power domain "unit". ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 6/7] xen/arm: zynqmp: implement zynqmp_eemi
On Wed, 12 Dec 2018, Julien Grall wrote: > On 11/12/2018 22:23, Stefano Stabellini wrote: > > On Tue, 11 Dec 2018, Julien Grall wrote: > > > Hi Stefano, > > > > > > On 03/12/2018 21:03, Stefano Stabellini wrote: > > > > From: "Edgar E. Iglesias" > > > > > > > > From: Edgar E. Iglesias > > > > > > > > zynqmp_eemi uses the defined functions and structs to decide whether to > > > > make a call to the firmware, or to simply return a predefined value. > > > > > > > > Signed-off-by: Edgar E. Iglesias > > > > Signed-off-by: Stefano Stabellini > > > > --- > > > > Changes in v5: > > > > - remove mmio_access handling > > > > > > > > Changes in v4: > > > > - add #include as needed > > > > - improve comment > > > > - code style > > > > --- > > > >xen/arch/arm/platforms/xilinx-zynqmp-eemi.c | 181 > > > > +++- > > > >1 file changed, 125 insertions(+), 56 deletions(-) > > > > > > > > diff --git a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c > > > > b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c > > > > index 92a02df..9ecf286 100644 > > > > --- a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c > > > > +++ b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c > > > > @@ -76,10 +76,10 @@ > > > > #include > > > >#include > > > > +#include > > > >#include > > > >#include > > > >-#if 0 > > > >struct pm_access > > > >{ > > > >mfn_t mfn; > > > > @@ -309,67 +309,136 @@ static bool domain_has_reset_access(struct domain > > > > *d, > > > > uint32_t rst) > > > >return pm_check_access(pm_reset_access, d, rst); > > > >} > > > >-/* > > > > - * Check if a given domain has access to perform an indirect > > > > - * MMIO access. > > > > - * > > > > - * If the provided mask is invalid, it will be fixed up. > > > > - */ > > > > -static bool domain_has_mmio_access(struct domain *d, > > > > - bool write, paddr_t addr, > > > > - uint32_t *mask) > > > > > > Why do you remove code that you just introduced? > > > > I am really sorry about this, it was error applying a patch. This code > > should never have been introduced: the code should be removed from the > > previous patch. I'll fix it. > > > > > > > > +bool zynqmp_eemi(struct cpu_user_regs *regs) > > > >{ > > > > -unsigned int i; > > > > -bool ret = false; > > > > -uint32_t prot_mask = 0; > > > > - > > > > -/* > > > > - * The hardware domain gets read access to everything. > > > > - * Lower layers will do further filtering. > > > > - */ > > > > -if ( !write && is_hardware_domain(d) ) > > > > -return true; > > > > +struct arm_smccc_res res; > > > > +uint32_t fid = get_user_reg(regs, 0); > > > > +uint32_t nodeid = get_user_reg(regs, 1); > > > > > > You didn't address my concern regarding SMC32 vs SMC64 convention. As I > > > said > > > earlier on, at least CALL_COUNT, UID and VERSION are only accessible using > > > the > > > SMC32 convention. > > > > > > I can't tell for the other as the EEMI spec does not seem to specify it. I > > > would be surprised that EEMI would ignore tops bits of the ID given they > > > convey different information (e.g fast/yielding call, 32/64-bit > > > convention). > > > > > > Looking at the branch you mentioned earlier on, zynqmp_pm_invoke_fn > > > (drivers/firmware/xilinx/zynqmp.c) is definitely using the SMC64 calling > > > convention as described in the documentation above the function. > > > > > > So this needs to be fixed properly. > > > > OK, I'll add a check for the mandatory smc32 calls and forward them to > > firmware properly. > > smc32 is only part of the problem I mentioned. The main problem is you only > look at the function number (bits 0-15). This is not enough to know which > function is going to be called. You want to take into account the full > function identifier. We provide helpers to create them (see > ARM_SMCCC_CALL_VAL). Yes, I get what you are saying. I made a few changes in that direction. > > > > +unsigned int pm_fn = fid & 0x; > > > > +enum pm_ret_status ret; > > > >-/* Scan the ACL. */ > > > > -for ( i = 0; i < ARRAY_SIZE(pm_mmio_access); i++ ) > > > > +switch ( pm_fn ) > > > >{ > > > > -ASSERT(pm_mmio_access[i].start + pm_mmio_access[i].size >= > > > > - pm_mmio_access[i].start); > > > > - > > > > -if ( addr < pm_mmio_access[i].start ) > > > > -return false; > > > > -if ( addr >= pm_mmio_access[i].start + pm_mmio_access[i].size ) > > > > -continue; > > > > - > > > > -if ( write && pm_mmio_access[i].readonly ) > > > > -return false; > > > > -if ( pm_mmio_access[i].hwdom_access && !is_hardware_domain(d) ) > > > > -return false; > > > > -if ( !domain_has_node_access(d, pm_mmio_access[i].node) ) > > > > -return false; > > > > - > > > > -/* We've got
Re: [Xen-devel] [PATCH v5 3/7] xen/arm: zynqmp: introduce zynqmp specific defines
On Wed, 12 Dec 2018, Julien Grall wrote: > Hi Stefano, > > On 11/12/2018 19:22, Stefano Stabellini wrote: > > On Tue, 11 Dec 2018, Julien Grall wrote: > > > Hi, > > > > > > On 03/12/2018 21:03, Stefano Stabellini wrote: > > > What is the plan there? > > > > The plan is to extract the node_id from a power-domain node on device > > tree. Each device would have a phandler to link to the right > > power-domain node which contains a power-domain-id attribute. The > > power-domain-id attribute is the node_id here. > > > > The power-domain-id changes to the Xilinx MPSoC device tree are under > > discussion with the device tree community. > > If I understand correctly, we will never be able to remove the hardcoded > values in Xen. This is because some device-tree may not have the bindings. Am > I correct? If we want to support running on existing hardware and firmware releases, then you are correct. We won't be able to remove the hardcoded values. That ship has sailed, not much we can do about it. If in the future we decide to drop support for older firmware releases and ask users to update their firmare/devicetrees, then we'll be able to remove the hardcoded values. I think it is something we can consider. In fact, I would rather break compatibility than not providing support for basic power management functionalities at all. I think it is better to introduce basic EEMI support now, even if in a couple of Xen releases from now we'll make the decision to drop the hardcoded values and require new Xilinx device trees. Xilinx users are used to updating firmware on these boards every 6 months, and it would beneficial for them to be able to take a Xen Project release rather than be forced to use a Xilinx Xen release to have support for power management. When the new bindings become available, as we introduce support for them in Xen, we can decide whether to keep the hardcoded values or whether we should take the opportunity to drop them. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12] dt: fix integer check in dt_device_is_compatible
On Wed, 12 Dec 2018, Julien Grall wrote: > Hi Stefano, > > On 11/12/2018 18:46, Stefano Stabellini wrote: > > cplen is unsigned, thus, it can never be < 0. Use a signed integer local > > variable instead. > > The current code checks > 0. Looking at the code I don't think it can ever be > negative. So can you details what is the problem you are trying to resolve? Yes, it can be "negative" (not actually negative because it is defined as unsigned), in fact it happens with the nodes added dynamically by grub at boot. This patches fixes booting from grub. Specifically ilen is initialed to 16, but strlen+1 is 17, so length becomes -1. The length of the property generated by grub seems to be short by 1. > > Signed-off-by: Stefano Stabellini > > --- > > xen/common/device_tree.c | 7 +-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c > > index 8fc401d..df274cc 100644 > > --- a/xen/common/device_tree.c > > +++ b/xen/common/device_tree.c > > @@ -213,17 +213,20 @@ bool_t dt_device_is_compatible(const struct > > dt_device_node *device, > > { > > const char* cp; > > u32 cplen, l; > > +s64 ilen; > > cp = dt_get_property(device, "compatible", ); > > if ( cp == NULL ) > > return 0; > > -while ( cplen > 0 ) > > + > > +ilen = cplen; > > +while ( ilen > 0 ) > > { > > if ( dt_compat_cmp(cp, compat) == 0 ) > > return 1; > > l = strlen(cp) + 1; > > cp += l; > > -cplen -= l; > > +ilen -= l; > > } > > return 0; > > > > -- > Julien Grall > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/2] xen/arm: Stop relocating Xen
On Thu, 29 Nov 2018, Julien Grall wrote: > At the moment, Xen is relocated towards the end of the memory. While > this has the advantage to free space in low memory, the code is not > compliant with the break-before-make because it requires to switch > between two sets of page-table. This is not entirely trivial to fix as > it would require us to go through an identity mapping and disabling MMU. > > Furthermore, it looks like that some platform (such as the Hikey960) > may not be able to bring-up secondary CPUs if the entry is too high. > > I don't believe the low memory is an issue because Xen is quite tiny > (< 2MB). So the best solution is to stop relocating Xen. This has the > advantage to simplify the code and should speed-up the boot as relocation > is not necessary anymore. > > Note that the break-before-make issue is not fixed by this patch. > > Signed-off-by: Julien Grall > Reported-by: Matthew Daley > --- > xen/arch/arm/arm32/head.S | 54 +++ > xen/arch/arm/arm64/head.S | 50 +--- > xen/arch/arm/mm.c | 20 --- > xen/arch/arm/setup.c | 65 > +++ > xen/include/asm-arm/mm.h | 2 +- > 5 files changed, 18 insertions(+), 173 deletions(-) > > diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S > index 93b51e9ef2..390a505e05 100644 > --- a/xen/arch/arm/arm32/head.S > +++ b/xen/arch/arm/arm32/head.S > @@ -469,58 +469,12 @@ fail: PRINT("- Boot failed -\r\n") > GLOBAL(_end_boot) > > /* > - * Copy Xen to new location and switch TTBR > + * Switch TTBR > * r1:r0 ttbr > - * r2 source address > - * r3 destination address > - * [sp]=>r4length > * > - * Source and destination must be word aligned, length is rounded up > - * to a 16 byte boundary. > - * > - * MUST BE VERY CAREFUL when saving things to RAM over the copy > + * TODO: This code does not comply with break-before-make. > */ > -ENTRY(relocate_xen) > -push {r4,r5,r6,r7,r8,r9,r10,r11} > - > -ldr r4, [sp, #8*4]/* Get 4th argument from stack */ > - > -/* Copy 16 bytes at a time using: > - * r5: counter > - * r6: data > - * r7: data > - * r8: data > - * r9: data > - * r10: source > - * r11: destination > - */ > -mov r5, r4 > -mov r10, r2 > -mov r11, r3 > -1: ldmia r10!, {r6, r7, r8, r9} > -stmia r11!, {r6, r7, r8, r9} > - > -subs r5, r5, #16 > -bgt 1b > - > -/* Flush destination from dcache using: > - * r5: counter > - * r6: step > - * r7: vaddr > - */ > -dsb/* So the CPU issues all writes to the range */ > - > -mov r5, r4 > -ldr r6, =dcache_line_bytes /* r6 := step */ > -ldr r6, [r6] > -mov r7, r3 > - > -1: mcr CP32(r7, DCCMVAC) > - > -add r7, r7, r6 > -subs r5, r5, r6 > -bgt 1b > - > +ENTRY(switch_ttbr) > dsb/* Ensure the flushes happen before > * continuing */ > isb/* Ensure synchronization with > previous > @@ -543,8 +497,6 @@ ENTRY(relocate_xen) > dsb/* Ensure completion of TLB+BP flush > */ > isb > > -pop {r4, r5,r6,r7,r8,r9,r10,r11} > - > mov pc, lr > > #ifdef CONFIG_EARLY_PRINTK > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S > index 9428c3f5a2..4589a37874 100644 > --- a/xen/arch/arm/arm64/head.S > +++ b/xen/arch/arm/arm64/head.S > @@ -605,52 +605,14 @@ fail: PRINT("- Boot failed -\r\n") > > GLOBAL(_end_boot) > > -/* Copy Xen to new location and switch TTBR > - * x0ttbr > - * x1source address > - * x2destination address > - * x3length > +/* > + * Switch TTBR > * > - * Source and destination must be word aligned, length is rounded up > - * to a 16 byte boundary. > + * x0ttbr > * > - * MUST BE VERY CAREFUL when saving things to RAM over the copy */ > -ENTRY(relocate_xen) > -/* Copy 16 bytes at a time using: > - * x9: counter > - * x10: data > - * x11: data > - * x12: source > - * x13: destination > - */ > -mov x9, x3 > -mov x12, x1 > -mov x13, x2 > - > -1: ldp x10, x11, [x12], #16 > -stp x10, x11, [x13], #16 > - > -subsx9, x9, #16 > -bgt 1b > - > -/* Flush destination from dcache using: > - * x9: counter > - * x10: step > - * x11: vaddr > - */ > -dsb sy/* So the CPU issues all writes to the range */ > - > -mov x9, x3 > -ldr x10, =dcache_line_bytes /* x10 := step */ > -ldr x10, [x10] > -
Re: [Xen-devel] [PATCH 1/2] xen/arm: mm: Set-up page permission for Xen mappings earlier on
On Thu, 29 Nov 2018, Julien Grall wrote: > Xen mapping is first create using a 2MB page and then shatterred in 4KB > page for fine-graine permission. However, it is not safe to break-down > superpage page without going to an intermediate step invalidating > the entry. > > As we are changing Xen mappings, we cannot go through the intermediate > step. The only solution is to create Xen mapping using 4KB entries > directly. As the Xen should always access the mappings according with > the runtime permission, it is then possible to set-up the permissions > while create the mapping. > > We are still playing with the fire as there are still some > break-before-make issue in setup_pagetables (i.e switch between 2 sets of > page-tables). But it should slightly be better than the current state. > > Signed-off-by: Julien Grall > Reported-by: Shameerali Kolothum Thodi > Reported-by: Jan-Peter Larsson Reviewed-by: Stefano Stabellini and committed > --- > I had few reports on new platforms where Xen reliably stale as soon as > SCTLR.WXN is turned on. This likely happens because of not complying > with Break-Before-Make when setting-up the permission as we > break-down a superpage to 4KB mappings. > --- > xen/arch/arm/mm.c | 49 ++--- > 1 file changed, 22 insertions(+), 27 deletions(-) > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > index 987fcb9162..2556e57a99 100644 > --- a/xen/arch/arm/mm.c > +++ b/xen/arch/arm/mm.c > @@ -649,11 +649,31 @@ void __init setup_pagetables(unsigned long > boot_phys_offset, paddr_t xen_paddr) > } > #endif > > +/* Break up the Xen mapping into 4k pages and protect them separately. */ > +for ( i = 0; i < LPAE_ENTRIES; i++ ) > +{ > +mfn_t mfn = mfn_add(maddr_to_mfn(xen_paddr), i); > +unsigned long va = XEN_VIRT_START + (i << PAGE_SHIFT); > + > +if ( !is_kernel(va) ) > +break; > +pte = mfn_to_xen_entry(mfn, MT_NORMAL); > +pte.pt.table = 1; /* 4k mappings always have this bit set */ > +if ( is_kernel_text(va) || is_kernel_inittext(va) ) > +{ > +pte.pt.xn = 0; > +pte.pt.ro = 1; > +} > +if ( is_kernel_rodata(va) ) > +pte.pt.ro = 1; > +xen_xenmap[i] = pte; > +} > + > /* Initialise xen second level entries ... */ > /* ... Xen's text etc */ > > -pte = mfn_to_xen_entry(maddr_to_mfn(xen_paddr), MT_NORMAL); > -pte.pt.xn = 0;/* Contains our text mapping! */ > +pte = pte_of_xenaddr((vaddr_t)xen_xenmap); > +pte.pt.table = 1; > xen_second[second_table_offset(XEN_VIRT_START)] = pte; > > /* ... Fixmap */ > @@ -693,31 +713,6 @@ void __init setup_pagetables(unsigned long > boot_phys_offset, paddr_t xen_paddr) > clear_table(boot_second); > clear_table(boot_third); > > -/* Break up the Xen mapping into 4k pages and protect them separately. */ > -for ( i = 0; i < LPAE_ENTRIES; i++ ) > -{ > -mfn_t mfn = mfn_add(maddr_to_mfn(xen_paddr), i); > -unsigned long va = XEN_VIRT_START + (i << PAGE_SHIFT); > -if ( !is_kernel(va) ) > -break; > -pte = mfn_to_xen_entry(mfn, MT_NORMAL); > -pte.pt.table = 1; /* 4k mappings always have this bit set */ > -if ( is_kernel_text(va) || is_kernel_inittext(va) ) > -{ > -pte.pt.xn = 0; > -pte.pt.ro = 1; > -} > -if ( is_kernel_rodata(va) ) > -pte.pt.ro = 1; > -write_pte(xen_xenmap + i, pte); > -/* No flush required here as page table is not hooked in yet. */ > -} > - > -pte = pte_of_xenaddr((vaddr_t)xen_xenmap); > -pte.pt.table = 1; > -write_pte(xen_second + second_linear_offset(XEN_VIRT_START), pte); > -/* TLBFLUSH and ISB would be needed here, but wait until we set WXN */ > - > /* From now on, no mapping may be both writable and executable. */ > WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2); > /* Flush everything after setting WXN bit. */ > -- > 2.11.0 > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [ovmf test] 131245: regressions - FAIL
flight 131245 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/131245/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-i3866 xen-buildfail REGR. vs. 129475 build-i386-xsm6 xen-buildfail REGR. vs. 129475 build-amd64 6 xen-buildfail REGR. vs. 129475 build-amd64-xsm 6 xen-buildfail REGR. vs. 129475 Tests which did not succeed, but are not blocking: test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a build-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a version targeted for testing: ovmf e07092edca8442db4a941dbeea0cd196c7bf8ec9 baseline version: ovmf 5ae3184d8c59f7bbb84bad482df6b8020ba58188 Last test of basis 129475 2018-11-05 21:13:11 Z 37 days Failing since129526 2018-11-06 20:49:26 Z 36 days 152 attempts Testing same since 131245 2018-12-11 20:48:15 Z1 days1 attempts People who touched revisions under test: Achin Gupta Ard Biesheuvel Bob Feng bob.c.f...@intel.com BobCF Chasel Chiu Chasel, Chiu Chen A Chen Dandan Bi David Wei Eric Dong Feng, Bob C Fu Siyuan Gary Lin Hao Wu Jaben Carsey Jeff Brasen Jian J Wang Jiaxin Wu Jiewen Yao Laszlo Ersek Leif Lindholm Liming Gao Liu Yu Marc Zyngier Marcin Wojtas Ming Huang Pedroa Liu Ruiyu Ni shenglei Shenglei Zhang Star Zeng Sughosh Ganu Sumit Garg Sun, Zailiang Thomas Abraham Tomasz Michalec Vijayenthiran Subramaniam Wang BinX A Wu Jiaxin Yonghong Zhu yuchenlin Zailiang Sun Zhang, Chao B Zhao, ZhiqiangX ZhiqiangX Zhao zwei4 jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 3391 lines long.) ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v9 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
On 12/12/2018 12:39 PM, Borislav Petkov wrote: On Tue, Dec 11, 2018 at 11:29:21AM -0800, Maran Wilson wrote: Is your question about what options you need to provide to Qemu? Or is your question about the SW implementation choices? Assuming the former... Yeah, that's what I wanted to know. But looking at it, I'm booting bzImage here just as quickly and as flexible so I don't see the advantage of this new method for my use case here of booting kernels in qemu. But maybe there's a good use case where firmware is slow and one doesn't really wanna noodle through it or when one does start a gazillion VMs per second or whatever... Right, the time saved is not something you would notice while starting a VM manually. But it does reduce the time to reach startup_64() in Linux by about 50% (going from around 94ms to around 47ms) when booting a VM using Qemu+qboot (for example). That time savings becomes pretty important when you are trying to use VMs as containers (for instance, as is the case with Kata containers) and trying to get the latency for launching such a container really low -- to come as close as possible to match the latency for launching more traditional containers that don't have the additional security/isolation of running within a separate VM. Thanks, -Maran Thx. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable test] 131233: regressions - FAIL
flight 131233 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/131233/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-examine 4 memdisk-try-append fail REGR. vs. 130985 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 130985 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 130985 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 130985 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 130985 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 130985 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 130985 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 130985 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 130985 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 130985 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 130985 test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass version targeted for testing: xen 76a68b902062a5ca9053f5cf6a3ab46148cb45f2 baseline version: xen 82855aba5bf91e50c81526167c11d4aeaf665e66 Last test of basis 130985 2018-12-03 17:11:24 Z9 days Failing since131065 2018-12-05 19:14:18 Z7 days5 attempts Testing same since 131233 2018-12-11 13:49:11 Z1 days1 attempts People who touched revisions under test: Andrew Cooper Anthony PERARD Doug Goldstein Ian Jackson Jan Beulich Juergen Gross Julien Grall Kevin Tian Oleksandr Tyshchenko Paul Durrant Roger Pau Monné Wei Liu jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64-xtf pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt
[Xen-devel] [linux-3.18 bisection] complete test-amd64-amd64-pair
branch xen-unstable xenbranch xen-unstable job test-amd64-amd64-pair testid xen-boot/dst_host Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git Tree: qemuu git://xenbits.xen.org/qemu-xen.git Tree: xen git://xenbits.xen.org/xen.git *** Found and reproduced problem changeset *** Bug is in tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git Bug introduced: 7b8052e19304865477e03a0047062d977309a22f Bug not present: d255d18a34a8d53ccc4a019dc07e17b6e8cf6bd1 Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/131278/ commit 7b8052e19304865477e03a0047062d977309a22f Author: Jan Beulich Date: Mon Oct 19 04:23:29 2015 -0600 igb: fix NULL derefs due to skipped SR-IOV enabling [ Upstream commit be06998f96ecb93938ad2cce46c4289bf7cf45bc ] The combined effect of commits 6423fc3416 ("igb: do not re-init SR-IOV during probe") and ceee3450b3 ("igb: make sure SR-IOV init uses the right number of queues") causes VFs no longer getting set up, leading to NULL pointer dereferences due to the adapter's ->vf_data being NULL while ->vfs_allocated_count is non-zero. The first commit not only neglected the side effect of igb_sriov_reinit() that the second commit tried to account for, but also that of setting IGB_FLAG_HAS_MSIX, without which igb_enable_sriov() is effectively a no-op. Calling igb_{,re}set_interrupt_capability() as done here seems to address this, but I'm not sure whether this is better than sinply reverting the other two commits. Signed-off-by: Jan Beulich Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin For bisection revision-tuple graph see: http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-3.18/test-amd64-amd64-pair.xen-boot--dst_host.html Revision IDs in each graph node refer, respectively, to the Trees above. Running cs-bisection-step --graph-out=/home/logs/results/bisect/linux-3.18/test-amd64-amd64-pair.xen-boot--dst_host --summary-out=tmp/131280.bisection-summary --basis-template=128858 --blessings=real,real-bisect --flight=131280 linux-3.18 test-amd64-amd64-pair xen-boot/dst_host Searching for failure / basis pass: 131231 fail [dst_host=debina1,src_host=debina0] / 130367 [dst_host=albana0,src_host=albana1] 130203 [dst_host=albana0,src_host=albana1] 130067 [dst_host=pinot1,src_host=pinot0] 129845 [dst_host=albana1,src_host=albana0] 129760 [dst_host=godello1,src_host=godello0] 128858 [dst_host=elbling0,src_host=elbling1] 128841 [dst_host=elbling0,src_host=elbling1] 128807 [dst_host=godello1,src_host=godello0] 128691 [dst_host=fiano1,src_host=fiano0] 128258 [dst_host=fiano1,src_host=fiano0] 128232 [dst_host=albana1,src_host=albana0] 128177 ok. Failure / basis pass flights: 131231 / 128177 (tree with no url: minios) (tree with no url: ovmf) (tree with no url: seabios) Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git Tree: qemuu git://xenbits.xen.org/qemu-xen.git Tree: xen git://xenbits.xen.org/xen.git Latest 3879c163e8681939b1d93139521aee983623884f c530a75c1e6a472b0eb9558310b518f0dfcd8860 d0d8ad39ecb51cd7497cd524484fe09f50876798 de5b678ca4dcdfa83e322491d478d66df56c1986 82855aba5bf91e50c81526167c11d4aeaf665e66 Basis pass 921b2fed6a79439ef1609ef4af0ada5cccb3555c c530a75c1e6a472b0eb9558310b518f0dfcd8860 9c0eed618f37dd5b4a57c8b3fbc48ef8913e3149 de5b678ca4dcdfa83e322491d478d66df56c1986 940185b2f6f343251c2b83bd96e599398cea51ec Generating revisions with ./adhoc-revtuple-generator git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#921b2fed6a79439ef1609ef4af0ada5cccb3555c-3879c163e8681939b1d93139521aee983623884f git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860 git://xenbits.xen.org/qemu-xen-traditional.git#9c0eed618f37dd5b4a57c8b3fbc48ef8913e3149-d0d8ad39ecb51cd7497cd524484fe09f50876798 git://xenbits.xen.org/qemu-xen.git#de5b678ca4dcdfa83e322491d478d66df56c1986-de5b678ca4dcdfa83e322491d478d66df56c1986 git://xenbits.xen.org/xen.git#940185b2f6f343251c2b83bd96e599398cea51ec-82855aba5bf91e50c81526167c11d4aeaf665e66 Loaded 3004 nodes in revision graph Searching for test results: 128096 [dst_host=chardonnay1,src_host=chardonnay0] 128177 pass 921b2fed6a79439ef1609ef4af0ada5cccb3555c c530a75c1e6a472b0eb9558310b518f0dfcd8860 9c0eed618f37dd5b4a57c8b3fbc48ef8913e3149 de5b678ca4dcdfa83e322491d478d66df56c1986 940185b2f6f343251c2b83bd96e599398cea51ec 128232 [dst_host=albana1,src_host=albana0] 128258
[Xen-devel] [linux-3.18 bisection] complete test-amd64-amd64-pair
branch xen-unstable xenbranch xen-unstable job test-amd64-amd64-pair testid xen-boot/src_host Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git Tree: qemuu git://xenbits.xen.org/qemu-xen.git Tree: xen git://xenbits.xen.org/xen.git *** Found and reproduced problem changeset *** Bug is in tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git Bug introduced: 7b8052e19304865477e03a0047062d977309a22f Bug not present: d255d18a34a8d53ccc4a019dc07e17b6e8cf6bd1 Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/131278/ commit 7b8052e19304865477e03a0047062d977309a22f Author: Jan Beulich Date: Mon Oct 19 04:23:29 2015 -0600 igb: fix NULL derefs due to skipped SR-IOV enabling [ Upstream commit be06998f96ecb93938ad2cce46c4289bf7cf45bc ] The combined effect of commits 6423fc3416 ("igb: do not re-init SR-IOV during probe") and ceee3450b3 ("igb: make sure SR-IOV init uses the right number of queues") causes VFs no longer getting set up, leading to NULL pointer dereferences due to the adapter's ->vf_data being NULL while ->vfs_allocated_count is non-zero. The first commit not only neglected the side effect of igb_sriov_reinit() that the second commit tried to account for, but also that of setting IGB_FLAG_HAS_MSIX, without which igb_enable_sriov() is effectively a no-op. Calling igb_{,re}set_interrupt_capability() as done here seems to address this, but I'm not sure whether this is better than sinply reverting the other two commits. Signed-off-by: Jan Beulich Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin For bisection revision-tuple graph see: http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-3.18/test-amd64-amd64-pair.xen-boot--src_host.html Revision IDs in each graph node refer, respectively, to the Trees above. Running cs-bisection-step --graph-out=/home/logs/results/bisect/linux-3.18/test-amd64-amd64-pair.xen-boot--src_host --summary-out=tmp/131278.bisection-summary --basis-template=128858 --blessings=real,real-bisect linux-3.18 test-amd64-amd64-pair xen-boot/src_host Searching for failure / basis pass: 131231 fail [dst_host=debina1,src_host=debina0] / 130367 [dst_host=albana0,src_host=albana1] 130203 [dst_host=albana0,src_host=albana1] 130067 [dst_host=pinot1,src_host=pinot0] 129845 [dst_host=albana1,src_host=albana0] 129760 [dst_host=godello1,src_host=godello0] 128858 [dst_host=elbling0,src_host=elbling1] 128841 [dst_host=elbling0,src_host=elbling1] 128807 [dst_host=godello1,src_host=godello0] 128691 [dst_host=fiano1,src_host=fiano0] 128258 [dst_host=fiano1,src_host=fiano0] 128232 [dst_host=albana1,src_host=albana0] 128177 ok. Failure / basis pass flights: 131231 / 128177 (tree with no url: minios) (tree with no url: ovmf) (tree with no url: seabios) Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git Tree: qemuu git://xenbits.xen.org/qemu-xen.git Tree: xen git://xenbits.xen.org/xen.git Latest 3879c163e8681939b1d93139521aee983623884f c530a75c1e6a472b0eb9558310b518f0dfcd8860 d0d8ad39ecb51cd7497cd524484fe09f50876798 de5b678ca4dcdfa83e322491d478d66df56c1986 82855aba5bf91e50c81526167c11d4aeaf665e66 Basis pass 921b2fed6a79439ef1609ef4af0ada5cccb3555c c530a75c1e6a472b0eb9558310b518f0dfcd8860 9c0eed618f37dd5b4a57c8b3fbc48ef8913e3149 de5b678ca4dcdfa83e322491d478d66df56c1986 940185b2f6f343251c2b83bd96e599398cea51ec Generating revisions with ./adhoc-revtuple-generator git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#921b2fed6a79439ef1609ef4af0ada5cccb3555c-3879c163e8681939b1d93139521aee983623884f git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860 git://xenbits.xen.org/qemu-xen-traditional.git#9c0eed618f37dd5b4a57c8b3fbc48ef8913e3149-d0d8ad39ecb51cd7497cd524484fe09f50876798 git://xenbits.xen.org/qemu-xen.git#de5b678ca4dcdfa83e322491d478d66df56c1986-de5b678ca4dcdfa83e322491d478d66df56c1986 git://xenbits.xen.org/xen.git#940185b2f6f343251c2b83bd96e599398cea51ec-82855aba5bf91e50c81526167c11d4aeaf665e66 Loaded 3004 nodes in revision graph Searching for test results: 128096 [dst_host=chardonnay1,src_host=chardonnay0] 128177 pass 921b2fed6a79439ef1609ef4af0ada5cccb3555c c530a75c1e6a472b0eb9558310b518f0dfcd8860 9c0eed618f37dd5b4a57c8b3fbc48ef8913e3149 de5b678ca4dcdfa83e322491d478d66df56c1986 940185b2f6f343251c2b83bd96e599398cea51ec 128232 [dst_host=albana1,src_host=albana0] 128258
[Xen-devel] [linux-3.18 test] 131231: regressions - FAIL
flight 131231 linux-3.18 real [real] http://logs.test-lab.xenproject.org/osstest/logs/131231/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-examine 8 reboot fail REGR. vs. 128858 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-xl-multivcpu 7 xen-bootfail REGR. vs. 128858 test-amd64-i386-xl7 xen-boot fail REGR. vs. 128858 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 128858 test-amd64-i386-pair 11 xen-boot/dst_hostfail REGR. vs. 128858 test-amd64-amd64-xl-pvshim7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-xl-xsm 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-xl-qemuu-ovmf-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-libvirt-pair 10 xen-boot/src_hostfail REGR. vs. 128858 test-amd64-i386-libvirt-pair 11 xen-boot/dst_hostfail REGR. vs. 128858 test-amd64-amd64-rumprun-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-libvirt-pair 10 xen-boot/src_host fail REGR. vs. 128858 test-amd64-amd64-libvirt-pair 11 xen-boot/dst_host fail REGR. vs. 128858 test-amd64-amd64-xl-qemuu-ovmf-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 128858 test-amd64-amd64-pair11 xen-boot/dst_hostfail REGR. vs. 128858 Tests which are failing intermittently (not blocking): test-armhf-armhf-xl-multivcpu 17 guest-start.2 fail in 131095 pass in 131231 test-amd64-i386-qemut-rhel6hvm-amd 12 guest-start/redhat.repeat fail in 131192 pass in 131231 test-amd64-i386-rumprun-i386 7 xen-boot fail pass in 131095 test-amd64-i386-freebsd10-amd64 14 guest-saverestore fail pass in 131192 Tests which did not succeed, but are not blocking: test-armhf-armhf-xl-vhd 10 debian-di-install fail in 131095 like 128841 test-armhf-armhf-libvirt-raw 10 debian-di-install fail in 131095 like 128841 test-amd64-amd64-examine 4 memdisk-try-append fail in 131192 like 128807 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 128858 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 128858 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 128858 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 128858 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 128858 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 128858 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass test-amd64-amd64-xl-pvhv2-intel 12 guest-start fail never pass test-amd64-amd64-xl-pvhv2-amd 12 guest-start fail never pass test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-check
Re: [Xen-devel] [PATCH v9 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
On Tue, Dec 11, 2018 at 11:29:21AM -0800, Maran Wilson wrote: > Is your question about what options you need to provide to Qemu? Or is your > question about the SW implementation choices? > > Assuming the former... Yeah, that's what I wanted to know. But looking at it, I'm booting bzImage here just as quickly and as flexible so I don't see the advantage of this new method for my use case here of booting kernels in qemu. But maybe there's a good use case where firmware is slow and one doesn't really wanna noodle through it or when one does start a gazillion VMs per second or whatever... Thx. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable-smoke test] 131274: tolerable all pass - PUSHED
flight 131274 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/131274/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 9c35572f359cd6f71aa20b0991c74e032b8721d2 baseline version: xen a9c904c5a827144eb722cfb46634c60b739e19eb Last test of basis 131246 2018-12-11 21:00:46 Z0 days Testing same since 131274 2018-12-12 17:00:36 Z0 days1 attempts People who touched revisions under test: Julien Grall Razvan Cojocaru Stefano Stabellini jobs: build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-amd64-amd64-xl-qemuu-debianhvm-i386 pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git a9c904c5a8..9c35572f35 9c35572f359cd6f71aa20b0991c74e032b8721d2 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/2] arm/irq: skip action avalability check for non-debug build
On Wed, 12 Dec 2018, Julien Grall wrote: > Title: s/avalability/availability/ > > On 12/12/2018 16:55, Andrii Anisov wrote: > > From: Andrii Anisov > > > > An IRQ with _IRQ_GUEST flag set always has an action. > > An IRQ with _IRQ_DISABLED flag cleared always have an action. > > s/have/has/ > > Those conditions are not sufficient to ensure desc->action is not NULL. You > also need to take the spinlock. > > While looking at the code, I noticed an interesting race with the release > code. Guest IRQ are released using the function gic_remove_irq_to_guest. The > sequence is roughly: > > 1) spin_lock(desc->lock); > 2) writel(desc->irq, ICENABLER); > 3) set_bit(_IRQ_DISABLED, >status); > 4) clear_bit(_IRQ_GUES, >status); > 5) desc->handler = _irq_type; > 6) spin_unlock(desc->lock); > > Even if 2) will disable the interrupt in the hardware, the interrupt may have > been received earlier on another CPU and waiting on the lock. As soon as the > lock is taken, the code will notice the irq disabled (thanks to 3)) and will > then end the interrupt. The callbak end for no_irq_type is a NOP, therefore > the interrupt will stay active and the priority will not be dropped. > > Because of that, the CPU will never be able to receive interrupt for guest > anymore. AFAICT, this can only happen if an interrupt is received while > destroying the assigned domain. > > I think 5) should be replaced with > > desc->handler = gic_hw_ops->gic_host_irq_type; > > Or we potentially need to update no_irq_type and EOI "spurious interrupt". > > I am not entirely sure which way is the best to address the race. Any > opinions? I think that changing the .end function of no_irq_type to be the same as the end function of the host_irq_type controller is the safest option: yes no_irq_type means no irqs but if we receive an interrupt we should still EOI it no matter what. > > Those flags checks cover all accesses to desc->action in do_IRQ, > so we can > > skip desc->action check. > > "in non-debug build". > > > Still keep it in place for debug build. > > "Keep in place for debug build to help diagnostics potential > misconfiguration". > > > > > Signed-off-by: Andrii Anisov > > Reviewed-by: Julien Grall > > Please don't add a reviewed-by tag until it was explicitly written by the > reviewer. > > > --- > > xen/arch/arm/irq.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c > > index d6a0273..4a02cc1 100644 > > --- a/xen/arch/arm/irq.c > > +++ b/xen/arch/arm/irq.c > > @@ -209,12 +209,14 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int > > irq, int is_fiq) > > spin_lock(>lock); > > desc->handler->ack(desc); > > +#ifndef NDEBUG > > if ( !desc->action ) > > { > > printk("Unknown %s %#3.3x\n", > > is_fiq ? "FIQ" : "IRQ", irq); > > goto out; > > } > > +#endif > > if ( test_bit(_IRQ_GUEST, >status) ) > > { > > > > Cheers, > > -- > Julien Grall > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [linux-linus test] 131224: regressions - FAIL
flight 131224 linux-linus real [real] http://logs.test-lab.xenproject.org/osstest/logs/131224/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-pygrub 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-xl-qemuu-win7-amd64 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-libvirt-xsm 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-xl-pvhv2-intel 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-xl-qcow2 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-rumprun-amd64 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-libvirt-vhd 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-libvirt-pair 10 xen-boot/src_host fail REGR. vs. 125898 test-amd64-amd64-libvirt-pair 11 xen-boot/dst_host fail REGR. vs. 125898 test-amd64-amd64-xl-multivcpu 7 xen-bootfail REGR. vs. 125898 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 125898 test-amd64-amd64-pair11 xen-boot/dst_hostfail REGR. vs. 125898 test-amd64-i386-examine 8 reboot fail REGR. vs. 125898 test-amd64-i386-xl-xsm7 xen-boot fail REGR. vs. 125898 test-amd64-i386-qemut-rhel6hvm-intel 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-xl7 xen-boot fail REGR. vs. 125898 test-amd64-i386-xl-qemut-win10-i386 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-xl-shadow 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-rumprun-i386 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-libvirt 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-xl-qemut-debianhvm-amd64 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 125898 test-amd64-i386-pair 11 xen-boot/dst_hostfail REGR. vs. 125898 test-amd64-i386-libvirt-xsm 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-libvirt-pair 10 xen-boot/src_hostfail REGR. vs. 125898 test-amd64-i386-libvirt-pair 11 xen-boot/dst_hostfail REGR. vs. 125898 test-amd64-amd64-examine 8 reboot fail REGR. vs. 125898 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 125898 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-rtds 7 xen-boot fail REGR. vs. 125898 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 125898 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-credit1 7 xen-bootfail baseline untested test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 125898 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 125898 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 125898 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 125898 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 125898 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 125898 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 125898 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass
Re: [Xen-devel] [PATCH v2 05/10] libxl: Do root checks once in libxl__domain_get_device_model_uid
> On Dec 12, 2018, at 3:45 PM, Ian Jackson wrote: > > George Dunlap writes ("[PATCH v2 05/10] libxl: Do root checks once in > libxl__domain_get_device_model_uid"): >> At the moment, we check for equivalence to literal "root" before >> deciding whether to add the `runas` command-line option to QEMU. This >> is unsatisfactory for several reasons. > ... >> v2: >> - Refactor to use `out` rather than multiple labels >> - Only check for root once >> - Use 'out' rather than direct returns for errors (only use direct returns >> for early `succeed-without-setting-runas` paths) >> - Use `rc` rather than `ret` to more closely align with CODING_STYLE >> - Fill out comments about the cases we're handling >> - Return ERROR_DEVICE_EXISTS rather than ERROR_FAIL if there's another >> username that maps to our calculated uid >> - Report an error if the specified device_model_user doesn't exist > > Thanks. This is all much better now. FWIW I agree. :-) > Or you could treat the !user_base as an error block and write this: > >> +if (!user_base) { >> +LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s", >> + user); >> +rc = ERROR_INVAL; > +goto out; > +} > +intended_uid = user_base->pw_uid; > +rc = 0; > +goto out; >> +} This looks good. > >> +/* >> + * If dm_restrict isn't set, and we don't have a specified user, don't >> + * bother setting a `-runas` parameter. >> + */ >> if (!libxl_defbool_val(b_info->dm_restrict)) { >> LOGD(DEBUG, guest_domid, >> "dm_restrict disabled, starting QEMU as root"); >> return 0; >> } > > Why `return 0' here but `goto out' earlier ? IMO all the success > returns should be the same. Not exactly — we only want to do the root check if we’re running as an alternate user. In this case, neither device_model_user nor dm_restrict is set, so we’re not running as an alternate user, so we don’t want to run the `if(intended_uid == 0)` check on the normal ‘out’ path. I take it you’d prefer to always jump to out, but to have the conditional be, `if (!rc && user)`? > >> if (user_clash) { >> LOGD(ERROR, guest_domid, >> "wanted to use uid %ld (%s + %d) but that is user %s !", >> (long)intended_uid, LIBXL_QEMU_USER_RANGE_BASE, >> guest_domid, user_clash->pw_name); >> -return ERROR_FAIL; >> +rc = ERROR_DEVICE_EXISTS; >> +goto out; > > I appreciate your desire to specify particular error value, but I am > far from convinced that ERROR_DEVICE_EXISTS is appropriate. It would > lead someone to ask which device was in the way. Consider this patch the most concise way of asking the question, “What error should I use in this case?” :-) > > We generally use EINVAL for bad configurations. If you prefer, feel > free to introduce a new error code. 32-bit signed integers are pretty > cheap. The integers may be cheap, but scanning through trying to comprehend them is not. :-) I’ll think about whether to introduce a new one or just use ERROR_INVAL. > >> +if (rc < 0) >> +goto out; >> if (user_base) { >> LOGD(WARN, guest_domid, "Could not find user %s, falling back to %s", >> LIBXL_QEMU_USER_RANGE_BASE, LIBXL_QEMU_USER_SHARED); >> -goto end_search; >> +intended_uid = user_base->pw_uid; >> +goto out; > > Here we have this pattern again with a `goto out' without a preceding > assignment to `rc'. AFAICT the rules implied by your out block are: > > * Every goto out must be preceded by an assignment to rc. > IMO there is no reason this should not immediately precede > the goto out. > > * Additionally, if rc is 0 then the goto out must also be preceded > relatively recently by an assignment to intended_uid. Those are rules that you’re implying, not me. :-) My `goto out` invariant in this patch were: 1. rc may be an error code. In this case, rc is returned. 2. rc may be zero; if rc is zero: 2a. user must be non-NULL, 2b. user must be verified to exist on the system, and 2c. intended_uid must be set to the userid reported in the previous check In order to accept your suggestion above to replace the `return` with a `goto out`, I have to make the invariant as follows: 1. rc may be an error code. In this case, rc is returned. 2. rc may be zero, and user NULL. In this case, rc is returned. 3. rc may be zero, and user non-NULL. In this case: [2b and 2c from above] In this case, we know that rc is 0 because we just checked the value 6 lines earlier. If code is ever added in between such that rc becomes non-zero, *that* code should be calling `goto out` (or thinking carefully about why falling through to this code is OK). I’ll write redundant statements everywhere if you want, but I thought that would count as the kind of code duplication you wanted
[Xen-devel] [PATCH 0/2] gic-vgic optimizations
From: Andrii Anisov Here are few patches from RFC series [1] currently approved to be upstreamed with appropriate changes. Andrii Anisov (2): gic-vgic: Drop an excessive clear_lrs Is a patch #5 [2], with a change: - Keep LR clear for debug build No changes in v2: arm/irq: skip action availability check for non-debug build Is a patch #11 [3], with a change: - Completely remove the check for a non-debug build, but preserve for debug. Changes in v2: - reworded the commit title and message - removed an RB prematurely inserted by mistake [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html [2] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03285.html [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03291.html xen/arch/arm/gic-vgic.c | 2 ++ xen/arch/arm/irq.c | 2 ++ 2 files changed, 4 insertions(+) -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 1/2] gic-vgic: Drop an excessive clear_lrs
From: Andrii Anisov This action is excessive because for an invalid LR there is no need to write another invalid value to a register. So we can skip it here, saving a peripheral register write. Keep clearing the LR for the DEBUG build. This would make dumped invalid LRs be zero. That is more obvious than picking state bits from a non-zero value. Signed-off-by: Andrii Anisov Reviewed-by: Julien Grall --- xen/arch/arm/gic-vgic.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c index 990399c..48922f5 100644 --- a/xen/arch/arm/gic-vgic.c +++ b/xen/arch/arm/gic-vgic.c @@ -216,7 +216,9 @@ static void gic_update_one_lr(struct vcpu *v, int i) } else { +#ifndef NDEBUG gic_hw_ops->clear_lr(i); +#endif clear_bit(i, _cpu(lr_mask)); if ( p->desc != NULL ) -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 2/2] arm/irq: skip action availability check for non-debug build
From: Andrii Anisov Under desc->lock taken: An IRQ with _IRQ_GUEST flag set always has an action. An IRQ with _IRQ_DISABLED flag cleared always has an action. Those flags checks cover all accesses to desc->action in do_IRQ, so we can skip desc->action check in non-debug build. Keep in place for debug build to help diagnostics potential misconfiguration. Signed-off-by: Andrii Anisov --- xen/arch/arm/irq.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c index d6a0273..4a02cc1 100644 --- a/xen/arch/arm/irq.c +++ b/xen/arch/arm/irq.c @@ -209,12 +209,14 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq) spin_lock(>lock); desc->handler->ack(desc); +#ifndef NDEBUG if ( !desc->action ) { printk("Unknown %s %#3.3x\n", is_fiq ? "FIQ" : "IRQ", irq); goto out; } +#endif if ( test_bit(_IRQ_GUEST, >status) ) { -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/2] arm/irq: skip action avalability check for non-debug build
On 12/12/2018 17:59, Andrii Anisov wrote: On 12.12.18 19:49, Julien Grall wrote: Those conditions are not sufficient to ensure desc->action is not NULL. You also need to take the spinlock. Agree. Should I describe it as following? Under desc->lock taken: An IRQ with _IRQ_GUEST flag set always has an action. An IRQ with _IRQ_DISABLED flag cleared always have an action. This looks better. While looking at the code, I noticed an interesting race with the release code. As I understand the race in not directly linked to this patch. Is it correct? That's correct. I actually noticed when checking whether the commit message was matching the behavior. Guest IRQ are released using the function gic_remove_irq_to_guest. The sequence is roughly: 1) spin_lock(desc->lock); 2) writel(desc->irq, ICENABLER); 3) set_bit(_IRQ_DISABLED, >status); 4) clear_bit(_IRQ_GUES, >status); 5) desc->handler = _irq_type; 6) spin_unlock(desc->lock); Even if 2) will disable the interrupt in the hardware, the interrupt may have been received earlier on another CPU and waiting on the lock. As soon as the lock is taken, the code will notice the irq disabled (thanks to 3)) and will then end the interrupt. The callbak end for no_irq_type is a NOP, therefore the interrupt will stay active and the priority will not be dropped. Because of that, the CPU will never be able to receive interrupt for guest anymore. AFAICT, this can only happen if an interrupt is received while destroying the assigned domain. I think 5) should be replaced with desc->handler = gic_hw_ops->gic_host_irq_type; Or we potentially need to update no_irq_type and EOI "spurious interrupt". I am not entirely sure which way is the best to address the race. Any opinions? Let me spend a bit more time to look into that Other wording and grammatical nits will be addressed as suggested. -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen optimization
On Wed, 12 Dec 2018, Andrii Anisov wrote: > Hello Stefano, > > On 12.12.18 19:39, Stefano Stabellini wrote: > > Thanks for the good work, Andrii! > > > > The WARM_MAX improvements for vwfi=native with your optimizations are > > impressive. > > I really hope you are not speaking about these numbers: > > > > max=840 warm_max=120 min=120 avg=127 > > Those are TBM baremetal numbers in hyp mode. I know, I was referring to your older results, sorry for the confusion. > Did you try my RFC on your HW? ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen optimization
On Wed, 2018-12-12 at 19:32 +0200, Andrii Anisov wrote: > On 12.12.18 19:10, Dario Faggioli wrote: > > I think only bisection could shed some light on this. And it would > > be > > wonderful if you could do that, but I understand that it takes > > time. :- > > / > Well, bisect might help. But I'm really confused why MemTotal may be > reduced. > Yeah, and although difficult to admit/see the reason why, I think this looks like it is coming from something we do in Xen. And since you say you have an old Xen version that works, I really see bisection as the way to go... > > Are you absolutely sure about that? That is, are you "just" > > assuming > > the scheduler won't move stuff, or have you put some debugging or > > printing in place to verify that to be the case? > Being honest, I did not check for exactly this setup. I verified it > for 4.10. > Not sure I'm getting. Are you saying that you somehow verified that on 4.10 vcpus don't move? But on 4.10 you have pinning that works, don't you? Or are you saying you've verified that vcpus don't move, on 4.10, even without doing the pinning? If yes, can I ask how? As for staging, I really can't tell, as indeed there would be no need for them to move, but they actually could, for a number of reasons. So, unless you, like, put printk()-s (if you can) or ASSERTS() when v- >processor changes, I wouldn't take that for granted. :-( Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/2] arm/irq: skip action avalability check for non-debug build
On 12.12.18 19:49, Julien Grall wrote: Those conditions are not sufficient to ensure desc->action is not NULL. You also need to take the spinlock. Agree. Should I describe it as following? Under desc->lock taken: An IRQ with _IRQ_GUEST flag set always has an action. An IRQ with _IRQ_DISABLED flag cleared always have an action. While looking at the code, I noticed an interesting race with the release code. As I understand the race in not directly linked to this patch. Is it correct? Guest IRQ are released using the function gic_remove_irq_to_guest. The sequence is roughly: 1) spin_lock(desc->lock); 2) writel(desc->irq, ICENABLER); 3) set_bit(_IRQ_DISABLED, >status); 4) clear_bit(_IRQ_GUES, >status); 5) desc->handler = _irq_type; 6) spin_unlock(desc->lock); Even if 2) will disable the interrupt in the hardware, the interrupt may have been received earlier on another CPU and waiting on the lock. As soon as the lock is taken, the code will notice the irq disabled (thanks to 3)) and will then end the interrupt. The callbak end for no_irq_type is a NOP, therefore the interrupt will stay active and the priority will not be dropped. Because of that, the CPU will never be able to receive interrupt for guest anymore. AFAICT, this can only happen if an interrupt is received while destroying the assigned domain. I think 5) should be replaced with desc->handler = gic_hw_ops->gic_host_irq_type; Or we potentially need to update no_irq_type and EOI "spurious interrupt". I am not entirely sure which way is the best to address the race. Any opinions? Let me spend a bit more time to look into that Other wording and grammatical nits will be addressed as suggested. -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/2] arm/irq: skip action avalability check for non-debug build
On 12.12.18 19:49, Julien Grall wrote: Please don't add a reviewed-by tag until it was explicitly written by the reviewer. My bad, I mixed it with #5. -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/2] arm/irq: skip action avalability check for non-debug build
Title: s/avalability/availability/ On 12/12/2018 16:55, Andrii Anisov wrote: From: Andrii Anisov An IRQ with _IRQ_GUEST flag set always has an action. An IRQ with _IRQ_DISABLED flag cleared always have an action. s/have/has/ Those conditions are not sufficient to ensure desc->action is not NULL. You also need to take the spinlock. While looking at the code, I noticed an interesting race with the release code. Guest IRQ are released using the function gic_remove_irq_to_guest. The sequence is roughly: 1) spin_lock(desc->lock); 2) writel(desc->irq, ICENABLER); 3) set_bit(_IRQ_DISABLED, >status); 4) clear_bit(_IRQ_GUES, >status); 5) desc->handler = _irq_type; 6) spin_unlock(desc->lock); Even if 2) will disable the interrupt in the hardware, the interrupt may have been received earlier on another CPU and waiting on the lock. As soon as the lock is taken, the code will notice the irq disabled (thanks to 3)) and will then end the interrupt. The callbak end for no_irq_type is a NOP, therefore the interrupt will stay active and the priority will not be dropped. Because of that, the CPU will never be able to receive interrupt for guest anymore. AFAICT, this can only happen if an interrupt is received while destroying the assigned domain. I think 5) should be replaced with desc->handler = gic_hw_ops->gic_host_irq_type; Or we potentially need to update no_irq_type and EOI "spurious interrupt". I am not entirely sure which way is the best to address the race. Any opinions? Those flags checks cover all accesses to desc->action in do_IRQ, > so we can skip desc->action check. "in non-debug build". Still keep it in place for debug build. "Keep in place for debug build to help diagnostics potential misconfiguration". Signed-off-by: Andrii Anisov Reviewed-by: Julien Grall Please don't add a reviewed-by tag until it was explicitly written by the reviewer. --- xen/arch/arm/irq.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c index d6a0273..4a02cc1 100644 --- a/xen/arch/arm/irq.c +++ b/xen/arch/arm/irq.c @@ -209,12 +209,14 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq) spin_lock(>lock); desc->handler->ack(desc); +#ifndef NDEBUG if ( !desc->action ) { printk("Unknown %s %#3.3x\n", is_fiq ? "FIQ" : "IRQ", irq); goto out; } +#endif if ( test_bit(_IRQ_GUEST, >status) ) { Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 v2 16/17] xen/arm: Implement Set/Way operations
On Wed, 2018-12-12 at 09:25 -0800, Stefano Stabellini wrote: > On Wed, 12 Dec 2018, Julien Grall wrote: > > > For Dario: basically we have a long running operation to perform, > we > > > thought that the best place for it would be on the path returning > to > > > guest (leave_hypervisor_tail). The operation can interrupt itself > > > checking sotfirq_pending() once in a while to avoid blocking the > pcpu > > > for too long. > > > > > > The question is: is it better to check sotfirq_pending() even > before > > > starting? Or every so often during the operating is good enough? > Does it > > > even matter? > > I am not sure to understand what is your concern here. Checking for > > softirq_pending() often is not an issue. The issue is when we > happen to not > > check it. At the moment, I would prefer to be over cautious until > we figure > > out whether this is a real issue. > > > > If you are concerned about the performance impact, this is only > called when a > > guest is using set/way. > > Actually, I have no concerns, as I think it should make no > difference, > but I just wanted a second opinion. > Yeah, sorry. I saw the email on Monday, but then got distracted. So, in this case, I personally don't think either solution is so much better (or so much worse) of the other one. In general, what's best may vary on a case-by-case basis (e.g., how long have we been non-preemptable already, when we entering the long running operation?). Therefore, if I'd want to be on the safe side, I think I would check before entering the loop (or whatever the long running op is implemented). The performance impact of just one more softirq_pending() check itself should really be negligible (even considering cache effects). Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen optimization
Hello Stefano, On 12.12.18 19:39, Stefano Stabellini wrote: Thanks for the good work, Andrii! The WARM_MAX improvements for vwfi=native with your optimizations are impressive. I really hope you are not speaking about these numbers: max=840 warm_max=120 min=120 avg=127 Those are TBM baremetal numbers in hyp mode. Did you try my RFC on your HW? -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen optimization
On Wed, 12 Dec 2018, Andrii Anisov wrote: > On 12.12.18 11:46, Andrii Anisov wrote: > > Digging into that now. > I got it. My u-boot starts TBM in hyp mode. But them both miss setting > HCR_EL2.IMO, so no interrupt exception was taken in hyp. > OK, for my baremetal TBM in hyp, numbers are: > > max=840 warm_max=120 min=120 avg=127 > > I guess, warm_max and min are one tick of the system timer. And it seems to me > that one tick of the system timer is the lower limit of the irq latency by HW > design. Thanks for the good work, Andrii! The WARM_MAX improvements for vwfi=native with your optimizations are impressive. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 03/10] libxl: Clean up userlookup_helper_getpw* helper
> On Dec 12, 2018, at 3:26 PM, Ian Jackson wrote: > > George Dunlap writes ("[PATCH v2 03/10] libxl: Clean up > userlookup_helper_getpw* helper"): >> Bring conventions more in line with libxl__xs_read_checked(): >> - If found, return 0 and set pointer to non-NULL >> - If not found, return 0 and set pointer to NULL >> - On error, return libxl-style error number. >> >> Update documentation to match. > ... >> #define DEFINE_USERLOOKUP_HELPER(NAME,SPEC_TYPE,STRUCTNAME,SYSCONF) \ >> static int userlookup_helper_##NAME(libxl__gc *gc, \ >> @@ -83,7 +89,7 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char >> *name) >> struct STRUCTNAME *resultp = NULL; \ > > git diff has excelled itself in choice of heading line, hasn't it ? > >> @@ -142,14 +147,14 @@ static int >> libxl__domain_get_device_model_uid(libxl__gc *gc, >> _pwbuf, _base); >> if (ret < 0) >> return ret; >> -if (ret > 0) { >> +if (user_base) { > > I would prefer to also: > > -if (ret < 0) > +if (ret) > return ret; > > which is more conventional for rc and might reduce the impact of bugs > where the function returned a positive. (Twice.) Looks like thrice, but sure. > > With or without that change, > > Acked-by: Ian Jackson > > Thanks for the cleanup! No problem — it’s nice to have something tractable and satisfying to accomplish when my other active project is a seemingly never-ending ball of twine… -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
On 12/12/2018 7:28 AM, Stefano Garzarella wrote: On Tue, Dec 11, 2018 at 7:35 PM Maran Wilson wrote: On 12/11/2018 9:11 AM, Stefano Garzarella wrote: Hi Liam, in order to support PVH also with SeaBIOS, I'm going to work on a new option rom (like linuxboot/multiboot) that can be used in this case. That is awesome. Yes, please keep us posted when you have something working. Yes, I'll keep you updated! Just FYI, before switching over to using Qemu+qboot, we had been using a Qemu only solution (but not using an option rom) internally that worked very well using no FW at all. We had Qemu simply parse the ELF file and jump to the PVH entry point if one is found. The only gotcha was that we had to include a pair of patches that were originally written by folks at Intel as part of the clear containers work. Specifically, in order to be able to skip firmware entirely, we had to do 2 additional things: (1) ACPI tables generated by Qemu are usually patched up by FW. Since we were running no FW, we needed to do that patching up of the ACPI tables in Qemu when it was detected that we were going to enter the OS via the PVH entry point. (2) We also needed to add a patch to Qemu to enable a few PM registers -- something typically done by FW. I had a look of qemu-lite, are you referring to this? Yes. More specifically, we were using a modified version of this patch: acpi: patch guest ACPI when loading firmware is skipped But unlike qemu-lite, we were not using a -nofw flag, instead, just choosing PVH vs legacy boot based on which -kernel binary was provided and whether it contained the PVH ELF note. So apply the above patch, you also need to pick up: acpi: expose acpi_checksum() For a while, we had also been using patch: ich9: enable pm registers when there is no firmware But that last patch can be avoided by simply selecting Hardware-Reduced ACPI mode when building the FADT in Qemu, when PVH boot is selected. But you probably wont need those patches at all if you are actually running some version of minimized SeaBIOS. Thanks, -Maran But if SeaBIOS is involved in the solution you are working on, I guess you won't really need those extra patches. Just figured I'd mention it so you have the full picture. Thank you very much to share with me these details! Cheers, Stefano Thanks, -Maran I'll keep you updated on it! Cheers, Stefano On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick wrote: These changes (along with corresponding qboot and Linux kernel changes) enable a guest to be booted using the x86/HVM direct boot ABI. This commit adds a load_elfboot() routine to pass the size and location of the kernel entry point to qboot (which will fill in the start_info struct information needed to to boot the guest). Having loaded the ELF binary, load_linux() will run qboot which continues the boot. The address for the kernel entry point has already been read from an ELF Note in the uncompressed kernel binary earlier in pc_memory_init(). Signed-off-by: George Kennedy Signed-off-by: Liam Merwick --- hw/i386/pc.c | 72 1 file changed, 72 insertions(+) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 056aa46d99b9..d3012cbd8597 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -54,6 +54,7 @@ #include "sysemu/qtest.h" #include "kvm_i386.h" #include "hw/xen/xen.h" +#include "hw/xen/start_info.h" #include "ui/qemu-spice.h" #include "exec/memory.h" #include "exec/address-spaces.h" @@ -1098,6 +1099,50 @@ done: return pvh_start_addr != 0; } +static bool load_elfboot(const char *kernel_filename, + int kernel_file_size, + uint8_t *header, + size_t pvh_xen_start_addr, + FWCfgState *fw_cfg) +{ +uint32_t flags = 0; +uint32_t mh_load_addr = 0; +uint32_t elf_kernel_size = 0; +uint64_t elf_entry; +uint64_t elf_low, elf_high; +int kernel_size; + +if (ldl_p(header) != 0x464c457f) { +return false; /* no elfboot */ +} + +bool elf_is64 = header[EI_CLASS] == ELFCLASS64; +flags = elf_is64 ? +((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags; + +if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */ +error_report("elfboot unsupported flags = %x", flags); +exit(1); +} + +kernel_size = load_elf(kernel_filename, NULL, NULL, _entry, + _low, _high, 0, I386_ELF_MACHINE, + 0, 0); + +if (kernel_size < 0) { +error_report("Error while loading elf kernel"); +exit(1); +} +mh_load_addr = elf_low; +elf_kernel_size = elf_high - elf_low; + +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr); +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr); +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size); + +return true; +} + static void
Re: [Xen-devel] Xen optimization
Hello Dario, On 12.12.18 19:10, Dario Faggioli wrote: Ah, yes... I've seen the thread. I haven't commented, as it is really, really weird, and I don't know what to think/say. I think only bisection could shed some light on this. And it would be wonderful if you could do that, but I understand that it takes time. :- / Well, bisect might help. But I'm really confused why MemTotal may be reduced. Are you absolutely sure about that? That is, are you "just" assuming the scheduler won't move stuff, or have you put some debugging or printing in place to verify that to be the case?Being honest, I did not check for exactly this setup. I verified it for 4.10. I'm asking because, yet, in theory that is what one would expect. But, as I think you know very well, although in theory there is no difference between theory and practice, in practice, there is. :-) I know it very well :) -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen optimization
On Wed, 2018-12-12 at 11:39 +0200, Andrii Anisov wrote: > Hello Dario, > Hi, > On 11.12.18 18:56, Dario Faggioli wrote: > > Also, what about Xen numbers, sched=null. > Didn't check, will put on the list. > :-) > > I don't expect much improvement, considering pinning is in-place > > already. > Actually, I faced a strange issue with explicit pinning of Dom0. > Didn't sort out the cause yet. And Julien says it is not reproducible > on his desk. > Ah, yes... I've seen the thread. I haven't commented, as it is really, really weird, and I don't know what to think/say. I think only bisection could shed some light on this. And it would be wonderful if you could do that, but I understand that it takes time. :- / > But yes, with VCPU number less than PCPUs - there is no migration of > Dom0 VCPUs. > Are you absolutely sure about that? That is, are you "just" assuming the scheduler won't move stuff, or have you put some debugging or printing in place to verify that to be the case? I'm asking because, yet, in theory that is what one would expect. But, as I think you know very well, although in theory there is no difference between theory and practice, in practice, there is. :-) Regards, Dario > [1]https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00435.html -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function
On Wed, Dec 12, 2018 at 09:15:09AM -0700, Jan Beulich wrote: > >>> On 12.12.18 at 16:56, wrote: > > On Wed, Dec 12, 2018 at 03:32:53AM -0700, Jan Beulich wrote: > >> >>> On 12.12.18 at 11:04, wrote: > >> > You mentioned there's some code (for PV?) to calculate the size of the > >> > page tables but I'm having trouble finding it (mainly because I'm not > >> > that familiar with PV), could you point me to it? > >> > >> In dom0_construct_pv() you'll find a loop starting with > >> "for ( nr_pt_pages = 2; ; nr_pt_pages++ )". It's not the neatest, > >> but at least we've never had reports of failure. > > > > That seems quite complicated, what about using the formula below: > > > > /* > > * Approximate the memory required for the HAP/IOMMU page tables by > > * pessimistically assuming every guest page will use a p2m page table > > * entry. > > */ > > return DIV_ROUND_UP(( > > /* Account for one entry in the L1 per page. */ > > nr_pages + > > /* Account for one entry in the L2 per 512 pages. */ > > DIV_ROUND_UP(nr_pages, 512) + > > /* Account for one entry in the L3 per 512^2 pages. */ > > DIV_ROUND_UP(nr_pages, 512 * 512) + > > /* Account for one entry in the L4 per 512^3 pages. */ > > DIV_ROUND_UP(nr_pages, 512 * 512 * 512) + > > ) * 8, PAGE_SIZE << PAGE_ORDER_4K); > > > > That takes into account higher level page table structures. > > That's a fair approximation without 2M and 1G pages available. I'm > unconvinced we want to over-estimate this heavily in the more > common case of large page mappings being available. Otoh this > provides enough resources to later also deal with shattering of > large pages. > > The MMIO side of things of course still remains unclear. Right, for the MMIO and the handling of grant and foreign mappings it's not clear how we want to proceed. Maybe account for all host RAM (total_pages) plus MMIO BARs? > What I don't understand in any case though is > "PAGE_SIZE << PAGE_ORDER_4K". This is x86 code - why not > just PAGE_SIZE? Oh, I've done it like that because this is related to p2m code, which uses this way to get the page size. IIRC you told me to use this for things like pvh_setup_e820. I don't mind switching to just PAGE_SIZE. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 0/2] gic-vgic optimizations
From: Andrii Anisov Here are few patches from RFC series [1] currently approved to be upstreamed with appropriate changes. Andrii Anisov (2): gic-vgic: Drop an excessive clear_lrs Is a patch #5 [2], with a change: - Keep LR clear for debug build arm/irq: skip action availability check for non-debug build Is a patch #11 [3], with a change: - Completely remove the check for a non-debug build, but preserve for debug. [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html [2] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03285.html [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03291.html xen/arch/arm/gic-vgic.c | 2 ++ xen/arch/arm/irq.c | 2 ++ 2 files changed, 4 insertions(+) -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 1/2] gic-vgic: Drop an excessive clear_lrs
From: Andrii Anisov This action is excessive because for an invalid LR there is no need to write another invalid value to a register. So we can skip it here, saving a peripheral register write. Keep clearing the LR for the DEBUG build. This would make dumped invalid LRs be zero. That is more obvious than picking state bits from a non-zero value. Signed-off-by: Andrii Anisov Reviewed-by: Julien Grall --- xen/arch/arm/gic-vgic.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c index 990399c..48922f5 100644 --- a/xen/arch/arm/gic-vgic.c +++ b/xen/arch/arm/gic-vgic.c @@ -216,7 +216,9 @@ static void gic_update_one_lr(struct vcpu *v, int i) } else { +#ifndef NDEBUG gic_hw_ops->clear_lr(i); +#endif clear_bit(i, _cpu(lr_mask)); if ( p->desc != NULL ) -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 2/2] arm/irq: skip action avalability check for non-debug build
From: Andrii Anisov An IRQ with _IRQ_GUEST flag set always has an action. An IRQ with _IRQ_DISABLED flag cleared always have an action. Those flags checks cover all accesses to desc->action in do_IRQ, so we can skip desc->action check. Still keep it in place for debug build. Signed-off-by: Andrii Anisov Reviewed-by: Julien Grall --- xen/arch/arm/irq.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c index d6a0273..4a02cc1 100644 --- a/xen/arch/arm/irq.c +++ b/xen/arch/arm/irq.c @@ -209,12 +209,14 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq) spin_lock(>lock); desc->handler->ack(desc); +#ifndef NDEBUG if ( !desc->action ) { printk("Unknown %s %#3.3x\n", is_fiq ? "FIQ" : "IRQ", irq); goto out; } +#endif if ( test_bit(_IRQ_GUEST, >status) ) { -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 13/25] argo: implement the register op
On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote: > Used by a domain to register a region of memory for receiving messages from > either a specified other domain, or, if specifying a wildcard, any domain. > > This operation creates a mapping within Xen's private address space that > will remain resident for the lifetime of the ring. In subsequent commits, the > hypervisor will use this mapping to copy data from a sending domain into this > registered ring, making it accessible to the domain that registered the ring > to > receive data. > > In this code, the p2m type of the memory supplied by the guest for the ring > must be p2m_ram_rw, which is a conservative choice made to defer the need to > reason about the other p2m types with this commit. > > argo_pfn_t type is introduced here to create a pfn_t type that is 64-bit on > all architectures, to assist with avoiding the need to add a compat ABI. > > Signed-off-by: Christopher Clark > --- > xen/common/argo.c | 498 > + > xen/include/asm-arm/guest_access.h | 2 + > xen/include/asm-x86/guest_access.h | 2 + > xen/include/public/argo.h | 64 + > 4 files changed, 566 insertions(+) > > diff --git a/xen/common/argo.c b/xen/common/argo.c > index 2a95e09..f4e82cf 100644 > --- a/xen/common/argo.c > +++ b/xen/common/argo.c > @@ -25,6 +25,7 @@ > #include > #include > > +DEFINE_XEN_GUEST_HANDLE(argo_pfn_t); > DEFINE_XEN_GUEST_HANDLE(argo_addr_t); > DEFINE_XEN_GUEST_HANDLE(argo_ring_t); > > @@ -98,6 +99,25 @@ struct argo_domain > }; > > /* > + * Helper functions > + */ > + > +static inline uint16_t > +argo_hash_fn(const struct argo_ring_id *id) No need for the argo_ prefix for static functions, this is already an argo specific file. > +{ > +uint16_t ret; > + > +ret = (uint16_t)(id->addr.port >> 16); > +ret ^= (uint16_t)id->addr.port; > +ret ^= id->addr.domain_id; > +ret ^= id->partner; > + > +ret &= (ARGO_HTABLE_SIZE - 1); I'm having trouble figuring out what this is supposed to do, I think a comment and the expected hash formula will help make sure the code is correct. Also doesn't this need to be documented in the public header? > +return ret; > +} > + > +/* > * locks > */ > > @@ -171,6 +191,74 @@ argo_ring_unmap(struct argo_ring_info *ring_info) > } > } > > +/* caller must have L3 or W(L2) */ > +static int > +argo_ring_map_page(struct argo_ring_info *ring_info, uint32_t i, > + uint8_t **page) > +{ > +if ( i >= ring_info->nmfns ) > +{ > +printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map > page" You likely want to use gprintk here and below, or XENLOG_G_ERR, so that the guest cannot DoS the console. > + " %u of %u\n", ring_info->id.addr.domain_id, > + ring_info->id.addr.port, ring_info->id.partner, ring_info, > + i, ring_info->nmfns); > +return -EFAULT; > +} > +ASSERT(ring_info->mfns); > +ASSERT(ring_info->mfn_mapping); We are trying to move away from such assertions, and instead use constructions that would prevent issues in non-debug builds. I would write the above asserts as: if ( !ring_info->mfns || !ring_info->mfn_mapping ) { ASSERT_UNREACHABLE(); return -E; } That way non-debug builds won't trigger page faults if there's indeed a way to get here with the wrong state, and debug builds will still hit an assert. > + > +if ( !ring_info->mfn_mapping[i] ) > +{ > +/* > + * TODO: > + * The first page of the ring contains the ring indices, so both > read and > + * write access to the page is required by the hypervisor, but > read-access > + * is not needed for this mapping for the remainder of the ring. > + * Since this mapping will remain resident in Xen's address space for > + * the lifetime of the ring, and following the principle of least > privilege, > + * it could be preferable to: > + * # add a XSM check to determine what policy is wanted here > + * # depending on the XSM query, optionally create this mapping as > + *_write-only_ on platforms that can support it. > + *(eg. Intel EPT/AMD NPT). > + */ > +ring_info->mfn_mapping[i] = > map_domain_page_global(ring_info->mfns[i]); > + > +if ( !ring_info->mfn_mapping[i] ) > +{ > +printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map > page" > + " %u of %u\n", ring_info->id.addr.domain_id, > + ring_info->id.addr.port, ring_info->id.partner, ring_info, > + i, ring_info->nmfns); > +return -EFAULT; > +} > +argo_dprintk("mapping page %"PRI_mfn" to %p\n", > + mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]); > +} > + > +if ( page ) > +*page =
Re: [Xen-devel] [PATCH] libxl: Documentation about the domain configuration on disk
On Fri, Dec 07, 2018 at 07:05:38PM +, Ian Jackson wrote: > Wei Liu writes ("Re: [PATCH] libxl: Documentation about the domain > configuration on disk"): > > On Thu, Dec 06, 2018 at 02:57:33PM +, Anthony PERARD wrote: > > > Anyway, that comment block isn't very helpful because it basically says > > > that we can't depriv QEMU, I mean do hotplug with a deprived QEMU. It > > > assumes that we can keep a lock on the userdata while updating the > > > guest, but we can't keep the lock while talking with QEMU (or more > > > generaly: we can't keep the lock while doing any async operation). > > > > > > But there is one useful piece of information: > > > Here we maintain one invariant: every device in xenstore must have > > > an entry in JSON file. > > > (xenstore is describe as "primary reference" just before that sentence). > > > > Yes. That. > ... > > When removing a CD, you only need to update the primary source -- QEMU > > in this case, you can leave libxl-json untouched. It is allowed to have > > stale entries in libxl-json. This is implied in "We may not even need > > this ..." further above. > > If you leave a state entry in libxl-json then you would reject > attempts to insert a new cd, because the libxl-json would tell you one > is already present. > > I think the lock difficulty Anthony identifies is real. We need to > either develop a new set of locking rules, or make all acquisitions of > the libxl-json lock slow. > > > The invariant that we want to maintain is: > > * Nothing may exist in the primary config without > a corresponding entry in libxl-json. > > The rules that implement that are: > > * No-one may edit the libxl-json without holding the lock. > > * You may not cause a thing to be added to the primary config > unless you have held the libxl-json lock continuously > since ensuring that the libxl-json config describes it. > > * Conversely you may not cause a thing to be removed from > the libxl-json unless you have held the libxl-json lock > continueously since ensuring the thing is absent > from the primary config. > > And unfortunately much code acquiring the libxl-json lock expects it > to be fast. > > > How about the following scheme. We split the libxl-json lock into > two. I'm going to call them the fast lock and the slow lock. > > * The fast lock is the existing libxl-json lock. > > * The slow lock is outside the libxl-json lock in the lock > hierarchy. It is also outside the libxl_ctx lock. It is > to be acquired by an ao event callback. > > * No-one may read or edit the libxl-json without holding the fast > lock across their read operation, or their read/modify/write > cycle. > > * However, there are special rules for thing removal/addition, for > things added/removed via qmp. Call these `qmp things'. It is > permissible to add or remove a qmp thing across two separate > acquisitions of the fast lock, one to read the old state of the > thing, and one to read/modify/write to update (only) the new state > of the thing. This is subject to the thing add/removal rule, from > before, which becomes: > > * You may not cause a thing to be added to the primary config > unless you have held the relevant thing lock continuously > since ensuring that the libxl-json config describes it. > > * Conversely you may not cause a thing to be removed from the > libxl-json unless you have held the relevant thing lock > continuously since ensuring the thing is absent from the primary > config. > > * The `relevant thing lock' is the slow lock for qmp things, and the > fast lock for other things. > > * Acquiring the fast lock fails for a destroyed domain, as at > present. I found out that this isn't exactly true. There is a short window where a thread can aquire the fast lock for a domain that is about to be destroyed. But this is fine as long as the thread tries to read the userdata before doing anything else. (The window is between userdata_destroyall and xc_domain_destroy.) > > I think this maintains the invariant. > > I haven't figured out domain destruction. Ideally domain destruction > could happen without taking the slow lock. > > I think this is probably possible if we make sure that qemu is always > killed before the libxl-json is removed. The result is that if a qmp > thing operation races with domain destruction, and in its 1st read > gets an existing libxl-json from before destruction, the qmp thing > will, when it acquires the fast lock again, necessarily the libxl-json > will not exist, and the qmp operation will bomb out. I've check the domain destruction functions in libxl. And that appear to be true from my understanding. Relevent actions taken in libxl, in that order: - kill QEMU - userdata_destroyall - xc_domain_destroy > But I don't exactly know how this relates to domain creation. In > general I haven't
Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation
>>> On 12.12.18 at 16:20, wrote: > Improve decision when vTSC emulation will be activated for a domU with > tsc_mode=default. The current approach is to compare the cpu_khz value > from two physical hosts. Since this value is not accurate, it can not be > used verbatim to decide if vTSC emulation needs to be enabled. Without > this change each TSC access from domU will be emulated after migration, > which causes a significant perfomance drop for workloads that make use > of rdtsc. > > If a domU uses TSC as clocksoure it also must run NTP in some way to > avoid the potential drift what will most likely happen, independent of > any migration. Which drift? While anyone's well advised to run NTP, a completely isolated set of systems may have no need to, if their interactions don't depend on exactly matching time. > The calculation of the drift is based on the time > returned by remote servers versus how fast the local clock advances. NTP > can handle a drift up to 500PPM. This means the local clocksource can > run up to 500us slower or faster. This calculation is based on the TSC > frequency of the host where the domU was started. Once a domU is > migrated to a host of a different class, like from "2.3GHz" to "2.4GHz", > the TSC frequency changes, but the domU kernel may not recalibrate > itself. That's why we switch to emulated (or hardware scaling) mode in that case. It's anyway not really clear to me what this entire ... > As a result, the drift will be larger and might be outside of > the 500 PPM range. In addition, the kernel may notice the change of > speed in which the TSC advances and could change the clocksource. All > this depends of course on the type of OS that is running in the domU. ... (up to here) paragraph is supposed to tell the reader. > @@ -1885,6 +1888,16 @@ void __init early_time_init(void) > printk("Detected %lu.%03lu MHz processor.\n", > cpu_khz / 1000, cpu_khz % 1000); > > +tmp = 1000 * 1000; > +tmp += VTSC_NTP_PPM_TOLERANCE; > +tmp *= cpu_khz; > +tmp /= 1000 * 1000; > +tmp -= cpu_khz; > +if (tmp >= VTSC_JITTER_RANGE_KHZ) > +tmp -= VTSC_JITTER_RANGE_KHZ; Besides the style issue in the if() - how can this be correct? This clearly introduces a discontinuity (just consider the case where tmp is exactly VTSC_JITTER_RANGE_KHZ before the if()). And I also can't see how it guarantees the resulting value to be below VTSC_JITTER_RANGE_KHZ. Did you perhaps mean to cap the value (i.e. = instead of -= )? > +vtsc_tolerance_khz = (unsigned int)tmp; Stray cast. > +printk("Tolerating vtsc jitter for domUs: %u kHz.\n", > vtsc_tolerance_khz); Please omit the full stop; the printk() in context above shouldn't have one either. > @@ -2223,8 +2237,25 @@ void tsc_set_info(struct domain *d, > * When a guest is created, gtsc_khz is passed in as zero, making > * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation. > */ > +disable_vtsc = d->arch.tsc_khz == cpu_khz; > + > +if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz && vtsc_tolerance_khz ) > +{ > +long khz_diff; > + > +khz_diff = ABS((long)(cpu_khz - gtsc_khz)); I think khz_diff = ABS((long)cpu_khz - gtsc_khz); or some such would be less fragile, if e.g. we decided to make cpu_khz an unsigned int (which is an option, as I don't think we'll see above 4THz systems any time soon). > +disable_vtsc = khz_diff <= vtsc_tolerance_khz; > + > +printk(XENLOG_G_INFO "d%d: host has %lu kHz," > + " domU expects %u kHz," > + " difference of %ld is %s tolerance of %u\n", > + d->domain_id, cpu_khz, gtsc_khz, khz_diff, > + disable_vtsc ? "within" : "outside", > + vtsc_tolerance_khz); > +} > + > if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() && > - (d->arch.tsc_khz == cpu_khz || > + (disable_vtsc || >(is_hvm_domain(d) && > hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) ) > { In any event I don't follow why all of the sudden this becomes an always-on mode, with not even a boot command line override. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 09/10] libxl: Kill QEMU with "reaper" ruid
George Dunlap writes ("[PATCH v2 09/10] libxl: Kill QEMU with "reaper" ruid"): > Using kill(-1) to killing an untrusted dm process with the real uid > equal to the dm_uid isn't guaranteed to succeed: the process in > question may be able to kill the reaper process after the setresuid() > and before the kill(). > > Instead, set the real uid to the QEMU user for domain 0 > (QEMU_USER_RANGE_BASE + 0). The reaper process will still be able to > kill the dm process, but not vice versa. > > This, in turn, requires locking to make sure that only one reaper > process is using that uid at a time; otherwise one reaper process may > kill the other reaper process. > > Create a lockfile in RUNDIR/dm-reaper-lock, and grab the lock before > executing kill. > > In the event that we can't get the lock for some reason, go ahead with > the kill using dm_uid for both real and effective UIDs. This isn't > guaranteed to work, but it's no worse than not trying to kill the > process at all. Thanks. Only minor comments here. > +/* > + * Look up "reaper UID". If present and non-root, returns 0 and sets > + * reaper_uid. If not present, returns 0 and leaves reaper_uid unset; > + * otherwise returns libxl-style error. > + */ `leaves reaper_uid unset' is ambiguous. It might mean `leaves it unchanged' or `leaves it set to a sentinel value'. The implementation seems to be the former, which means that all callers must set it to a sentinel value. I think it would be better if it explicitly set it to (uid_t)-1 in that case. ... I looked further and saw that actually you meant `leaves it unchanged' and the caller is expected to set it to the value that it will want to use if there is no dedicated reaper uid. This is rather too subtle for me. Can you please put the logic that falls back to reaper_uid=dm_dedicated_uid closer to the point of use ? > +if(user_base->pw_uid == 0) { ^ missing space. .oO{ I wonder how that checkpatch project is coming on... } > +static int get_reaper_lock_and_uid(libxl__destroy_devicemodel_state *ddms, > + uid_t *reaper_uid) > +{ ... > +fd = open(lockfile, O_RDWR|O_CREAT, 0666); I think this mode is wrong. If the process is running with a g+w umask the file is g+w which is not desirable. I suggest 0644. > +if (fd < 0) { > +/* All other errno: EBADF, EINVAL, ENOLCK, EWOULDBLOCK */ > +LOGED(ERROR, domid, > + "unexpected error while trying to open lockfile %s, errno=%d", > + lockfile, errno); > +return ERROR_FAIL; ;4~ `All other errno' - other to what ? I think this comment has become detached from its code. > +/* Try to lock the file, retrying on EINTR */ > +for (;;) { > +r = flock(fd, LOCK_EX); > +if (!r) > +break; > +if (errno != EINTR) { > +/* All other errno: EBADF, EINVAL, ENOLCK, EWOULDBLOCK */ > +LOGED(ERROR, domid, > + "unexpected error while trying to lock %s, fd=%d, > errno=%d", > + lockfile, fd, errno); > +return ERROR_FAIL; > +} > +} LGTM. > +/* > + * Get reaper_uid. If we can't find such a uid, return an error. > + * > + * FIXME: This means that domain destruction will fail if > + * device_model_user is set but QEMU_USER_RANGE_BASE doesn't exist. > + */ > +return libxl__get_reaper_uid(gc, reaper_uid); > +} Did you mean to introduce this new FIXME ? I think device_model_user was there before your series (even though it probably won't work very well...) Forgive me if I am missing something. If you did mean to, then you need to explain in the commit message why it is OK. > /* > * Destroy all processes of the given uid by setresuid to the > * specified uid and kill(-1). NB this MUST BE CALLED FROM A SEPARATE > - * PROCESS from the normal libxl process. Returns a libxl-style error > - * code. > + * PROCESS from the normal libxl process, and should exit immediately > + * after return. Returns a libxl-style error code. I think this rule that you must _exit immediately after return is not new so can you move that comment into the patch that introduces the setresuid ? Also it would be marginally better to explicitly say _exit rather than exit. > /* ... > + * NB: Even if we don't have a separate reaper_uid, the parent can > + * know whether we won the race by looking at the status variable; > + * so we don't strictly need to return failure in this case. But > + * if there's a misconfiguration, it's better to alert the > + * administator sooner rather than later; so if we fail to get a > + * reaper uid, report an error even if the kill succeeds. I approve of this reasoning and also of it being explained in a comment. Thanks. > @@ -2853,7 +2952,7 @@ static int > kill_device_model_uid_child(libxl__destroy_devicemodel_state *ddms, > r = kill(-1, 9); > if (r &&
Re: [Xen-devel] [PATCH v2 10/10] dm_depriv: Mark `UID cleanup` as completed
George Dunlap writes ("[PATCH v2 10/10] dm_depriv: Mark `UID cleanup` as completed"): > Signed-off-by: George Dunlap Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12] dt: fix integer check in dt_device_is_compatible
Hi Stefano, On 11/12/2018 18:46, Stefano Stabellini wrote: cplen is unsigned, thus, it can never be < 0. Use a signed integer local variable instead. The current code checks > 0. Looking at the code I don't think it can ever be negative. So can you details what is the problem you are trying to resolve? Cheers, Signed-off-by: Stefano Stabellini --- xen/common/device_tree.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c index 8fc401d..df274cc 100644 --- a/xen/common/device_tree.c +++ b/xen/common/device_tree.c @@ -213,17 +213,20 @@ bool_t dt_device_is_compatible(const struct dt_device_node *device, { const char* cp; u32 cplen, l; +s64 ilen; cp = dt_get_property(device, "compatible", ); if ( cp == NULL ) return 0; -while ( cplen > 0 ) + +ilen = cplen; +while ( ilen > 0 ) { if ( dt_compat_cmp(cp, compat) == 0 ) return 1; l = strlen(cp) + 1; cp += l; -cplen -= l; +ilen -= l; } return 0; -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 08/10] libxl: Kill QEMU by uid when possible
George Dunlap writes ("[PATCH v2 08/10] libxl: Kill QEMU by uid when possible"): > The privcmd fd that a dm_restrict'ed QEMU has gives it permission to > one specific domain ID. This domain ID will probably eventually be > used again. It is therefore necessary to make absolutely sure that a > rogue QEMU process cannot hang around after its domain has exited. Thanks. Detailed comments mostly on error handling follow... > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > index f9e0bf6578..cd3208f4b8 100644 > --- a/tools/libxl/libxl_internal.h > +++ b/tools/libxl/libxl_internal.h > @@ -1135,7 +1135,7 @@ typedef struct { > const char *shim_cmdline; > const char *pv_cmdline; > > -char *dm_runas; > +char *dm_runas, *dm_uid; I think `dm_uid' is misnamed. It is only set if the dm has a *dedicated* uid. If the dm is run as uid 0 or as a shared qemu uid, it is set to NULL. > @@ -3706,6 +3706,8 @@ struct libxl__destroy_devicemodel_state { > uint32_t domid; > libxl__devicemodel_destroy_cb *callback; /* May be called re-entrantly */ > /* private to implementation */ > +libxl__ev_child destroyer; > +int rc; /* Accumulated return value for the destroy operation */ Excellent. > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c > index 7f9c6a62fe..53fdf8daf7 100644 > --- a/tools/libxl/libxl_dm.c > +++ b/tools/libxl/libxl_dm.c > @@ -129,6 +129,8 @@ static int libxl__domain_get_device_model_uid(libxl__gc > *gc, > int rc; > char *user; > uid_t intended_uid; > +bool kill_by_uid; > + I guess this new blank line is deliberate. I think it is probably a good idea. > @@ -148,8 +150,10 @@ static int libxl__domain_get_device_model_uid(libxl__gc > *gc, > LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s", > user); > rc = ERROR_INVAL; > -} else > +} else { > intended_uid = user_base->pw_uid; > +kill_by_uid = true; > +} > > goto out; > } I think your changes to the out block newly imply that all `goto out' with `rc=0' must also set kill_by_uid. I have not (in this revision) applied this patch and searched for that because I think there will have to be a respin to consistently apply the rules I previously identified. (I mention this here partly so that when I review v3 I know that my former self wanted to double check that.) > @@ -226,6 +234,8 @@ out: > } > > state->dm_runas = user; > +if (kill_by_uid) > +state->dm_uid = GCSPRINTF("%ld", (long)intended_uid); > } More stuff is accumulating in this post-0-check success path. Perhaps this means my suggestion of if (!rc) { ... } if (!rc) { ... } will be most convenient. > +/* > + * If we're starting the dm with a non-root UID, save the UID so > + * that we can reliably kill it and any subprocesses > + */ > +if (state->dm_uid) > +libxl__xs_printf(gc, XBT_NULL, > + GCSPRINTF("%s/image/device-model-uid", dom_path), My comment about the misnamed libxl variable applies to device-model-uid too I think. > +#define PROPAGATE_RC if(!ddms->rc) ddms->rc = rc ^ I like this macro. But (i) it needs proper macro hygiene - either a do{ }while(0) block, or refactoring into an expression (and then putting in parens). And (ii) you missed out a space. (The references to ddms and rc are anaphoric, not macro parameters, so do not need parens.) > +path = DEVICE_MODEL_XS_PATH(gc, LIBXL_TOOLSTACK_DOMID, domid, ""); > +rc = libxl__xs_rm_checked(gc, XBT_NULL, path); > +if (rc) { > +PROPAGATE_RC; > LOGD(ERROR, domid, "xs_rm failed for %s", path); > +} Your new macro makes this a nicely transparent pasttern. > +/* > + * See if we should try to kill by uid > + */ > +path = GCSPRINTF("/local/domain/%d/image/device-model-uid", domid); > +rc = libxl__xs_read_checked(gc, XBT_NULL, path, _uid_str); > + > +/* > + * If there was an error here, accumulate the error and fall back > + * to killing by pid. > + */ > +if (rc) { > +PROPAGATE_RC; > +LOGD(ERROR, domid, "Reading dm UID path failed for %s", path); > +} From the comment for libxl__xs_read_checked: | * On error, *result_out is undefined. Arguably this is a bear trap. Maybe you would like to fix it there rather than by setting dm_uid_str to 0 here. > +reaper_pid = libxl__ev_child_fork(gc, >destroyer, > + kill_device_model_uid_cb); > +if (reaper_pid < 0) { > +rc = ERROR_FAIL; > +PROPAGATE_RC; > +/* > + * Note that if this fails, we still don't kill by pid, to > + * make sure that an untrusted DM has not "maliciously" > + * exited (potentially causing us to kill an unrelated > +
Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function
>>> On 12.12.18 at 16:56, wrote: > On Wed, Dec 12, 2018 at 03:32:53AM -0700, Jan Beulich wrote: >> >>> On 12.12.18 at 11:04, wrote: >> > You mentioned there's some code (for PV?) to calculate the size of the >> > page tables but I'm having trouble finding it (mainly because I'm not >> > that familiar with PV), could you point me to it? >> >> In dom0_construct_pv() you'll find a loop starting with >> "for ( nr_pt_pages = 2; ; nr_pt_pages++ )". It's not the neatest, >> but at least we've never had reports of failure. > > That seems quite complicated, what about using the formula below: > > /* > * Approximate the memory required for the HAP/IOMMU page tables by > * pessimistically assuming every guest page will use a p2m page table > * entry. > */ > return DIV_ROUND_UP(( > /* Account for one entry in the L1 per page. */ > nr_pages + > /* Account for one entry in the L2 per 512 pages. */ > DIV_ROUND_UP(nr_pages, 512) + > /* Account for one entry in the L3 per 512^2 pages. */ > DIV_ROUND_UP(nr_pages, 512 * 512) + > /* Account for one entry in the L4 per 512^3 pages. */ > DIV_ROUND_UP(nr_pages, 512 * 512 * 512) + > ) * 8, PAGE_SIZE << PAGE_ORDER_4K); > > That takes into account higher level page table structures. That's a fair approximation without 2M and 1G pages available. I'm unconvinced we want to over-estimate this heavily in the more common case of large page mappings being available. Otoh this provides enough resources to later also deal with shattering of large pages. The MMIO side of things of course still remains unclear. What I don't understand in any case though is "PAGE_SIZE << PAGE_ORDER_4K". This is x86 code - why not just PAGE_SIZE? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests
>>> On 12.12.18 at 15:54, wrote: > Fix this by releasing the target paging lock before attempting to > perform the copy of the dirty bitmap, and then forcing a restart of > the whole process in case there have been changes to the dirty bitmap > tables. I'm afraid it's not that simple: The writer side (paging_mark_pfn_dirty()) uses the same lock, and I think the assumption is that while the copying takes place no updates to the bitmap can occur. Then again the subject domain gets paused anyway, so updates probably can't happen (the "probably" of course needs to be eliminated). But at the very least I'd expect a more thorough discussion here as to why dropping the lock intermediately is fine. The preemption mechanism can't be used as sole reference, because that one doesn't drop the lock in the middle of an iteration. > --- a/xen/arch/x86/mm/paging.c > +++ b/xen/arch/x86/mm/paging.c > @@ -408,6 +408,7 @@ static int paging_log_dirty_op(struct domain *d, > unsigned long *l1 = NULL; > int i4, i3, i2; > > + again: > if ( !resuming ) > { Considering that you set "resuming" to 1 before jumping here, why don't you put the label after this if()? I'd even consider pulling the label even further down (perhaps to immediately before the last if() ahead of the main loop), and have the path branching there re-acquire the lock (and drop some of the other state setting that then becomes unnecessary). > @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d, > bytes = (unsigned int)((sc->pages - pages + 7) >> 3); > if ( likely(peek) ) > { > +if ( paging_mode_enabled(current->domain) ) > +/* > + * Drop the target p2m lock, or else Xen will panic > + * when trying to acquire the p2m lock of the caller > + * due to invalid lock order. Note that there are no > + * lock ordering issues here, and the panic is due to > + * the fact that the lock level tracking doesn't > record > + * the domain the lock belongs to. > + */ > +paging_unlock(d); This makes it sound as if tracking the domain would help. It doesn't, at least not as long as there is not also some sort of ordering or hierarchy between domains. IOW I'd prefer if the "doesn't" became "can't". Now before we go this route I think we need to consider whether this really is the only place where the two locks get into one another's way. That's because I don't think we want to introduce more of such, well, hackery, and hence we'd otherwise need a better solution. For example the locking model could be adjusted to allow such nesting in the general case: Dom0 and any domain whose ->target matches the subject domain here could be given a slightly different "weight" in the lock order violation check logic. I'm also uncertain about the overall effect this change has, in that now the lock gets dropped for _every_ page to be copied. > @@ -505,6 +517,23 @@ static int paging_log_dirty_op(struct domain *d, > clear_page(l1); > unmap_domain_page(l1); > } > +if ( paging_mode_enabled(current->domain) && peek ) > +{ > +d->arch.paging.preempt.log_dirty.i4 = i4; > +d->arch.paging.preempt.log_dirty.i3 = i3; > +d->arch.paging.preempt.log_dirty.i2 = i2 + 1; If i2 + 1 == LOGDIRTY_NODE_ENTRIES I think it would be better > +d->arch.paging.preempt.log_dirty.done = pages; > +d->arch.paging.preempt.dom = current->domain; > +d->arch.paging.preempt.op = sc->op; > +resuming = 1; true? > +if ( l2 ) > +unmap_domain_page(l2); > +if ( l3 ) > +unmap_domain_page(l3); > +if ( l4 ) > +unmap_domain_page(l4); > +goto again; At the very least for safety reasons you should set l2, l3, and l4 back to NULL here. > @@ -513,6 +542,7 @@ static int paging_log_dirty_op(struct domain *d, > { > d->arch.paging.preempt.log_dirty.i4 = i4; > d->arch.paging.preempt.log_dirty.i3 = i3 + 1; > +d->arch.paging.preempt.log_dirty.i2 = 0; > rv = -ERESTART; > break; > } > @@ -525,6 +555,7 @@ static int paging_log_dirty_op(struct domain *d, > { > d->arch.paging.preempt.log_dirty.i4 = i4 + 1; > d->arch.paging.preempt.log_dirty.i3 = 0; > +d->arch.paging.preempt.log_dirty.i2 = 0; > rv = -ERESTART; > } With these I don't see why you need storage in the
Re: [Xen-devel] [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy
On Fri, Nov 30, 2018 at 05:32:46PM -0800, Christopher Clark wrote: > Applied to both x86 and ARM headers. > > Signed-off-by: Christopher Clark > --- > xen/include/asm-arm/guest_access.h | 25 + > xen/include/asm-x86/guest_access.h | 29 + > xen/include/xen/guest_access.h | 3 +++ > 3 files changed, 57 insertions(+) > > diff --git a/xen/include/asm-arm/guest_access.h > b/xen/include/asm-arm/guest_access.h > index 224d2a0..7b6f89c 100644 > --- a/xen/include/asm-arm/guest_access.h > +++ b/xen/include/asm-arm/guest_access.h > @@ -24,6 +24,11 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t > ipa, void *buf, > #define __raw_copy_from_guest raw_copy_from_guest > #define __raw_clear_guest raw_clear_guest > > +#define raw_copy_from_guest_errno(dst, src, len) \ > +(raw_copy_from_guest((dst), (src), (len)) ? -EFAULT : 0) > +#define raw_copy_to_guest_errno(dst, src, len) \ > +(raw_copy_to_guest((dst), (src), (len)) ? -EFAULT : 0) Since the only error that you return is EFAULT, I don't really see the point in adding all those helpers. You achieve exactly the same by returning a boolean and doing the translation to EFAULT in the caller if required. It might have been nice to have the copy to/from set of functions return an error value, but adding a new set of helpers that have the same functionality but just differ in the return value look redundant. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ
On 12.12.18 17:08, Julien Grall wrote: Is it just because it adds a couple more instruction in the guest case? And add unlikely for XEN IRQ case. That was the idea. The check is mainly there to catch error in debug build. I supposed a race with `release_irq()`, but found out that we are safe with `_IRQ_DISABLED` flag when `action` is not set. So I'll keep the code in its current place (as of mainline) and wrap it with `#ifndef NDEBUG`. -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function
On Wed, Dec 12, 2018 at 03:32:53AM -0700, Jan Beulich wrote: > >>> On 12.12.18 at 11:04, wrote: > > On Wed, Dec 12, 2018 at 02:53:26AM -0700, Jan Beulich wrote: > >> >>> On 12.12.18 at 10:14, wrote: > >> > On Tue, Dec 11, 2018 at 08:33:08AM -0700, Jan Beulich wrote: > >> >> >>> On 11.12.18 at 16:19, wrote: > >> >> > On Tue, Dec 11, 2018 at 08:08:51AM -0700, Jan Beulich wrote: > >> >> >> >>> On 05.12.18 at 15:54, wrote: > >> >> >> > To note it's calculating the approximate amount of memory required > >> >> >> > by > >> >> >> > shadow paging. > >> >> >> > >> >> >> I don't understand this logic, and ... > >> >> >> > >> >> >> > @@ -325,7 +325,7 @@ unsigned long __init dom0_compute_nr_pages( > >> >> >> > break; > >> >> >> > > >> >> >> > /* Reserve memory for shadow or HAP. */ > >> >> >> > -avail -= dom0_paging_pages(d, nr_pages); > >> >> >> > +avail -= dom0_shadow_pages(d, nr_pages); > >> >> >> > } > >> >> >> > >> >> >> ... the comment here (and lack of conditional restricting the > >> >> >> code to shadow mode) appear to support me: Have you > >> >> >> been mislead by the function having a comment referring > >> >> >> to libxl_get_required_shadow_memory()? I think if anything > >> >> >> that libxl function would want to be renamed (to replace > >> >> >> "shadow" by something more generic in its name). > >> >> > > >> >> > But the logic in dom0_shadow_pages to calculate the size of the paging > >> >> > memory pool is specifically for shadow AFAICT, I don't think HAP needs > >> >> > to take the number of vCPUs into account, since there's only a > >> >> > single p2m for the whole domain. OTOH shadow needs to take the number > >> >> > of vCPUs into account because each one will have a different shadow. > >> >> > >> >> Yes, the vCPU count aspect is indeed shadow specific. However, > >> >> as said in reply to the other patch, the calculation here was at > >> >> least supposed to also take into account the P2M part of the > >> >> needed allocations. Yet the P2M part ought to be similar between > >> >> both modes. > >> >> > >> >> > Note that patch 2 in this series adds a function to calculate the size > >> >> > of the paging memory pool for HAP, and a conditional is added to the > >> >> > expression above that takes into account whether shadow or HAP is in > >> >> > use when subtracting from the amount of available memory. > >> >> > >> >> Well, assuming we can settle on what shape patch 2 should take > >> >> I can see the point in doing the rename here, but then with an > >> >> adjusted description: Especially in light of the code comment still > >> >> visible above you'll want to point out that the rename is in > >> >> preparation of splitting the calculations. Since I question the split, > >> >> though, the rename (in a separate patch) is questionable to me > >> >> too. If we used uniform P2M calculations and added just shadow's > >> >> per-vCPU extra on top, no rename in a separate patch would > >> >> seem warranted. > >> > > >> > The current calculations in dom0_paging_pages assume 1 page is needed > >> > for each 1MB of guest memory for the p2m, do you think this is OK? > >> > (and suitable to be used for HAP/IOMMU page tables also) > >> > >> Well, 1 page per 1Mb means the same as your current 8 bytes > >> per page times 2 (for separate P2M and IOMMU tables), afaict. > > > > I was planning to use 1 page per 1Mb for the p2m, and then 1 page per > > 1Mb for the IOMMU, so 16 bytes per page. > > Well, that's (as said for patch 2) quite a bit of an over-estimate, > but then again reserving a little too much is perhaps better than > reserving too little. > > > You mentioned there's some code (for PV?) to calculate the size of the > > page tables but I'm having trouble finding it (mainly because I'm not > > that familiar with PV), could you point me to it? > > In dom0_construct_pv() you'll find a loop starting with > "for ( nr_pt_pages = 2; ; nr_pt_pages++ )". It's not the neatest, > but at least we've never had reports of failure. That seems quite complicated, what about using the formula below: /* * Approximate the memory required for the HAP/IOMMU page tables by * pessimistically assuming every guest page will use a p2m page table * entry. */ return DIV_ROUND_UP(( /* Account for one entry in the L1 per page. */ nr_pages + /* Account for one entry in the L2 per 512 pages. */ DIV_ROUND_UP(nr_pages, 512) + /* Account for one entry in the L3 per 512^2 pages. */ DIV_ROUND_UP(nr_pages, 512 * 512) + /* Account for one entry in the L4 per 512^3 pages. */ DIV_ROUND_UP(nr_pages, 512 * 512 * 512) + ) * 8, PAGE_SIZE << PAGE_ORDER_4K); That takes into account higher level page table structures. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 07/10] libxl: Make killing of device model asynchronous
George Dunlap writes ("[PATCH v2 07/10] libxl: Make killing of device model asynchronous"): > Or at least, give it an asynchronous interface so that we can make it > actually asynchronous in subsequent patches. > > Create state structures and callback function signatures. Add the > state structure to libxl__destroy_domid_state. Break > libxl__destroy_domid down into two functions. Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 05/10] libxl: Do root checks once in libxl__domain_get_device_model_uid
George Dunlap writes ("[PATCH v2 05/10] libxl: Do root checks once in libxl__domain_get_device_model_uid"): > At the moment, we check for equivalence to literal "root" before > deciding whether to add the `runas` command-line option to QEMU. This > is unsatisfactory for several reasons. ... > v2: > - Refactor to use `out` rather than multiple labels > - Only check for root once > - Use 'out' rather than direct returns for errors (only use direct returns > for early `succeed-without-setting-runas` paths) > - Use `rc` rather than `ret` to more closely align with CODING_STYLE > - Fill out comments about the cases we're handling > - Return ERROR_DEVICE_EXISTS rather than ERROR_FAIL if there's another > username that maps to our calculated uid > - Report an error if the specified device_model_user doesn't exist Thanks. This is all much better now. I still have a few style nits, and one comment on your reuse of a libxl error code, but AFAICT the algorithm is right. > +if (!user_base) { > +LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s", > + user); > +rc = ERROR_INVAL; > +} else > +intended_uid = user_base->pw_uid; > > +goto out; 1. Would you mind adding the missing { } ? | ... To avoid confusion, either all | the blocks in an if...else chain have braces, or none of them do. 2. Can you please set rc in the other path ? I think this is supposed to be a success path. Relying on the previous value of rc seems very action at a distance. (Later: yes, I see this this is a success path. See my comments about the rules implied by your `out:' block.) TBH I would prefer two goto outs rather than one, like this: > +if (!user_base) { > +LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s", > + user); > +rc = ERROR_INVAL; +goto out; +} else { > +intended_uid = user_base->pw_uid; +rc = 0; +goto out; +} > +} which seems easier to read since it is not necessary to untangle the whole if, and look at the context, to see the correctness of the individual bits. Or you could treat the !user_base as an error block and write this: > +if (!user_base) { > +LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s", > + user); > +rc = ERROR_INVAL; +goto out; +} +intended_uid = user_base->pw_uid; +rc = 0; +goto out; > +} > +/* > + * If dm_restrict isn't set, and we don't have a specified user, don't > + * bother setting a `-runas` parameter. > + */ > if (!libxl_defbool_val(b_info->dm_restrict)) { > LOGD(DEBUG, guest_domid, > "dm_restrict disabled, starting QEMU as root"); > return 0; > } Why `return 0' here but `goto out' earlier ? IMO all the success returns should be the same. > +rc = userlookup_helper_getpwuid(gc, intended_uid, > _clash_pwbuf, _clash); > -if (ret < 0) > -return ret; > +if (rc < 0) > +goto out; In my last mail I suggested this should be `ret < 0' but of course now it should be `if (rc)' ... > if (user_clash) { > LOGD(ERROR, guest_domid, > "wanted to use uid %ld (%s + %d) but that is user %s !", > (long)intended_uid, LIBXL_QEMU_USER_RANGE_BASE, > guest_domid, user_clash->pw_name); > -return ERROR_FAIL; > +rc = ERROR_DEVICE_EXISTS; > +goto out; I appreciate your desire to specify particular error value, but I am far from convinced that ERROR_DEVICE_EXISTS is appropriate. It would lead someone to ask which device was in the way. We generally use EINVAL for bad configurations. If you prefer, feel free to introduce a new error code. 32-bit signed integers are pretty cheap. > +if (rc < 0) > +goto out; > if (user_base) { > LOGD(WARN, guest_domid, "Could not find user %s, falling back to %s", > LIBXL_QEMU_USER_RANGE_BASE, LIBXL_QEMU_USER_SHARED); > -goto end_search; > +intended_uid = user_base->pw_uid; > +goto out; Here we have this pattern again with a `goto out' without a preceding assignment to `rc'. AFAICT the rules implied by your out block are: * Every goto out must be preceded by an assignment to rc. IMO there is no reason this should not immediately precede the goto out. * Additionally, if rc is 0 then the goto out must also be preceded relatively recently by an assignment to intended_uid. > +out: > +if (!rc) { > +if (intended_uid == 0) { > +LOGD(ERROR, guest_domid, "intended_uid is 0 (root)!"); > +return ERROR_INVAL; > +} > + > +state->dm_runas = user; > +} > + >
Re: [Xen-devel] [PATCH for-4.12 v2 16/17] xen/arm: Implement Set/Way operations
On 07/12/2018 21:29, Stefano Stabellini wrote: CC'ing Dario Dario, please give a look at the preemption question below. On Fri, 7 Dec 2018, Julien Grall wrote: On 06/12/2018 23:32, Stefano Stabellini wrote: On Tue, 4 Dec 2018, Julien Grall wrote: So you may not execute them before returning to the guest introducing long delay. That's why we execute the rest of the code with interrupts masked. If sotfirq_pending() returns 0 then you know there were no more softirq pending to handle. All the new one will be signaled via an interrupt than can only come up when irq are unmasked. The one before executing vCPU work can potentially be avoided. The reason I added it is it can take some times before p2m_flush_vm() will call softirq. As we do this on return to guest we may have already been executed for some time in the hypervisor. So this give us a chance to preempt if the vCPU consumed his sliced. This one is difficult to tell whether it is important or if it would be best avoided. For Dario: basically we have a long running operation to perform, we thought that the best place for it would be on the path returning to guest (leave_hypervisor_tail). The operation can interrupt itself checking sotfirq_pending() once in a while to avoid blocking the pcpu for too long. The question is: is it better to check sotfirq_pending() even before starting? Or every so often during the operating is good enough? Does it even matter? I am not sure to understand what is your concern here. Checking for softirq_pending() often is not an issue. The issue is when we happen to not check it. At the moment, I would prefer to be over cautious until we figure out whether this is a real issue. If you are concerned about the performance impact, this is only called when a guest is using set/way. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 03/10] libxl: Clean up userlookup_helper_getpw* helper
George Dunlap writes ("[PATCH v2 03/10] libxl: Clean up userlookup_helper_getpw* helper"): > Bring conventions more in line with libxl__xs_read_checked(): > - If found, return 0 and set pointer to non-NULL > - If not found, return 0 and set pointer to NULL > - On error, return libxl-style error number. > > Update documentation to match. ... > #define DEFINE_USERLOOKUP_HELPER(NAME,SPEC_TYPE,STRUCTNAME,SYSCONF) \ > static int userlookup_helper_##NAME(libxl__gc *gc, \ > @@ -83,7 +89,7 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char > *name) > struct STRUCTNAME *resultp = NULL; \ git diff has excelled itself in choice of heading line, hasn't it ? > @@ -142,14 +147,14 @@ static int libxl__domain_get_device_model_uid(libxl__gc > *gc, > _pwbuf, _base); > if (ret < 0) > return ret; > -if (ret > 0) { > +if (user_base) { I would prefer to also: -if (ret < 0) +if (ret) return ret; which is more conventional for rc and might reduce the impact of bugs where the function returned a positive. (Twice.) With or without that change, Acked-by: Ian Jackson Thanks for the cleanup! Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device
>>> On 12.12.18 at 16:18, wrote: > On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote: > On 12.12.18 at 08:06, wrote: >>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote: On 12/5/18 4:32 AM, Roger Pau Monné wrote: > On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote: >> I find some pass-thru devices don't work any more across guest reboot. >> Assigning it to another guest also meets the same issue. And the only >> way to make it work again is un-binding and binding it to pciback. >> Someone reported this issue one year ago [1]. More detail also can be >> found in [2]. >> >> The root-cause is Xen's internal MSI-X state isn't reset properly >> during reboot or re-assignment. In the above case, Xen set maskall bit >> to mask all MSI interrupts after it detected a potential security >> issue. Even after device reset, Xen didn't reset its internal maskall >> bit. As a result, maskall bit would be set again in next write to >> MSI-X message control register. >> >> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X >> internal state of a device, we employ it to fix this issue rather than >> introducing another dedicated sub-hypercall. >> >> Note that PHYSDEVOPS_release_msix() will fail if the mapping between >> the device's msix and pirq has been created. This limitation prevents >> us calling this function when detaching a device from a guest during >> guest shutdown. Thus it is called right before calling >> PHYSDEVOPS_prepare_msix(). > s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the > () at the end of the hypercall name since it's not a function. > > I'm also wondering why the release can't be done when the device is > detached from the guest (or the guest has been shut down). This makes > me worry about the raciness of the attach/detach procedure: if there's > a state where pciback assumes the device has been detached from the > guest, but there are still pirqs bound, an attempt to attach to > another guest in such state will fail. I wonder whether this additional reset functionality could be done out of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset and then do the extra things that are not properly done there. >>> >>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying >>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal >>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished >>> without error. But ATM, xen expects that no msi is bound to pirq when >>> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY. >>> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove(). >>> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen >>> at last minute, which happens after device reset in >>> xen_pcibk_xenbus_remove(). >> >>But that may need taking care of: I don't think it is a good idea to have >>anything left from the prior owning domain when the device gets reset. >>I.e. left over IRQ bindings should perhaps be forcibly cleared before >>invoking the reset; > > Agree. How about pciback to track the established IRQ bindings? Then > pciback can clear irq binding before invoking the reset. How would pciback even know of those mappings, when it's qemu who establishes (and manages) them? >>in fact I'd expect this to happen in the course of >>domain destruction, and I'd expect the device reset to come after the >>domain was cleaned up. Perhaps simply an ordering issue in the tool >>stack? > > I don't think reversing the sequences of device reset and domain > destruction would be simple. Furthermore, during device hot-unplug, > device reset is done when the owner is alive. So if we use domain > destruction to enforce all irq binding cleared, in theory, it won't be > applicable to hot-unplug case (if qemu's hot-unplug logic is > compromised). Even in the hot-unplug case the tool stack could issue unbind requests, behind the back of the possibly compromised qemu, once neither the guest nor qemu have access to the device anymore. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation
Improve decision when vTSC emulation will be activated for a domU with tsc_mode=default. The current approach is to compare the cpu_khz value from two physical hosts. Since this value is not accurate, it can not be used verbatim to decide if vTSC emulation needs to be enabled. Without this change each TSC access from domU will be emulated after migration, which causes a significant perfomance drop for workloads that make use of rdtsc. If a domU uses TSC as clocksoure it also must run NTP in some way to avoid the potential drift what will most likely happen, independent of any migration. The calculation of the drift is based on the time returned by remote servers versus how fast the local clock advances. NTP can handle a drift up to 500PPM. This means the local clocksource can run up to 500us slower or faster. This calculation is based on the TSC frequency of the host where the domU was started. Once a domU is migrated to a host of a different class, like from "2.3GHz" to "2.4GHz", the TSC frequency changes, but the domU kernel may not recalibrate itself. As a result, the drift will be larger and might be outside of the 500 PPM range. In addition, the kernel may notice the change of speed in which the TSC advances and could change the clocksource. All this depends of course on the type of OS that is running in the domU. If the domU is migrated to another host of the same class, both hosts may have a slightly different TSC frequency. The difference is small enough and most likely within the drift range that NTP can handle. The formula to set the tolerance for this host calculates the ticks within a timespan of 500 PPM, which is 500us. From this number the assumed jitter in the TSC frequency measurement must be substracted because Xen itself can not know if the estimated value in cpu_khz is at the edge or in the middle of the range of possible freqencies. Data collected during the incident which triggered this change showed a jitter of up to 200 KHz across systems of the same class. The resulting tolerance is larger than needed, and it is expected to still cover the possible drift that NTP can handle. Signed-off-by: Olaf Hering -- v11: - trim patch and use calculated tolerance value, no admin interaction required v10: - rebase to ae01a8e315 - remove changes for libxl and save/restore protocol, the feature has to be per host instead of per guest - add newline to tsc_set_info (Andrew) - add pointer to xen-tscmode(7) in xl.cfg(5)/vtsc_tolerance_khz (Andrew) - mention potential clock drift in the domU (Andrew) - reword the newly added paragraph in xen-tscmode(7) (Andrew), and also mention that it is about the measured/estimated TSC value rather than the real value. The latter is simply unknown. - use uint32 for internal representation of xen_domctl_tsc_info.vtsc_tolerance_khz and remove padding field - add math for real TSC frequency to xen-tscmode v9: - extend commit msg, mention potential issues with xc_sr_rec_tsc_info._res1 v8: - adjust also python stream checker for added tolerance member v7: - use uint16 in libxl_types.idl to match type used elsewhere in the patch v6: - mention default value in xl.cfg - tsc_set_info: remove usage of __func__, use %d for domid - tsc_set_info: use ABS to calculate khz_diff v5: - reduce functionality to allow setting of the tolerance value only at initial domU startup v4: - add missing copyback in XEN_DOMCTL_set_vtsc_tolerance_khz v3: - rename vtsc_khz_tolerance to vtsc_tolerance_khz - separate domctls to adjust values - more docs - update libxl.h - update python tests - flask check bound to tsc permissions - not runtime tested due to dlsym() build errors in staging --- xen/arch/x86/time.c | 33 - 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c index 24d4c2794b..2ffdc2ea8f 100644 --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -43,6 +43,9 @@ static char __initdata opt_clocksource[10]; string_param("clocksource", opt_clocksource); unsigned long __read_mostly cpu_khz; /* CPU clock frequency in kHz. */ +#define VTSC_NTP_PPM_TOLERANCE 500UL /* Amount of drift NTP will handle */ +#define VTSC_JITTER_RANGE_KHZ 200UL /* Assumed jitter in cpu_khz */ +static unsigned int __read_mostly vtsc_tolerance_khz; DEFINE_SPINLOCK(rtc_lock); unsigned long pit0_ticks; @@ -1885,6 +1888,16 @@ void __init early_time_init(void) printk("Detected %lu.%03lu MHz processor.\n", cpu_khz / 1000, cpu_khz % 1000); +tmp = 1000 * 1000; +tmp += VTSC_NTP_PPM_TOLERANCE; +tmp *= cpu_khz; +tmp /= 1000 * 1000; +tmp -= cpu_khz; +if (tmp >= VTSC_JITTER_RANGE_KHZ) +tmp -= VTSC_JITTER_RANGE_KHZ; +vtsc_tolerance_khz = (unsigned int)tmp; +printk("Tolerating vtsc jitter for domUs: %u kHz.\n", vtsc_tolerance_khz); + setup_irq(0, 0, ); } @@ -2208,6 +2221,7 @@ void tsc_set_info(struct domain *d,
Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device
On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote: On 12.12.18 at 08:06, wrote: >> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote: >>>On 12/5/18 4:32 AM, Roger Pau Monné wrote: On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote: > I find some pass-thru devices don't work any more across guest reboot. > Assigning it to another guest also meets the same issue. And the only > way to make it work again is un-binding and binding it to pciback. > Someone reported this issue one year ago [1]. More detail also can be > found in [2]. > > The root-cause is Xen's internal MSI-X state isn't reset properly > during reboot or re-assignment. In the above case, Xen set maskall bit > to mask all MSI interrupts after it detected a potential security > issue. Even after device reset, Xen didn't reset its internal maskall > bit. As a result, maskall bit would be set again in next write to > MSI-X message control register. > > Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X > internal state of a device, we employ it to fix this issue rather than > introducing another dedicated sub-hypercall. > > Note that PHYSDEVOPS_release_msix() will fail if the mapping between > the device's msix and pirq has been created. This limitation prevents > us calling this function when detaching a device from a guest during > guest shutdown. Thus it is called right before calling > PHYSDEVOPS_prepare_msix(). s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the () at the end of the hypercall name since it's not a function. I'm also wondering why the release can't be done when the device is detached from the guest (or the guest has been shut down). This makes me worry about the raciness of the attach/detach procedure: if there's a state where pciback assumes the device has been detached from the guest, but there are still pirqs bound, an attempt to attach to another guest in such state will fail. >>> >>>I wonder whether this additional reset functionality could be done out >>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset >>>and then do the extra things that are not properly done there. >> >> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying >> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal >> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished >> without error. But ATM, xen expects that no msi is bound to pirq when >> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY. >> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove(). >> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen >> at last minute, which happens after device reset in >> xen_pcibk_xenbus_remove(). > >But that may need taking care of: I don't think it is a good idea to have >anything left from the prior owning domain when the device gets reset. >I.e. left over IRQ bindings should perhaps be forcibly cleared before >invoking the reset; Agree. How about pciback to track the established IRQ bindings? Then pciback can clear irq binding before invoking the reset. >in fact I'd expect this to happen in the course of >domain destruction, and I'd expect the device reset to come after the >domain was cleaned up. Perhaps simply an ordering issue in the tool >stack? I don't think reversing the sequences of device reset and domain destruction would be simple. Furthermore, during device hot-unplug, device reset is done when the owner is alive. So if we use domain destruction to enforce all irq binding cleared, in theory, it won't be applicable to hot-unplug case (if qemu's hot-unplug logic is compromised). Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ
On 12/12/2018 14:58, Andrii Anisov wrote: On 12.12.18 16:49, Julien Grall wrote: ASSERT only tells you that desc->action was NULL. It does not tell you which IRQ has the desc->action == NULL. Ah, yes. It is a bit a shame we don't have way to provide another message with ASSERT to help you debugging. We might have implemented an assert with a message. And guys on stackoverflow suggest for workaround `ASSERT(condition && "error message")`. That's not going to help for showing the IRQ number. No, I am suggesting to have the current if ( test_bit(...) ) protecred with #ifndef NDEBUG. No code duplication. I would not like to leave even non-debug build without the check. The check is mainly there to catch error in debug build. So I am quite confused what is your goal by moving the check. Is it just because it adds a couple more instruction in the guest case? Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ
On 12.12.18 16:49, Julien Grall wrote: ASSERT only tells you that desc->action was NULL. It does not tell you which IRQ has the desc->action == NULL. Ah, yes. It is a bit a shame we don't have way to provide another message with ASSERT to help you debugging. We might have implemented an assert with a message. And guys on stackoverflow suggest for workaround `ASSERT(condition && "error message")`. No, I am suggesting to have the current if ( test_bit(...) ) protecred with #ifndef NDEBUG. No code duplication. I would not like to leave even non-debug build without the check. -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests
When the caller of paging_log_dirty_op is a paging mode guest Xen would choke when trying to copy the dirty bitmap to the guest provided buffer because the paging lock of the target is already locked, and trying to lock the paging lock of the caller will cause the mm lock order checks to trigger: (XEN) mm locking order violation: 64 > 16 (XEN) Xen BUG at ./mm-locks.h:143 (XEN) [ Xen-4.12-unstable x86_64 debug=y Tainted: C ] (XEN) CPU:4 (XEN) RIP:e008:[] p2m.c#_mm_read_lock+0x41/0x50 (XEN) RFLAGS: 00010286 CONTEXT: hypervisor (d0v3) [...] (XEN) Xen call trace: (XEN)[] p2m.c#_mm_read_lock+0x41/0x50 (XEN)[] vmac.c#p2m_get_page_from_gfn+0x46/0x200 (XEN)[] vmac.c#hap_p2m_ga_to_gfn_4_levels+0x4a/0x270 (XEN)[] vmac.c#hvm_translate_get_page+0x33/0x1c0 (XEN)[] hvm.c#__hvm_copy+0x8d/0x230 (XEN)[] vmac.c#hvm_copy_to_guest_linear+0x46/0x60 (XEN)[] vmac.c#copy_to_user_hvm+0x79/0x90 (XEN)[] paging.c#paging_log_dirty_op+0x2c9/0x6d0 (XEN)[] vmac.c#arch_do_domctl+0x63e/0x1c00 (XEN)[] vmac.c#do_domctl+0x450/0x1020 (XEN)[] vmac.c#hvm_hypercall+0x1c9/0x4b0 (XEN)[] vmac.c#vmx_vmexit_handler+0x6c1/0xea0 (XEN)[] vmac.c#vmx_asm_vmexit_handler+0xfa/0x270 (XEN) (XEN) (XEN) (XEN) Panic on CPU 4: (XEN) Xen BUG at ./mm-locks.h:143 (XEN) Fix this by releasing the target paging lock before attempting to perform the copy of the dirty bitmap, and then forcing a restart of the whole process in case there have been changes to the dirty bitmap tables. Note that the path for non-paging guests remains the same, and the paging lock is not dropped in that case. Signed-off-by: Roger Pau Monné --- Cc: George Dunlap Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu --- xen/arch/x86/mm/paging.c | 37 +--- xen/include/asm-x86/domain.h | 1 + 2 files changed, 35 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c index d5836eb688..752f827f38 100644 --- a/xen/arch/x86/mm/paging.c +++ b/xen/arch/x86/mm/paging.c @@ -408,6 +408,7 @@ static int paging_log_dirty_op(struct domain *d, unsigned long *l1 = NULL; int i4, i3, i2; + again: if ( !resuming ) { /* @@ -468,17 +469,18 @@ static int paging_log_dirty_op(struct domain *d, l4 = paging_map_log_dirty_bitmap(d); i4 = d->arch.paging.preempt.log_dirty.i4; i3 = d->arch.paging.preempt.log_dirty.i3; +i2 = d->arch.paging.preempt.log_dirty.i2; pages = d->arch.paging.preempt.log_dirty.done; for ( ; (pages < sc->pages) && (i4 < LOGDIRTY_NODE_ENTRIES); i4++, i3 = 0 ) { l3 = (l4 && mfn_valid(l4[i4])) ? map_domain_page(l4[i4]) : NULL; -for ( ; (pages < sc->pages) && (i3 < LOGDIRTY_NODE_ENTRIES); i3++ ) +for ( ; (pages < sc->pages) && (i3 < LOGDIRTY_NODE_ENTRIES); + i3++, i2 = 0 ) { l2 = ((l3 && mfn_valid(l3[i3])) ? map_domain_page(l3[i3]) : NULL); -for ( i2 = 0; - (pages < sc->pages) && (i2 < LOGDIRTY_NODE_ENTRIES); +for ( ; (pages < sc->pages) && (i2 < LOGDIRTY_NODE_ENTRIES); i2++ ) { unsigned int bytes = PAGE_SIZE; @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d, bytes = (unsigned int)((sc->pages - pages + 7) >> 3); if ( likely(peek) ) { +if ( paging_mode_enabled(current->domain) ) +/* + * Drop the target p2m lock, or else Xen will panic + * when trying to acquire the p2m lock of the caller + * due to invalid lock order. Note that there are no + * lock ordering issues here, and the panic is due to + * the fact that the lock level tracking doesn't record + * the domain the lock belongs to. + */ +paging_unlock(d); if ( (l1 ? copy_to_guest_offset(sc->dirty_bitmap, pages >> 3, (uint8_t *)l1, bytes) @@ -505,6 +517,23 @@ static int paging_log_dirty_op(struct domain *d, clear_page(l1); unmap_domain_page(l1); } +if ( paging_mode_enabled(current->domain) && peek ) +{ +d->arch.paging.preempt.log_dirty.i4 = i4; +d->arch.paging.preempt.log_dirty.i3 = i3; +d->arch.paging.preempt.log_dirty.i2 = i2 + 1; +d->arch.paging.preempt.log_dirty.done = pages; +d->arch.paging.preempt.dom = current->domain; +
[Xen-devel] [PATCH] libxl_create: Re-order callbacks of initiate_domain_create
Callbacks should be in the order that there are going to be executed. This patch fix the initiate_domain_create callbacks, and also reorder the callbacks prototytes. That way, it's easier to follow the flow. This patch: - move libxl__colo_restore_setup_done after domcreate_bootloader_done. - move domcreate_attach_devices after domcreate_devmodel_started. No functional change. Signed-off-by: Anthony PERARD --- tools/libxl/libxl_create.c | 123 +++-- 1 file changed, 63 insertions(+), 60 deletions(-) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index fa573344bc..891175b15b 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -771,28 +771,31 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t domid, */ /* Event callbacks, in this order: */ -static void domcreate_devmodel_started(libxl__egc *egc, - libxl__dm_spawn_state *dmss, - int rc); static void domcreate_bootloader_console_available(libxl__egc *egc, libxl__bootloader_state *bl); -static void domcreate_bootloader_done(libxl__egc *egc, - libxl__bootloader_state *bl, - int rc); - -static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *aodevs, -int ret); - static void domcreate_console_available(libxl__egc *egc, libxl__domain_create_state *dcs); +static void domcreate_bootloader_done(libxl__egc *egc, + libxl__bootloader_state *bl, + int rc); +static void libxl__colo_restore_setup_done(libxl__egc *egc, + libxl__colo_restore_state *crs, + int rc); static void domcreate_stream_done(libxl__egc *egc, libxl__stream_read_state *srs, int ret); - static void domcreate_rebuild_done(libxl__egc *egc, libxl__domain_create_state *dcs, int ret); +static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *aodevs, +int ret); +static void domcreate_devmodel_started(libxl__egc *egc, + libxl__dm_spawn_state *dmss, + int rc); +static void domcreate_attach_devices(libxl__egc *egc, + libxl__multidev *multidev, + int ret); /* Our own function to clean up and call the user's callback. * The final call in the sequence. */ @@ -1031,23 +1034,6 @@ static void domcreate_console_available(libxl__egc *egc, dcs->aop_console_how.for_event)); } -static void libxl__colo_restore_setup_done(libxl__egc *egc, - libxl__colo_restore_state *crs, - int rc) -{ -libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs); - -EGC_GC; - -if (rc) { -LOGD(ERROR, dcs->guest_domid, "colo restore setup fails: %d", rc); -domcreate_stream_done(egc, >srs, rc); -return; -} - -libxl__stream_read_start(egc, >srs); -} - static void domcreate_bootloader_done(libxl__egc *egc, libxl__bootloader_state *bl, int rc) @@ -1145,6 +1131,23 @@ static void domcreate_bootloader_done(libxl__egc *egc, domcreate_stream_done(egc, >srs, rc); } +static void libxl__colo_restore_setup_done(libxl__egc *egc, + libxl__colo_restore_state *crs, + int rc) +{ +libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs); + +EGC_GC; + +if (rc) { +LOGD(ERROR, dcs->guest_domid, "colo restore setup fails: %d", rc); +domcreate_stream_done(egc, >srs, rc); +return; +} + +libxl__stream_read_start(egc, >srs); +} + void libxl__srm_callout_callback_restore_results(xen_pfn_t store_mfn, xen_pfn_t console_mfn, void *user) { @@ -1509,6 +1512,38 @@ const struct libxl_device_type *device_type_tbl[] = { NULL }; +static void domcreate_devmodel_started(libxl__egc *egc, + libxl__dm_spawn_state *dmss, + int ret) +{ +libxl__domain_create_state *dcs = CONTAINER_OF(dmss, *dcs, sdss.dm); +STATE_AO_GC(dmss->spawn.ao); +int domid = dcs->guest_domid; + +/* convenience aliases */ +libxl_domain_config *const d_config = dcs->guest_config; + +if (ret) { +LOGD(ERROR, domid, "device model did not start:
Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ
Hi, On 12/12/2018 13:51, Andrii Anisov wrote: Julien, On 12.12.18 13:59, Julien Grall wrote: An ASSERT(...) is already embed in irq_get_guest_info(desc). Yep. I thought about the ASSERT(...) over the current printk yesterday. But I discarded it because it does not give you more information if something went really wrong as the stack trace would not really be helpful in that context. But ASSERT also prints a failed condition text. ASSERT only tells you that desc->action was NULL. It does not tell you which IRQ has the desc->action == NULL. It is a bit a shame we don't have way to provide another message with ASSERT to help you debugging. So I would prefer the #ifdef NDEBUG solution. I'm a bit confused about your suggestion. Do you want to see two identical pieces of code before and after the `if ( test_bit(_IRQ_GUEST, >status) )`? One with `#ifndef NDEBUG`, other without. Is my understanding correct? No, I am suggesting to have the current if ( test_bit(...) ) protecred with #ifndef NDEBUG. No code duplication. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [xen-4.10-testing test] 131151: regressions - FAIL
Jan Beulich writes ("Re: [Xen-devel] [xen-4.10-testing test] 131151: regressions - FAIL"): > So this can, by now, be called a reliably recurring failure. I concur with the assertions made on irc that 129676 got an (un)lucky pass and that it should be force pushed. So I have done that, with b6e203bc80e9d3e1dc7eb579d9665a77700d78cc Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ
Julien, On 12.12.18 13:59, Julien Grall wrote: An ASSERT(...) is already embed in irq_get_guest_info(desc). Yep. I thought about the ASSERT(...) over the current printk yesterday. But I discarded it because it does not give you more information if something went really wrong as the stack trace would not really be helpful in that context. But ASSERT also prints a failed condition text. So I would prefer the #ifdef NDEBUG solution. I'm a bit confused about your suggestion. Do you want to see two identical pieces of code before and after the `if ( test_bit(_IRQ_GUEST, >status) )`? One with `#ifndef NDEBUG`, other without. Is my understanding correct? -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 08/16] gic: separate ppi processing
Hi, On 12/12/2018 12:52, Andrii Anisov wrote: On 12.12.18 14:46, Julien Grall wrote: I haven't answered because I am waiting the numbers. I got it. But TBM will not show effect of this patch. If TBM does not show effect, then you need a different benchmark. Probably by looking at how the PPI latency in Xen. Even the patch breaks TBM's functionality because TBM relies on a phys timer interrupt to be assigned to the domain (go through the spi path). Well, the problem is your patch assumes PPI cannot be assigned to the guest. While this is the case today, there are plan for handing over PPI (such as timer) to guest. This will help to remove the hack we have for the timer in both Xen code base and Guest code base. I would actually prefer if we try to optimize do_IRQ. Actually, this patch has some good ideas that can be re-used in the current code. For instance, I think you can drop the logic that use _IRQ_PENDING because the interrupt can never be received to another PE while active (this will be cleared by desc->handler->end()). Similarly the test on _IRQ_INPROGRESS could be dropped. Although, I would still like to keep set/clear _IRQ_INPROGRESS. With those changes, do_IRQ becomes a lot more like the function do_ppi. The only difference is the spinlock. But I don't believe this will bring that much difference in performance impact. After all the lock should unlikely be contended. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 08/16] gic: separate ppi processing
On 12.12.18 14:46, Julien Grall wrote: I haven't answered because I am waiting the numbers. I got it. But TBM will not show effect of this patch. Even the patch breaks TBM's functionality because TBM relies on a phys timer interrupt to be assigned to the domain (go through the spi path). -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 13/16] gic-vgic: skip irqs locking
On 12.12.18 14:44, Julien Grall wrote: At the moment, I am happy with the second chunk of this patch to go. I am still unconvinced #1 is the right thing to go. I got it. So keep the patch still for now. I plan to send separately patches you have reviewed first. Then evaluate the rest with TBM. Does it fit your vision? Can you send separately patches I said I am happy with for Xen 4.12? We can then continue to discuss the others with numbers and see if we can merge them in time. Yep. -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 08/16] gic: separate ppi processing
Hi, On 12/12/2018 12:37, Andrii Anisov wrote: Julien, Sorry for bothering you too much. What about this patch? I haven't answered because I am waiting the numbers. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 13/16] gic-vgic: skip irqs locking
Hi Andrii, On 12/12/2018 12:35, Andrii Anisov wrote: On 12.12.18 14:07, Julien Grall wrote: This chunk relies on patch #1, am I correct? For sure, it is. If so, this should be written in the commit message that this was introduced recently. This helps to figure out whether the patch can be merged before the rest. Do you mean I can prepare it for 4.12 together with #1? At the moment, I am happy with the second chunk of this patch to go. I am still unconvinced #1 is the right thing to go. I plan to send separately patches you have reviewed first. Then evaluate the rest with TBM. Does it fit your vision? Can you send separately patches I said I am happy with for Xen 4.12? We can then continue to discuss the others with numbers and see if we can merge them in time. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 08/16] gic: separate ppi processing
Julien, Sorry for bothering you too much. What about this patch? -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 13/16] gic-vgic: skip irqs locking
On 12.12.18 14:07, Julien Grall wrote: This chunk relies on patch #1, am I correct? For sure, it is. If so, this should be written in the commit message that this was introduced recently. This helps to figure out whether the patch can be merged before the rest. Do you mean I can prepare it for 4.12 together with #1? I plan to send separately patches you have reviewed first. Then evaluate the rest with TBM. Does it fit your vision? I would add an ASSERT(!local_is_irq_enabled()) on top to show that this should be called with interrupt disabled. OK. -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware
On Wed, Dec 12, 2018 at 11:48:52AM +, Tian, Kevin wrote: > > From: Roger Pau Monné [mailto:roger@citrix.com] > > Sent: Wednesday, December 12, 2018 7:25 PM > > > > On Wed, Dec 12, 2018 at 10:36:44AM +, Tian, Kevin wrote: > > > > From: Roger Pau Monné [mailto:roger@citrix.com] > > > > Sent: Monday, October 15, 2018 6:30 PM > > > > (XEN) [22642] POWERTYPE 4 > > > > (XEN) [22643] IDLE PPR 0x0020 > > > > (XEN)IRR > > > > > > 00 > > > > 00 > > > > (XEN)ISR > > > > > > 02 > > > > 00 > > > > (XEN) [22644] WAKE PPR 0x0020 > > > > (XEN)IRR > > > > > > 02 > > > > 00 > > > > (XEN)ISR > > > > > > 02 > > > > 00 > > > > > > looks pending IRR (0x21) doesn't always trigger a spurious interrupt? > > > > Yes, that's correct. Having a pending IRR and going idle doesn't > > always trigger the spurious interrupt re-injection. > > > > > is it a fixed pattern after how many rounds of Cstate enter/exit with > > > pending IRR(0x21) then you see assertion happened (in this example > > > it happens at 3rd time)? > > > > It's not a fixed pattern, here's another trace with IRR(0x21) being > > pending just once during the Cstate transitions: > > did you observe a case where such asset may occur when IRR(0x21) > is cleared but ISR (0x21) is set? No, I've always seen both ISR and IRR set when the interrupt injection happens. This of course doesn't mean it's not possible, but I have not seen any trace with ISR(0x21) set and IRR(0x21) clear. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 06/16] gic: drop interrupts enabling on interrupts processing
Hi, On 28/11/2018 22:06, Julien Grall wrote: On 28/11/2018 21:32, Andrii Anisov wrote: From: Andrii Anisov This reduces the number of context switches in case we have an interrupt storm. We will read out all of those interrupt in the loop anyway. This needs a better explanation. You might want to have a look at the details I provided in another discussion. In general, I would like any changes in the vGIC to be very detailed. As they are usually subbtle implication on the rest of the code base. This patch is suitable for Xen 4.12 with more comments in the commit message. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 13/16] gic-vgic: skip irqs locking
Hi, On 28/11/2018 21:32, Andrii Anisov wrote: From: Andrii Anisov Those fucntions are called under IRQs disabled already, so avoid s/fucntions/ additional flags saving and restore. Signed-off-by: Andrii Anisov --- xen/arch/arm/gic-vgic.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c index 5b73bbd..e511f91 100644 --- a/xen/arch/arm/gic-vgic.c +++ b/xen/arch/arm/gic-vgic.c @@ -268,7 +268,6 @@ static void gic_update_one_lr(struct vcpu *v, int i) void vgic_sync_from_lrs(struct vcpu *v) { int i = 0; -unsigned long flags; unsigned int nr_lrs = gic_get_nr_lrs(); /* The idle domain has no LRs to be cleared. Since gic_restore_state @@ -279,7 +278,7 @@ void vgic_sync_from_lrs(struct vcpu *v) gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false); -spin_lock_irqsave(>arch.vgic.lock, flags); +spin_lock(>arch.vgic.lock); This chunk relies on patch #1, am I correct? If so, this should be written in the commit message that this was introduced recently. This helps to figure out whether the patch can be merged before the rest. while ((i = find_next_bit((const unsigned long *) _cpu(lr_mask), nr_lrs, i)) < nr_lrs ) { @@ -287,7 +286,7 @@ void vgic_sync_from_lrs(struct vcpu *v) i++; } -spin_unlock_irqrestore(>arch.vgic.lock, flags); +spin_unlock(>arch.vgic.lock); } static void gic_restore_pending_irqs(struct vcpu *v) @@ -295,11 +294,10 @@ static void gic_restore_pending_irqs(struct vcpu *v) int lr = 0; struct pending_irq *p, *t, *p_r; struct list_head *inflight_r; -unsigned long flags; unsigned int nr_lrs = gic_get_nr_lrs(); int lrs = nr_lrs; -spin_lock_irqsave(>arch.vgic.lock, flags); +spin_lock(>arch.vgic.lock); I would add an ASSERT(!local_is_irq_enabled()) on top to show that this should be called with interrupt disabled. if ( list_empty(>arch.vgic.lr_pending) ) goto out; @@ -343,7 +341,7 @@ found: } out: -spin_unlock_irqrestore(>arch.vgic.lock, flags); +spin_unlock(>arch.vgic.lock); } void gic_clear_pending_irqs(struct vcpu *v) Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [Qemu-devel] xen_disk qdevification
Olaf Hering writes: > On Fri, Nov 02, Kevin Wolf wrote: > >> A while ago, a downstream patch review found out that there are some QMP >> commands that would immediately crash if a xen_disk device were present >> because of the lacking qdevification. This is not the code quality >> standard I envision for QEMU. It's time for non-qdev devices to go. > > Do you have that backwards by any chance? IMO the presence of assert() > contributes to bad code quality, not the drivers that trigger those > asserts. It is bad enough that two QEMU releases went out while being in > bad shape. Converting block devices to the qdev infrastructure (introduced in 2009) has been a longwinded affair. We've repeatedly reminded people in charge of the stragglers to update their code. Neglecting to update code to current infrastructure creates a burden and a risk. The burden is on the maintainers of the infrastructure: they get to drag along outmoded infrastructure. The risk is on the users of the code: it becomes a special case, and eventually acquires its special bugs. Putting the blame for them entirely on the maintainers of the infrastructure is not fair. > Anyway, hopefully Paul or whoever will find the time and energy to > convert the code at some point. Yes, Paul's taking care of it. Much appreciated. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] xen_disk qdevification (was: [PATCH 0/3] Performance improvements for xen_disk v2)
Am 12.12.2018 um 09:59 hat Olaf Hering geschrieben: > On Fri, Nov 02, Kevin Wolf wrote: > > > A while ago, a downstream patch review found out that there are some QMP > > commands that would immediately crash if a xen_disk device were present > > because of the lacking qdevification. This is not the code quality > > standard I envision for QEMU. It's time for non-qdev devices to go. > > Do you have that backwards by any chance? IMO the presence of assert() > contributes to bad code quality, not the drivers that trigger those > asserts. You like shooting the messenger, it seems? Bugs aren't bad, only catching them is? But anyway, in this case, I seem to remember it was a plain old segfault, not a failed assertion. Kevin signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ
On 12/12/2018 11:30, Andrii Anisov wrote: On 11.12.18 16:48, Julien Grall wrote: And you can't see any potential race in that code happening in the future? It is protected with `desc->lock` so far. If one decided to get it from under the lock, the race is possible with `release_irq()`. Also getting an unknown interrupt is very unlikely on a non-debug platform. I am tempted to keep the code at the same place but protect with an #ifndef NDEBUG. What do you think? Well, I think about a correspondent ASSERT for the guest IRQ. Like following: if ( test_bit(_IRQ_GUEST, >status) ) {>> struct irq_guest *info = irq_get_guest_info(desc); + ASSERT( desc->action != NULL ); What would you prefer? An ASSERT(...) is already embed in irq_get_guest_info(desc). I thought about the ASSERT(...) over the current printk yesterday. But I discarded it because it does not give you more information if something went really wrong as the stack trace would not really be helpful in that context. So I would prefer the #ifdef NDEBUG solution. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [Qemu-devel] [PATCH for-3.2 v5 07/19] hw: apply accel compat properties without touching globals
Hi On Mon, Dec 10, 2018 at 8:55 PM Igor Mammedov wrote: > > On Mon, 10 Dec 2018 17:45:22 +0100 > Igor Mammedov wrote: > > > On Tue, 4 Dec 2018 18:20:11 +0400 > > Marc-André Lureau wrote: > > > > > Instead of registering compat properties as globals, let's keep them > > > in their own array, to avoid mixing with user globals. > > > > > > Introduce object_apply_global_props() function, to apply compatibility > > > properties from a GPtrArray. > > > > > > Signed-off-by: Marc-André Lureau > > > --- > > > include/hw/qdev-core.h | 10 ++ > > > include/qom/object.h | 3 +++ > > > include/sysemu/accel.h | 4 +--- > > > accel/accel.c | 12 > > > hw/core/qdev.c | 9 + > > > hw/xen/xen-common.c| 9 ++--- > > > qom/object.c | 25 + > > > vl.c | 1 - > > > 8 files changed, 54 insertions(+), 19 deletions(-) > > > > > [...] > > > diff --git a/hw/xen/xen-common.c b/hw/xen/xen-common.c > > > index 6ec14c73ca..4532aa8632 100644 > > > --- a/hw/xen/xen-common.c > > > +++ b/hw/xen/xen-common.c > > > @@ -174,18 +174,21 @@ static GlobalProperty xen_compat_props[] = { > > > .driver = "migration", > > > .property = "send-section-footer", > > > .value = "off", > > > -}, > > > -{ /* end of list */ }, > > > +} > > > }; > > > > > > static void xen_accel_class_init(ObjectClass *oc, void *data) > > > { > > > AccelClass *ac = ACCEL_CLASS(oc); > > > + > > > ac->name = "Xen"; > > > ac->init_machine = xen_init; > > > ac->setup_post = xen_setup_post; > > > ac->allowed = _allowed; > > > -ac->global_props = xen_compat_props; > > > +ac->compat_props = g_ptr_array_new(); > > where is matching free for that? > can we at least annotate it somehow so that valgrind won't complain about > this leak? If you check my commits on qemu, you should see that I do care (too much?) about leaks :) In this case though, I don't see valgrind or asan complaining, I guess it's still a reachable reference. Do you think a FIXME comment would be helpful? (/me wish we had a proper object system, GObject, but that ship as long sailed..) > > > > > > + > > > +compat_props_add(ac->compat_props, > > > + xen_compat_props, G_N_ELEMENTS(xen_compat_props)); > > > } > > > > > > #define TYPE_XEN_ACCEL ACCEL_CLASS_NAME("xen") > > > diff --git a/qom/object.c b/qom/object.c > > > index 17921c0a71..dbdab0aead 100644 > > > --- a/qom/object.c > > > +++ b/qom/object.c > > > @@ -370,6 +370,31 @@ static void object_post_init_with_type(Object *obj, > > > TypeImpl *ti) > > > } > > > } > > > > > > +void object_apply_global_props(Object *obj, const GPtrArray *props, > > > Error **errp) > > > +{ > > > +Error *err = NULL; > > > +int i; > > > + > > > +if (!props) { > > > +return; > > > +} > > > + > > > +for (i = 0; i < props->len; i++) { > > > +GlobalProperty *p = g_ptr_array_index(props, i); > > > + > > > +if (object_dynamic_cast(obj, p->driver) == NULL) { > > > +continue; > > > +} > > > +p->used = true; > > > +object_property_parse(obj, p->value, p->property, ); > > > +if (err != NULL) { > > > +error_prepend(, "can't apply global %s.%s=%s: ", > > > + p->driver, p->property, p->value); > > > +error_propagate(errp, err); > > > +} > > > +} > > > +} > > > + > > > static void object_initialize_with_type(void *data, size_t size, > > > TypeImpl *type) > > > { > > > Object *obj = data; > > > diff --git a/vl.c b/vl.c > > > index a5ae5f23d2..88ba658572 100644 > > > --- a/vl.c > > > +++ b/vl.c > > > @@ -2968,7 +2968,6 @@ static void user_register_global_props(void) > > > */ > > > static void register_global_properties(MachineState *ms) > > > { > > > -accel_register_compat_props(ms->accelerator); > > > machine_register_compat_props(ms); > > > user_register_global_props(); > > > } > > > > > > -- Marc-André Lureau ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 05/16] gic-vgic: Drop an excessive clear_lrs
On 12/12/2018 11:01, Andrii Anisov wrote: Hello Julien, On 11.12.18 16:33, Julien Grall wrote: With #ifndef NDEBUG and the appropriate comment: Will do. Reviewed-by: Julien Grall Thank you. Feel free to resent it alone so it can be merged to Xen 4.12. What about getting it together with "[RFC 11/16] irq: skip action avalability check for guest's IRQ"? I am fine with that :). Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 15/25] argo: implement the sendv op
>>> On 01.12.18 at 02:32, wrote: > +static void > +argo_signal_domain(struct domain *d) > +{ > +argo_dprintk("signalling domid:%d\n", d->domain_id); > + > +if ( !d->argo ) /* This can happen if the domain is being destroyed */ > +return; If such a precaution is necessary, how is it guaranteed that the pointer doesn't change to NULL between the check above and ... > +evtchn_send(d, d->argo->evtchn_port); ... the use here? > +static int > +argo_iov_count(XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint8_t niov, > + uint32_t *count) > +{ > +argo_iov_t iov; > +uint32_t sum_iov_lens = 0; > +int ret; > + > +if ( niov > ARGO_MAXIOV ) > +return -EINVAL; > + > +while ( niov-- ) > +{ > +ret = copy_from_guest_errno(, iovs, 1); > +if ( ret ) > +return ret; > + > +/* check each to protect sum against integer overflow */ > +if ( iov.iov_len > ARGO_MAX_RING_SIZE ) > +return -EINVAL; > + > +sum_iov_lens += iov.iov_len; > + > +/* > + * Again protect sum from integer overflow > + * and ensure total msg size will be within bounds. > + */ > +if ( sum_iov_lens > ARGO_MAX_MSG_SIZE ) > +return -EINVAL; So you do overflow checks here. But how does this help when ... > +guest_handle_add_offset(iovs, 1); > +} > + > +*count = sum_iov_lens; > +return 0; > +} > + > +static int > +argo_ringbuf_insert(struct domain *d, > +struct argo_ring_info *ring_info, > +const struct argo_ring_id *src_id, > +XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint8_t niov, > +uint32_t message_type, unsigned long *out_len) > +{ > +argo_ring_t ring; > +struct argo_ring_message_header mh = { 0 }; > +int32_t sp; > +int32_t ret = 0; > +uint32_t len; > +uint32_t iov_len; > +uint32_t sum_iov_len = 0; > + > +ASSERT(spin_is_locked(_info->lock)); > + > +if ( (ret = argo_iov_count(iovs, niov, )) ) > +return ret; > + > +if ( ((ARGO_ROUNDUP(len) + sizeof (struct argo_ring_message_header) ) >= > + ring_info->len) > + || (len > ARGO_MAX_MSG_SIZE) ) > +return -EMSGSIZE; > + > +do { > +ret = argo_ringbuf_get_rx_ptr(ring_info, _ptr); > +if ( ret ) > +break; > + > +argo_sanitize_ring(, ring_info); > + > +argo_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring.len=%d" > + " ring_info->tx_ptr=%d\n", > + ring.tx_ptr, ring.rx_ptr, ring.len, ring_info->tx_ptr); > + > +if ( ring.rx_ptr == ring.tx_ptr ) > +sp = ring_info->len; > +else > +{ > +sp = ring.rx_ptr - ring.tx_ptr; > +if ( sp < 0 ) > +sp += ring.len; > +} > + > +if ( (ARGO_ROUNDUP(len) + sizeof(struct argo_ring_message_header)) > >= sp ) > +{ > +argo_dprintk("EAGAIN\n"); > +ret = -EAGAIN; > +break; > +} > + > +mh.len = len + sizeof(struct argo_ring_message_header); > +mh.source.port = src_id->addr.port; > +mh.source.domain_id = src_id->addr.domain_id; > +mh.message_type = message_type; > + > +/* > + * For this copy to the guest ring, tx_ptr is always 16-byte aligned > + * and the message header is 16 bytes long. > + */ > +BUILD_BUG_ON(sizeof(struct argo_ring_message_header) != > ARGO_ROUNDUP(1)); > + > +if ( (ret = argo_memcpy_to_guest_ring(ring_info, > + ring.tx_ptr + > sizeof(argo_ring_t), > + , > + XEN_GUEST_HANDLE_NULL(uint8_t), > + sizeof(mh))) ) > +break; > + > +ring.tx_ptr += sizeof(mh); > +if ( ring.tx_ptr == ring_info->len ) > +ring.tx_ptr = 0; > + > +while ( niov-- ) > +{ > +XEN_GUEST_HANDLE_PARAM(uint8_t) bufp_hnd; > +XEN_GUEST_HANDLE(uint8_t) buf_hnd; > +argo_iov_t iov; > + > +ret = copy_from_guest_errno(, iovs, 1); ... here you copy the structure again from guest memory, at which point it may have changed? I see you do some checks further down, but the question then is - is the checking in argo_iov_count() redundant and hence unnecessary? Are you really safe here against inconsistencies between the first and second reads? If so, a thorough explanation in a comment is needed here. > +if ( ret ) > +break; > + > +bufp_hnd = guest_handle_from_ptr((uintptr_t)iov.iov_base, > uint8_t); Please use a handle in the public interface instead of such a cast. > +buf_hnd = guest_handle_from_param(bufp_hnd, uint8_t); > +iov_len
Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware
> From: Roger Pau Monné [mailto:roger@citrix.com] > Sent: Wednesday, December 12, 2018 7:25 PM > > On Wed, Dec 12, 2018 at 10:36:44AM +, Tian, Kevin wrote: > > > From: Roger Pau Monné [mailto:roger@citrix.com] > > > Sent: Monday, October 15, 2018 6:30 PM > > > (XEN) [22642] POWERTYPE 4 > > > (XEN) [22643] IDLE PPR 0x0020 > > > (XEN)IRR > > > > 00 > > > 00 > > > (XEN)ISR > > > > 02 > > > 00 > > > (XEN) [22644] WAKE PPR 0x0020 > > > (XEN)IRR > > > > 02 > > > 00 > > > (XEN)ISR > > > > 02 > > > 00 > > > > looks pending IRR (0x21) doesn't always trigger a spurious interrupt? > > Yes, that's correct. Having a pending IRR and going idle doesn't > always trigger the spurious interrupt re-injection. > > > is it a fixed pattern after how many rounds of Cstate enter/exit with > > pending IRR(0x21) then you see assertion happened (in this example > > it happens at 3rd time)? > > It's not a fixed pattern, here's another trace with IRR(0x21) being > pending just once during the Cstate transitions: did you observe a case where such asset may occur when IRR(0x21) is cleared but ISR (0x21) is set? I want to understand whether pending ISR alone triggers such problem or another pending IRR together is necessary... > > (XEN) *** Pending EOI error *** > (XEN) cpu #1, irq 30, vector 0x21, sp 1 > (XEN) Peoi stack: sp 1 > (XEN) [ 0] irq 30, vec 0x21, ready 0, ISR 1, TMR 0, IRR 0 > (XEN) Peoi stack trace records: > (XEN) [ 7886] ACK_POST PPR 0x0010 > (XEN)IRR > 00 > 00 > (XEN)ISR > 00 > 00 > (XEN) [ 7887] POWERTYPE 5 > (XEN) [ 7888] IDLE PPR 0x0010 > (XEN)IRR > 00 > 00 > (XEN)ISR > 00 > 00 > (XEN) [ 7889] WAKE PPR 0x0010 > (XEN)IRR > 00 > 04 > (XEN)ISR > 00 > 00 > (XEN) [ 7890] ACK_PRE PPR 0x00f0 > (XEN)IRR > 00 > 00 > (XEN)ISR > 00 > 04 > (XEN) [ 7891] ACK_POST PPR 0x0010 > (XEN)IRR > 00 > 00 > (XEN)ISR > 00 > 00 > (XEN) [ 7892] POWERTYPE 5 > (XEN) [ 7893] IDLE PPR 0x0010 > (XEN)IRR > 00 > 00 > (XEN)ISR > 00 > 00 > (XEN) [ 7894] WAKE PPR 0x0010 > (XEN)IRR > 00 > 04 > (XEN)ISR > 00 > 00 > (XEN) [ 7895] ACK_PRE PPR 0x00f0 > (XEN)IRR > 00 > 00 > (XEN)ISR > 00 > 04 > (XEN) [ 7896] ACK_POST PPR 0x0010 > (XEN)IRR > 00 > 00 > (XEN)ISR > 00 > 00 > (XEN) [ 7897] POWERTYPE 5 > (XEN) [ 7898] IDLE PPR 0x0010 > (XEN)IRR > 00 > 00 > (XEN)ISR > 00 > 00 > (XEN) [ 7899] WAKE PPR 0x0010 > (XEN)IRR > 00 > 04 > (XEN)ISR > 00 > 00 > (XEN) [ 7900] ACK_PRE PPR 0x00f0 > (XEN)IRR > 00 > 00 > (XEN)ISR > 00 > 04 > (XEN) [ 7901] ACK_POST PPR 0x0010 > (XEN)IRR > 00 > 00 > (XEN)ISR >
Re: [Xen-devel] [RFC 11/16] irq: skip action avalability check for guest's IRQ
On 11.12.18 16:48, Julien Grall wrote: And you can't see any potential race in that code happening in the future? It is protected with `desc->lock` so far. If one decided to get it from under the lock, the race is possible with `release_irq()`. Also getting an unknown interrupt is very unlikely on a non-debug platform. I am tempted to keep the code at the same place but protect with an #ifndef NDEBUG. What do you think? Well, I think about a correspondent ASSERT for the guest IRQ. Like following: if ( test_bit(_IRQ_GUEST, >status) ) {>> struct irq_guest *info = irq_get_guest_info(desc); +ASSERT( desc->action != NULL ); What would you prefer? -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware
On Wed, Dec 12, 2018 at 10:36:44AM +, Tian, Kevin wrote: > > From: Roger Pau Monné [mailto:roger@citrix.com] > > Sent: Monday, October 15, 2018 6:30 PM > > (XEN) [22642] POWERTYPE 4 > > (XEN) [22643] IDLE PPR 0x0020 > > (XEN)IRR > > 00 > > 00 > > (XEN)ISR > > 02 > > 00 > > (XEN) [22644] WAKE PPR 0x0020 > > (XEN)IRR > > 02 > > 00 > > (XEN)ISR > > 02 > > 00 > > looks pending IRR (0x21) doesn't always trigger a spurious interrupt? Yes, that's correct. Having a pending IRR and going idle doesn't always trigger the spurious interrupt re-injection. > is it a fixed pattern after how many rounds of Cstate enter/exit with > pending IRR(0x21) then you see assertion happened (in this example > it happens at 3rd time)? It's not a fixed pattern, here's another trace with IRR(0x21) being pending just once during the Cstate transitions: (XEN) *** Pending EOI error *** (XEN) cpu #1, irq 30, vector 0x21, sp 1 (XEN) Peoi stack: sp 1 (XEN) [ 0] irq 30, vec 0x21, ready 0, ISR 1, TMR 0, IRR 0 (XEN) Peoi stack trace records: (XEN) [ 7886] ACK_POST PPR 0x0010 (XEN)IRR (XEN)ISR (XEN) [ 7887] POWERTYPE 5 (XEN) [ 7888] IDLE PPR 0x0010 (XEN)IRR (XEN)ISR (XEN) [ 7889] WAKE PPR 0x0010 (XEN)IRR 0004 (XEN)ISR (XEN) [ 7890] ACK_PRE PPR 0x00f0 (XEN)IRR (XEN)ISR 0004 (XEN) [ 7891] ACK_POST PPR 0x0010 (XEN)IRR (XEN)ISR (XEN) [ 7892] POWERTYPE 5 (XEN) [ 7893] IDLE PPR 0x0010 (XEN)IRR (XEN)ISR (XEN) [ 7894] WAKE PPR 0x0010 (XEN)IRR 0004 (XEN)ISR (XEN) [ 7895] ACK_PRE PPR 0x00f0 (XEN)IRR (XEN)ISR 0004 (XEN) [ 7896] ACK_POST PPR 0x0010 (XEN)IRR (XEN)ISR (XEN) [ 7897] POWERTYPE 5 (XEN) [ 7898] IDLE PPR 0x0010 (XEN)IRR (XEN)ISR (XEN) [ 7899] WAKE PPR 0x0010 (XEN)IRR 0004 (XEN)ISR (XEN) [ 7900] ACK_PRE PPR 0x00f0 (XEN)IRR (XEN)ISR 0004 (XEN) [ 7901] ACK_POST PPR 0x0010 (XEN)IRR (XEN)ISR (XEN) [ 7902] POWERTYPE 5 (XEN) [ 7903] IDLE PPR 0x0010 (XEN)IRR (XEN)ISR (XEN) [ 7904] WAKE PPR 0x0010 (XEN)IRR (XEN)ISR (XEN) [ 7905] PUSH {sp 0, irq
Re: [Xen-devel] [PATCH v2 2/2] x86/dom0: improve paging memory usage calculations
On Wed, Dec 12, 2018 at 12:14:53PM +0100, Roger Pau Monné wrote: > On Wed, Dec 12, 2018 at 03:57:41AM -0700, Jan Beulich wrote: > > >>> On 12.12.18 at 11:16, wrote: > > > There are also further issues that I wanted to discuss in a separate > > > thread, what about foreign mappings? Dom0 will likely map a non > > > trivial amount of grants and foreign mappings, which will also require > > > p2m/IOMMU page table entries. > > > > Hmm, good point. Then again this is a runtime requirement, > > whereas here we want to get the boot time estimate right. At > > runtime lack of memory for P2M tables will simply result in > > -ENOMEM. > > But Xen runtime memory is also tied to the boot estimates if there's > no dom0_mem parameter specified on the command line. I would expect ^ wouldn't > Dom0 to balloon down memory when it attempts to map BARs, even at > runtime. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel