Re: [Xen-devel] [BUG] Xen-ballooned memory never returned to domain after partial-free
On 11.12.19 23:08, Nicholas Tsirakis wrote: Hello, The issue I'm seeing is that pages of previously-xenballooned memory are getting trapped in the balloon on free, specifically when they are free'd in batches (i.e. not all at once). The first batch is restored to the domain properly, but subsequent frees are not. Truthfully I'm not sure if this is a bug or not, but the behavior I'm seeing doesn't seem to make sense. Note that this "bug" is in the balloon driver, but the behavior is seen when using the gnttab API, which utilizes the balloon in the background. -- This issue is better illustrated as an example, seen below. Note that the file in question is drivers/xen/balloon.c: Kernel version: 4.19.*, code seems consistent on master as well Relevant configs: - CONFIG_MEMORY_HOTPLUG not set - CONFIG_XEN_BALLOON_MEMORY_HOTPLUG not set * current_pages = # of pages assigned to domain * target_pages = # of pages we want assigned to domain * credit = target - current Start with current_pages/target_pages = 20 pages 1. alloc 5 pages with gnttab_alloc_pages(). current_pages = 15, credit = 5. 2. alloc 3 pages with gnttab_alloc_pages(). current_pages = 12, credit = 8. 3. some time later, free the last 3 pages with gnttab_free_pages(). 4. 3 pages go back to balloon and balloon worker is scheduled since credit > 0. * Relevant part of balloon worker shown below: do { ... credit = current_credit(); if (credit > 0) { if (balloon_is_inflated()) state = increase_reservation(credit); else state = reserve_additional_memory(); } ... } while (credit && state == BP_DONE); 5. credit > 0 and the balloon contains 3 pages, so run increase_reservation. 3 pages are restored to domain, correctly. current_pages = 15, credit = 5. 6. at this point credit is still > 0, so we loop again. 7. this time, the balloon has 0 pages, so we call reserve_additional_memory, seen below. note that CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is disabled, so this funciton is very sparse. static enum bp_state reserve_additional_memory(void) { balloon_stats.target_pages = balloon_stats.current_pages; return BP_ECANCELED; } 8. now target = current = 15, which drops our credit down to 0. And I think this is the problem. We want here: balloon_stats.target_pages = balloon_stats.current_pages + balloon_stats.target_unpopulated; This should fix it. Thanks for the detailed analysis! Does the attached patch work for you? And are you fine with the "Reported-by:" added? Juergen >From 7cf6cf2b94ee11002dab439fb4ed5c7dcc1a971b Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Thu, 12 Dec 2019 08:12:26 +0100 Subject: [PATCH] xen/balloon: fix ballooned page accounting without hotplug enabled When CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is not defined reserve_additional_memory() will set balloon_stats.target_pages to a wrong value in case there are still some ballooned pages allocated via alloc_xenballooned_pages(). This will result in balloon_process() no longer be triggered when ballooned pages are freed in batches. Reported-by: Nicholas Tsirakis Signed-off-by: Juergen Gross --- drivers/xen/balloon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index 4f2e78a5e4db..0c142bcab79d 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -394,7 +394,8 @@ static struct notifier_block xen_memory_nb = { #else static enum bp_state reserve_additional_memory(void) { - balloon_stats.target_pages = balloon_stats.current_pages; + balloon_stats.target_pages = balloon_stats.current_pages + + balloon_stats.target_unpopulated; return BP_ECANCELED; } #endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */ -- 2.16.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [ovmf bisection] complete build-i386
branch xen-unstable xenbranch xen-unstable job build-i386 testid xen-build Tree: ovmf https://github.com/tianocore/edk2.git Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git Tree: qemuu git://xenbits.xen.org/qemu-xen.git Tree: seabios git://xenbits.xen.org/osstest/seabios.git Tree: xen git://xenbits.xen.org/xen.git *** Found and reproduced problem changeset *** Bug is in tree: ovmf https://github.com/tianocore/edk2.git Bug introduced: 13c5e34a1b8bfedbd10ea038cfcbae5caeab6652 Bug not present: 804666c86e7b6f04fe5c5cfdb13199c19e0e99b0 Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/144757/ commit 13c5e34a1b8bfedbd10ea038cfcbae5caeab6652 Author: Bob Feng Date: Mon Dec 2 16:24:19 2019 +0800 BaseTools: Add build option for dependency file generation BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2311 Add /showIncludes for msvc and -MMD -MF $@.deps for GCC and CLANG Remove /MP for msvc since /MP does not work with /showIncludes Signed-off-by: Bob Feng Cc: Liming Gao Cc: Steven Shi Cc: Michael D Kinney Reviewed-by: Liming Gao For bisection revision-tuple graph see: http://logs.test-lab.xenproject.org/osstest/results/bisect/ovmf/build-i386.xen-build.html Revision IDs in each graph node refer, respectively, to the Trees above. Running cs-bisection-step --graph-out=/home/logs/results/bisect/ovmf/build-i386.xen-build --summary-out=tmp/144757.bisection-summary --basis-template=144637 --blessings=real,real-bisect ovmf build-i386 xen-build Searching for failure / basis pass: 144718 fail [host=huxelrebe0] / 144637 [host=elbling1] 144590 [host=chardonnay1] 144583 [host=albana0] 144578 [host=albana0] 144564 [host=albana0] 144527 ok. Failure / basis pass flights: 144718 / 144527 (tree with no url: minios) Tree: ovmf https://github.com/tianocore/edk2.git Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git Tree: qemuu git://xenbits.xen.org/qemu-xen.git Tree: seabios git://xenbits.xen.org/osstest/seabios.git Tree: xen git://xenbits.xen.org/xen.git Latest 2fe25a74d6fee3c2ac0b930f7f3596cb432e766e d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef f21b5a4aeb020f2a5e2c6503f906a9349dd2f069 4935a5433db28077fe6313f920bbedcd54516cec Basis pass 94d4efb54ec4ca894287276ce22d29b6261dbc0b d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef c9ba5276e3217ac6a1ec772dbebf568ba3a8a55d 05de315b00bf2951617b8ef28811b1f1f2dd5742 Generating revisions with ./adhoc-revtuple-generator https://github.com/tianocore/edk2.git#94d4efb54ec4ca894287276ce22d29b6261dbc0b-2fe25a74d6fee3c2ac0b930f7f3596cb432e766e git://xenbits.xen.org/qemu-xen-traditional.git#d0d8ad39ecb51cd7497cd524484fe09f50876798-d0d8ad39ecb51cd7497cd524484fe09f50876798 git://xenbits.xen.org/qemu-xen.git#933ebad2470a169504799a1d95b8e410bd9847ef-933ebad2470a169504799a1d95b8e410bd9847ef git://xenbits.xen.org/osstest/seabios.git#c9ba5276e3217ac6a1ec772dbebf568ba3a8a5\ 5d-f21b5a4aeb020f2a5e2c6503f906a9349dd2f069 git://xenbits.xen.org/xen.git#05de315b00bf2951617b8ef28811b1f1f2dd5742-4935a5433db28077fe6313f920bbedcd54516cec Use of uninitialized value $parents in array dereference at ./adhoc-revtuple-generator line 465. Use of uninitialized value in concatenation (.) or string at ./adhoc-revtuple-generator line 465. Loaded 12532 nodes in revision graph Searching for test results: 144527 pass 94d4efb54ec4ca894287276ce22d29b6261dbc0b d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef c9ba5276e3217ac6a1ec772dbebf568ba3a8a55d 05de315b00bf2951617b8ef28811b1f1f2dd5742 144578 [host=albana0] 144583 [host=albana0] 144564 [host=albana0] 144590 [host=chardonnay1] 144637 [host=elbling1] 144646 [host=huxelrebe1] 144676 [host=huxelrebe1] 144651 [host=huxelrebe1] 144703 [host=huxelrebe1] 144661 fail irrelevant 144683 fail irrelevant 144706 fail d3add11e87dace180387562d6f1951f2bffbd3d9 d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef f21b5a4aeb020f2a5e2c6503f906a9349dd2f069 4935a5433db28077fe6313f920bbedcd54516cec 144748 fail cb277815d5ea92718eed2d334641451ce65b0ff5 d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef f21b5a4aeb020f2a5e2c6503f906a9349dd2f069 b73aad4c8b6a767ce15cc8cb65f9eeab7bfccdae 144752 fail 13c5e34a1b8bfedbd10ea038cfcbae5caeab6652 d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef f21b5a4aeb020f2a5e2c6503f906a9349dd2f069 b73aad4c8b6a767ce15cc8cb65f9eeab7bfccdae 144693 fail 97eedf5dfbaffde33210fd88066247cf0b7d3325 d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef f21b5a4aeb020f2a5e2c6503f906a9349dd2f069 4935a5433db28077fe6313f920bbedcd54516cec 144718 fail 2fe25a74d6fee3c2ac0b930f7f3596cb432e766e
[Xen-devel] [xen-4.12-testing test] 144717: tolerable FAIL - PUSHED
flight 144717 xen-4.12-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/144717/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qcow217 guest-localmigrate/x10 fail like 144587 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass version targeted for testing: xen 8f333d13917cbd66b655df18a4476ed58df9d16f baseline version: xen 212b8500cb394b3a664655f79ca0bdcb31246ff7 Last test of basis 144587 2019-12-06 12:05:57 Z5 days Testing same since 144717 2019-12-11 14:36:36 Z0 days1 attempts People who touched revisions under test: Andrew Cooper George Dunlap Jan Beulich Jeremi Piotrowski Julien Grall Kevin Tian Krzysztof Kolasa Mark Pryor jobs: build-amd64-xsm pass
Re: [Xen-devel] [PATCH v3 4/4] xen-blkback: support dynamic unbind/bind
On 11.12.19 16:29, Paul Durrant wrote: By simply re-attaching to shared rings during connect_ring() rather than assuming they are freshly allocated (i.e assuming the counters are zero) it is possible for vbd instances to be unbound and re-bound from and to (respectively) a running guest. This has been tested by running: while true; do fio --name=randwrite --ioengine=libaio --iodepth=16 \ --rw=randwrite --bs=4k --direct=1 --size=1G --verify=crc32; done in a PV guest whilst running: while true; do echo vbd-$DOMID-$VBD >unbind; echo unbound; sleep 5; echo vbd-$DOMID-$VBD >bind; echo bound; sleep 3; done in dom0 from /sys/bus/xen-backend/drivers/vbd to continuously unbind and re-bind its system disk image. This is a highly useful feature for a backend module as it allows it to be unloaded and re-loaded (i.e. updated) without requiring domUs to be halted. This was also tested by running: while true; do echo vbd-$DOMID-$VBD >unbind; echo unbound; sleep 5; rmmod xen-blkback; echo unloaded; sleep 1; modprobe xen-blkback; echo bound; cd $(pwd); sleep 3; done in dom0 whilst running the same loop as above in the (single) PV guest. Some (less stressful) testing has also been done using a Windows HVM guest with the latest 9.0 PV drivers installed. Signed-off-by: Paul Durrant Reviewed-by: Juergen Gross Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 3/4] xen/interface: re-define FRONT/BACK_RING_ATTACH()
On 11.12.19 16:29, Paul Durrant wrote: Currently these macros are defined to re-initialize a front/back ring (respectively) to values read from the shared ring in such a way that any requests/responses that are added to the shared ring whilst the front/back is detached will be skipped over. This, in general, is not a desirable semantic since most frontend implementations will eventually block waiting for a response which would either never appear or never be processed. Since the macros are currently unused, take this opportunity to re-define them to re-initialize a front/back ring using specified values. This also allows FRONT/BACK_RING_INIT() to be re-defined in terms of FRONT/BACK_RING_ATTACH() using a specified value of 0. NOTE: BACK_RING_ATTACH() will be used directly in a subsequent patch. Signed-off-by: Paul Durrant Reviewed-by: Juergen Gross Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 2/4] xenbus: limit when state is forced to closed
On 11.12.19 16:29, Paul Durrant wrote: If a driver probe() fails then leave the xenstore state alone. There is no reason to modify it as the failure may be due to transient resource allocation issues and hence a subsequent probe() may succeed. If the driver supports re-binding then only force state to closed during remove() only in the case when the toolstack may need to clean up. This can be detected by checking whether the state in xenstore has been set to closing prior to device removal. NOTE: Re-bind support is indicated by new boolean in struct xenbus_driver, which defaults to false. Subsequent patches will add support to some backend drivers. Signed-off-by: Paul Durrant Reviewed-by: Juergen Gross Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-4.11-testing test] 144716: tolerable FAIL - PUSHED
flight 144716 xen-4.11-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/144716/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass version targeted for testing: xen f562c6bb93a284033bf6f5af06287a71bc40a110 baseline version: xen 239d37e514c93e29d50d71f734b1dc453b2236a6 Last test of basis 144505 2019-12-03 10:09:29 Z8 days Testing same since 144716 2019-12-11 14:36:36 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Anthony PERARD George Dunlap Jan Beulich Joe Jin Julien Grall Julien
[Xen-devel] [xen-unstable test] 144712: tolerable FAIL - PUSHED
flight 144712 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/144712/ Failures :-/ but no regressions. Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 144668 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10 fail like 144668 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144668 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 144668 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 144668 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144668 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 144668 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 144668 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144668 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 144668 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144668 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass version targeted for testing: xen 272c18435e93cbf749c096a9552ab5ef0d79a4ca baseline version: xen 4935a5433db28077fe6313f920bbedcd54516cec Last test of basis 144668 2019-12-10 15:37:57 Z1 days Testing same since 144686 2019-12-11 01:51:21 Z0 days2 attempts People who touched revisions under test: Andrew Cooper Daniel De Graaf George Dunlap Julien Grall
[Xen-devel] [ovmf bisection] complete build-amd64-xsm
branch xen-unstable xenbranch xen-unstable job build-amd64-xsm testid xen-build Tree: ovmf https://github.com/tianocore/edk2.git Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git Tree: qemuu git://xenbits.xen.org/qemu-xen.git Tree: seabios git://xenbits.xen.org/osstest/seabios.git Tree: xen git://xenbits.xen.org/xen.git *** Found and reproduced problem changeset *** Bug is in tree: ovmf https://github.com/tianocore/edk2.git Bug introduced: 13c5e34a1b8bfedbd10ea038cfcbae5caeab6652 Bug not present: 804666c86e7b6f04fe5c5cfdb13199c19e0e99b0 Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/144742/ commit 13c5e34a1b8bfedbd10ea038cfcbae5caeab6652 Author: Bob Feng Date: Mon Dec 2 16:24:19 2019 +0800 BaseTools: Add build option for dependency file generation BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2311 Add /showIncludes for msvc and -MMD -MF $@.deps for GCC and CLANG Remove /MP for msvc since /MP does not work with /showIncludes Signed-off-by: Bob Feng Cc: Liming Gao Cc: Steven Shi Cc: Michael D Kinney Reviewed-by: Liming Gao For bisection revision-tuple graph see: http://logs.test-lab.xenproject.org/osstest/results/bisect/ovmf/build-amd64-xsm.xen-build.html Revision IDs in each graph node refer, respectively, to the Trees above. Running cs-bisection-step --graph-out=/home/logs/results/bisect/ovmf/build-amd64-xsm.xen-build --summary-out=tmp/144742.bisection-summary --basis-template=144637 --blessings=real,real-bisect ovmf build-amd64-xsm xen-build Searching for failure / basis pass: 144718 fail [host=godello1] / 144637 [host=italia0] 144590 ok. Failure / basis pass flights: 144718 / 144590 (tree with no url: minios) Tree: ovmf https://github.com/tianocore/edk2.git Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git Tree: qemuu git://xenbits.xen.org/qemu-xen.git Tree: seabios git://xenbits.xen.org/osstest/seabios.git Tree: xen git://xenbits.xen.org/xen.git Latest 2fe25a74d6fee3c2ac0b930f7f3596cb432e766e d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef f21b5a4aeb020f2a5e2c6503f906a9349dd2f069 4935a5433db28077fe6313f920bbedcd54516cec Basis pass 49054b6bb66d35484e92c65f27584c4283a60986 d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef c9ba5276e3217ac6a1ec772dbebf568ba3a8a55d 131c89ce1e1dfd0b57a249615a92de4f120d9100 Generating revisions with ./adhoc-revtuple-generator https://github.com/tianocore/edk2.git#49054b6bb66d35484e92c65f27584c4283a60986-2fe25a74d6fee3c2ac0b930f7f3596cb432e766e git://xenbits.xen.org/qemu-xen-traditional.git#d0d8ad39ecb51cd7497cd524484fe09f50876798-d0d8ad39ecb51cd7497cd524484fe09f50876798 git://xenbits.xen.org/qemu-xen.git#933ebad2470a169504799a1d95b8e410bd9847ef-933ebad2470a169504799a1d95b8e410bd9847ef git://xenbits.xen.org/osstest/seabios.git#c9ba5276e3217ac6a1ec772dbebf568ba3a8a5\ 5d-f21b5a4aeb020f2a5e2c6503f906a9349dd2f069 git://xenbits.xen.org/xen.git#131c89ce1e1dfd0b57a249615a92de4f120d9100-4935a5433db28077fe6313f920bbedcd54516cec Use of uninitialized value $parents in array dereference at ./adhoc-revtuple-generator line 465. Use of uninitialized value in concatenation (.) or string at ./adhoc-revtuple-generator line 465. Loaded 12532 nodes in revision graph Searching for test results: 144590 pass 49054b6bb66d35484e92c65f27584c4283a60986 d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef c9ba5276e3217ac6a1ec772dbebf568ba3a8a55d 131c89ce1e1dfd0b57a249615a92de4f120d9100 144637 [host=italia0] 144646 fail irrelevant 144701 [host=godello0] 144676 [host=godello0] 144702 [host=godello0] 144651 fail irrelevant 144741 pass 804666c86e7b6f04fe5c5cfdb13199c19e0e99b0 d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef f21b5a4aeb020f2a5e2c6503f906a9349dd2f069 b73aad4c8b6a767ce15cc8cb65f9eeab7bfccdae 144703 [host=godello0] 144704 [host=godello0] 144688 [host=godello0] 144705 [host=godello0] 144690 [host=godello0] 144661 [host=godello0] 144691 [host=godello0] 144683 [host=godello0] 144706 [host=godello0] 144707 [host=godello0] 144692 [host=godello0] 144694 [host=godello0] 144732 pass 804666c86e7b6f04fe5c5cfdb13199c19e0e99b0 d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef c9ba5276e3217ac6a1ec772dbebf568ba3a8a55d b813ce366a4313a28b2ac26e92da0cb8d6a01a75 144695 [host=godello0] 144696 [host=godello0] 144697 [host=godello0] 144698 [host=godello0] 144700 [host=godello0] 144693 [host=godello0] 144715 [host=godello0] 144709 [host=godello0] 144718 fail 2fe25a74d6fee3c2ac0b930f7f3596cb432e766e d0d8ad39ecb51cd7497cd524484fe09f50876798 933ebad2470a169504799a1d95b8e410bd9847ef f21b5a4aeb020f2a5e2c6503f906a9349dd2f069 4935a5433db28077fe6313f920bbedcd54516cec 144714
[Xen-devel] [BUG] Xen-ballooned memory never returned to domain after partial-free
Hello, The issue I'm seeing is that pages of previously-xenballooned memory are getting trapped in the balloon on free, specifically when they are free'd in batches (i.e. not all at once). The first batch is restored to the domain properly, but subsequent frees are not. Truthfully I'm not sure if this is a bug or not, but the behavior I'm seeing doesn't seem to make sense. Note that this "bug" is in the balloon driver, but the behavior is seen when using the gnttab API, which utilizes the balloon in the background. -- This issue is better illustrated as an example, seen below. Note that the file in question is drivers/xen/balloon.c: Kernel version: 4.19.*, code seems consistent on master as well Relevant configs: - CONFIG_MEMORY_HOTPLUG not set - CONFIG_XEN_BALLOON_MEMORY_HOTPLUG not set * current_pages = # of pages assigned to domain * target_pages = # of pages we want assigned to domain * credit = target - current Start with current_pages/target_pages = 20 pages 1. alloc 5 pages with gnttab_alloc_pages(). current_pages = 15, credit = 5. 2. alloc 3 pages with gnttab_alloc_pages(). current_pages = 12, credit = 8. 3. some time later, free the last 3 pages with gnttab_free_pages(). 4. 3 pages go back to balloon and balloon worker is scheduled since credit > 0. * Relevant part of balloon worker shown below: do { ... credit = current_credit(); if (credit > 0) { if (balloon_is_inflated()) state = increase_reservation(credit); else state = reserve_additional_memory(); } ... } while (credit && state == BP_DONE); 5. credit > 0 and the balloon contains 3 pages, so run increase_reservation. 3 pages are restored to domain, correctly. current_pages = 15, credit = 5. 6. at this point credit is still > 0, so we loop again. 7. this time, the balloon has 0 pages, so we call reserve_additional_memory, seen below. note that CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is disabled, so this funciton is very sparse. static enum bp_state reserve_additional_memory(void) { balloon_stats.target_pages = balloon_stats.current_pages; return BP_ECANCELED; } 8. now target = current = 15, which drops our credit down to 0. 9. at some point later we attempt to free the remaining 5 pages with gnttab_free_pages(). 10. 5 pages go back into the balloon, but this time credit = 0, so we never trigger our balloon worker (it wouldn't do anything anyway). 11. since we've essentially irreversibly decreased target_pages, we'll never attempt to re-add those pages to our domain, and those pages are reserved in the balloon forever. 12. this can be verified by running "free", "cat /proc/meminfo", etc. to show that the total memory has indeed decreased permanently until host reboot. Is this desired behavior? Why would we decrease our target pages if there's no way to restore them? I understand there is a helper function to manually reset the target, but the caller would need to manually keep track of the starting pages; that seems like unnecessary maintenance that the balloon should handle. Additionally, why should any of the above code be possible if we have memory hotplugging disabled? I'm surprised we are able to balloon any memory out from the domain in the first place. I would have expected gnttab_alloc_pages to fail. Please CC niko.tsira...@gmail.com on any replies. Thank you, --Niko ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 2/2] xen/arm: sign extend writes to TimerValue
Per the ARMv8 Reference Manual (ARM DDI 0487E.a), section D11.2.4 specifies that the values in the TimerValue view of the timers are signed in standard two's complement form. When writing to the TimerValue register, it should be signed extended as described by the equation CompareValue = (Counter[63:0] + SignExtend(TimerValue))[63:0] Signed-off-by: Jeff Kubascik --- xen/arch/arm/vtimer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c index 21b98ec20a..872181d9b6 100644 --- a/xen/arch/arm/vtimer.c +++ b/xen/arch/arm/vtimer.c @@ -211,7 +211,7 @@ static bool vtimer_cntp_tval(struct cpu_user_regs *regs, uint32_t *r, } else { -v->arch.phys_timer.cval = cntpct + *r; +v->arch.phys_timer.cval = cntpct + (uint64_t)(int32_t)*r; if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) { v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING; -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 1/2] xen/arm: remove physical timer offset
The physical timer traps apply an offset so that time starts at 0 for the guest. However, this offset is not currently applied to the physical counter. Per the ARMv8 Reference Manual (ARM DDI 0487E.a), section D11.2.4 Timers, the "Offset" between the counter and timer should be zero for a physical timer. This removes the offset to make the timer and counter consistent. This also cleans up the physical timer implementation to better match the virtual timer - both cval's now hold the hardware value. Signed-off-by: Jeff Kubascik --- xen/arch/arm/vtimer.c| 34 ++ xen/include/asm-arm/domain.h | 3 --- 2 files changed, 18 insertions(+), 19 deletions(-) diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c index e6aebdac9e..21b98ec20a 100644 --- a/xen/arch/arm/vtimer.c +++ b/xen/arch/arm/vtimer.c @@ -62,7 +62,6 @@ static void virt_timer_expired(void *data) int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config) { -d->arch.phys_timer_base.offset = NOW(); d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0); d->time_offset_seconds = ticks_to_ns(d->arch.virt_timer_base.offset - boot_count); do_div(d->time_offset_seconds, 10); @@ -108,7 +107,6 @@ int vcpu_vtimer_init(struct vcpu *v) init_timer(>timer, phys_timer_expired, t, v->processor); t->ctl = 0; -t->cval = NOW(); t->irq = d0 ? timer_get_irq(TIMER_PHYS_NONSECURE_PPI) : GUEST_TIMER_PHYS_NS_PPI; @@ -167,6 +165,7 @@ void virt_timer_restore(struct vcpu *v) static bool vtimer_cntp_ctl(struct cpu_user_regs *regs, uint32_t *r, bool read) { struct vcpu *v = current; +s_time_t expires; if ( !ACCESS_ALLOWED(regs, EL0PTEN) ) return false; @@ -184,8 +183,9 @@ static bool vtimer_cntp_ctl(struct cpu_user_regs *regs, uint32_t *r, bool read) if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) { -set_timer(>arch.phys_timer.timer, - v->arch.phys_timer.cval + v->domain->arch.phys_timer_base.offset); +expires = v->arch.phys_timer.cval > boot_count + ? ticks_to_ns(v->arch.phys_timer.cval - boot_count) : 0; +set_timer(>arch.phys_timer.timer, expires); } else stop_timer(>arch.phys_timer.timer); @@ -197,26 +197,27 @@ static bool vtimer_cntp_tval(struct cpu_user_regs *regs, uint32_t *r, bool read) { struct vcpu *v = current; -s_time_t now; +uint64_t cntpct; +s_time_t expires; if ( !ACCESS_ALLOWED(regs, EL0PTEN) ) return false; -now = NOW() - v->domain->arch.phys_timer_base.offset; +cntpct = get_cycles(); if ( read ) { -*r = (uint32_t)(ns_to_ticks(v->arch.phys_timer.cval - now) & 0xull); +*r = (uint32_t)((v->arch.phys_timer.cval - cntpct) & 0xull); } else { -v->arch.phys_timer.cval = now + ticks_to_ns(*r); +v->arch.phys_timer.cval = cntpct + *r; if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) { v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING; -set_timer(>arch.phys_timer.timer, - v->arch.phys_timer.cval + - v->domain->arch.phys_timer_base.offset); +expires = v->arch.phys_timer.cval > boot_count + ? ticks_to_ns(v->arch.phys_timer.cval - boot_count) : 0; +set_timer(>arch.phys_timer.timer, expires); } } return true; @@ -226,23 +227,24 @@ static bool vtimer_cntp_cval(struct cpu_user_regs *regs, uint64_t *r, bool read) { struct vcpu *v = current; +s_time_t expires; if ( !ACCESS_ALLOWED(regs, EL0PTEN) ) return false; if ( read ) { -*r = ns_to_ticks(v->arch.phys_timer.cval); +*r = v->arch.phys_timer.cval; } else { -v->arch.phys_timer.cval = ticks_to_ns(*r); +v->arch.phys_timer.cval = *r; if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) { v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING; -set_timer(>arch.phys_timer.timer, - v->arch.phys_timer.cval + - v->domain->arch.phys_timer_base.offset); +expires = v->arch.phys_timer.cval > boot_count + ? ticks_to_ns(v->arch.phys_timer.cval - boot_count) : 0; +set_timer(>arch.phys_timer.timer, expires); } } return true; diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index f3f3fb7d7f..adc7fe7210 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -65,9 +65,6 @@ struct arch_domain RELMEM_done, } relmem; -struct { -uint64_t offset; -} phys_timer_base; struct { uint64_t offset; } virt_timer_base; -- 2.17.1
[Xen-devel] [PATCH v3 0/2] xen/arm: physical timer improvements
This patch set improves the emulation of the physical timer by removing the physical timer offset and sign extend the TimerValue to better match the behavior described in the ARMv8 Reference Manual (ARM DDI 0487E.a), section D11.2.4. Changes in v2: - Update commit message to specify reference manual version and section - Change physical timer cval to hold hardware value - Make sure to sign extend TimerValue on writes. This was done by first casting the r pointer to (int32_t *), dereferencing it, then casting to uint64_t. Please let me know if there is a more correct way to do this Changes in v3: - Split TimerValue sign extension fix into separate patch - Update commit message to mention physical timer cleanup - Removed physical timer cval initialization line - Changed TimerValue sign extension to (uint64_t)(int32_t)*r - Account for condition where cval < boot_count Jeff Kubascik (2): xen/arm: remove physical timer offset xen/arm: sign extend writes to TimerValue xen/arch/arm/vtimer.c| 34 ++ xen/include/asm-arm/domain.h | 3 --- 2 files changed, 18 insertions(+), 19 deletions(-) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86emul: correct LFS et al handling for 64-bit mode
On 11/12/2019 09:28, Jan Beulich wrote: > AMD and friends explicitly specify that 64-bit operands aren't possible > for these insns. Nevertheless REX.W isn't fully ignored: It still > cancels a possible operand size override (0x66). Intel otoh explicitly > provides for 64-bit operands on the respective insn page of the SDM. > > Signed-off-by: Jan Beulich It is definitely more than just these. Near jumps have per-vendor behaviour on how long the instruction is, whereas far jump/calls are in the same category as these by the looks of things. ~Andrew > > --- a/xen/arch/x86/x86_emulate/x86_emulate.c > +++ b/xen/arch/x86/x86_emulate/x86_emulate.c > @@ -2640,6 +2640,15 @@ x86_decode_twobyte( > } > break; > > +case 0xb2: /* lss */ > +case 0xb4: /* lfs */ > +case 0xb5: /* lgs */ > +/* REX.W ignored on a vendor-dependent basis. */ > +if ( op_bytes == 8 && > + (ctxt->cpuid->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) > ) > +op_bytes = 4; > +break; > + > case 0xb8: /* jmpe / popcnt */ > if ( rep_prefix() ) > ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86emul: correct segment override decode for 64-bit mode
On 11/12/2019 09:27, Jan Beulich wrote: > The legacy / compatibility mode ES, CS, SS, and DS overrides are null > prefixes in 64-bit mode, i.e. they in particular don't cancel an > earlier FS or GS one. > > Signed-off-by: Jan Beulich null is a very overloaded term. What you mean here is simply "ignored". In attempting to confirm/test this, I've found yet another curiosity with instruction length calculations when reordering a rex prefix and legacy prefix. Objdump gets it wrong, but the instruction boundaries according to singlestep are weird. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 11/24] compat_ioctl: block: handle cdrom compat ioctl in non-cdrom drivers
Various block drivers implement the CDROMMULTISESSION, CDROM_GET_CAPABILITY, and CDROMEJECT ioctl commands, relying on the block layer to handle compat_ioctl mode for them. Move this into the drivers directly as a preparation for simplifying the block layer later. Since some of these commands need a compat_ptr() conversion, introduce a blkdev_compat_ptr_ioctl() helper function that can be used as the .compat_ioctl callback for those drivers that only support compatible commands. The actual CD-ROM drivers that call cdrom_ioctl() are converted in a separate patch. Signed-off-by: Arnd Bergmann --- block/ioctl.c| 21 + drivers/block/floppy.c | 3 +++ drivers/block/paride/pd.c| 1 + drivers/block/paride/pf.c| 1 + drivers/block/sunvdc.c | 1 + drivers/block/xen-blkfront.c | 1 + include/linux/blkdev.h | 7 +++ 7 files changed, 35 insertions(+) diff --git a/block/ioctl.c b/block/ioctl.c index 5de98b97af2a..e728331d1a5b 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include +#include #include #include #include @@ -285,6 +286,26 @@ int __blkdev_driver_ioctl(struct block_device *bdev, fmode_t mode, */ EXPORT_SYMBOL_GPL(__blkdev_driver_ioctl); +#ifdef CONFIG_COMPAT +/* + * This is the equivalent of compat_ptr_ioctl(), to be used by block + * drivers that implement only commands that are completely compatible + * between 32-bit and 64-bit user space + */ +int blkdev_compat_ptr_ioctl(struct block_device *bdev, fmode_t mode, + unsigned cmd, unsigned long arg) +{ + struct gendisk *disk = bdev->bd_disk; + + if (disk->fops->ioctl) + return disk->fops->ioctl(bdev, mode, cmd, +(unsigned long)compat_ptr(arg)); + + return -ENOIOCTLCMD; +} +EXPORT_SYMBOL(blkdev_compat_ptr_ioctl); +#endif + static int blkdev_pr_register(struct block_device *bdev, struct pr_registration __user *arg) { diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index 485865fd0412..cd3612e4e2e1 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -3879,6 +3879,9 @@ static int fd_compat_ioctl(struct block_device *bdev, fmode_t mode, unsigned int { int drive = (long)bdev->bd_disk->private_data; switch (cmd) { + case CDROMEJECT: /* CD-ROM eject */ + case 0x6470: /* SunOS floppy eject */ + case FDMSGON: case FDMSGOFF: case FDSETEMSGTRESH: diff --git a/drivers/block/paride/pd.c b/drivers/block/paride/pd.c index 6f9ad3fc716f..c0967507d085 100644 --- a/drivers/block/paride/pd.c +++ b/drivers/block/paride/pd.c @@ -874,6 +874,7 @@ static const struct block_device_operations pd_fops = { .open = pd_open, .release= pd_release, .ioctl = pd_ioctl, + .compat_ioctl = pd_ioctl, .getgeo = pd_getgeo, .check_events = pd_check_events, .revalidate_disk= pd_revalidate diff --git a/drivers/block/paride/pf.c b/drivers/block/paride/pf.c index 6b7d4cab3687..bb09f21ce21a 100644 --- a/drivers/block/paride/pf.c +++ b/drivers/block/paride/pf.c @@ -276,6 +276,7 @@ static const struct block_device_operations pf_fops = { .open = pf_open, .release= pf_release, .ioctl = pf_ioctl, + .compat_ioctl = pf_ioctl, .getgeo = pf_getgeo, .check_events = pf_check_events, }; diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c index 571612e233fe..39aeebc6837d 100644 --- a/drivers/block/sunvdc.c +++ b/drivers/block/sunvdc.c @@ -171,6 +171,7 @@ static const struct block_device_operations vdc_fops = { .owner = THIS_MODULE, .getgeo = vdc_getgeo, .ioctl = vdc_ioctl, + .compat_ioctl = blkdev_compat_ptr_ioctl, }; static void vdc_blk_queue_start(struct vdc_port *port) diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index a74d03913822..23c86350a5ab 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -2632,6 +2632,7 @@ static const struct block_device_operations xlvbd_block_fops = .release = blkif_release, .getgeo = blkif_getgeo, .ioctl = blkif_ioctl, + .compat_ioctl = blkdev_compat_ptr_ioctl, }; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 47eb22a3b7f9..3e0408618da7 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1711,6 +1711,13 @@ struct block_device_operations { const struct pr_ops *pr_ops; }; +#ifdef CONFIG_COMPAT +extern int blkdev_compat_ptr_ioctl(struct block_device *, fmode_t, + unsigned int, unsigned long); +#else +#define blkdev_compat_ptr_ioctl NULL +#endif + extern int __blkdev_driver_ioctl(struct block_device *, fmode_t, unsigned int,
[Xen-devel] [PATCH 00/24] block, scsi: final compat_ioctl cleanup
Hi Jens, James and Martin, This series concludes the work I did for linux-5.5 on the compat_ioctl() cleanup, killing off fs/compat_ioctl.c and block/compat_ioctl.c by moving everything into drivers. Overall this would be a reduction both in complexity and line count, but as I'm also adding documentation the overall number of lines increases in the end. My plan was originally to keep the SCSI and block parts separate. This did not work easily because of interdependencies: I cannot do the final SCSI cleanup in a good way without first addressing the CDROM ioctls, so this is one series that I hope could be merged through either the block or the scsi git trees, or possibly both if you can pull in the same branch. The series comes in these steps: 1. clean up the sg v3 interface as suggested by Linus. I have talked about this with Doug Gilbert as well, and he would rebase his sg v4 patches on top of "compat: scsi: sg: fix v3 compat read/write interface" 2. Four patches for missing block compat_ioctl handlers, to be backported into stable kernels. Separate patches because they are needed in different stable versions. 3. Actually moving handlers out of block/compat_ioctl.c and block/scsi_ioctl.c into drivers, mixed in with cleanup patches 4. Document how to do this right. I keep getting asked about this, and it helps to point to some documentation file. The series is avaialable for testing at [1]. Arnd [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=compat-ioctl-endgame Arnd Bergmann (24): compat: ARM64: always include asm-generic/compat.h compat: scsi: sg: fix v3 compat read/write interface compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE compat_ioctl: block: handle BLKGETZONESZ/BLKGETNRZONES compat_ioctl: block: handle add zone open, close and finish ioctl compat_ioctl: block: handle Persistent Reservations compaT_ioctl: ubd, aoe: use blkdev_compat_ptr_ioctl compat_ioctl: move CDROM_SEND_PACKET handling into scsi compat_ioctl: move CDROMREADADIO to cdrom.c compat_ioctl: cdrom: handle CDROM_LAST_WRITTEN compat_ioctl: block: handle cdrom compat ioctl in non-cdrom drivers compat_ioctl: add scsi_compat_ioctl compat_ioctl: bsg: add handler compat_ioctl: ide: floppy: add handler compat_ioctl: scsi: move ioctl handling into drivers compat_ioctl: move sys_compat_ioctl() to ioctl.c compat_ioctl: simplify the implementation compat_ioctl: move cdrom commands into cdrom.c compat_ioctl: scsi: handle HDIO commands from drivers compat_ioctl: move HDIO ioctl handling into drivers/ide compat_ioctl: block: move blkdev_compat_ioctl() into ioctl.c compat_ioctl: block: simplify compat_blkpg_ioctl() compat_ioctl: simplify up block/ioctl.c Documentation: document ioctl interfaces better Documentation/core-api/index.rst | 1 + Documentation/core-api/ioctl.rst | 250 +++ arch/arm64/include/asm/compat.h| 5 +- arch/um/drivers/ubd_kern.c | 1 + block/Makefile | 1 - block/bsg.c| 1 + block/compat_ioctl.c | 411 - block/ioctl.c | 319 +++ block/scsi_ioctl.c | 214 - drivers/ata/libata-scsi.c | 9 + drivers/block/aoe/aoeblk.c | 1 + drivers/block/floppy.c | 3 + drivers/block/paride/pcd.c | 3 + drivers/block/paride/pd.c | 1 + drivers/block/paride/pf.c | 1 + drivers/block/pktcdvd.c| 26 +- drivers/block/sunvdc.c | 1 + drivers/block/virtio_blk.c | 3 + drivers/block/xen-blkfront.c | 1 + drivers/cdrom/cdrom.c | 35 ++- drivers/cdrom/gdrom.c | 3 + drivers/ide/ide-cd.c | 40 +++ drivers/ide/ide-disk.c | 3 + drivers/ide/ide-floppy.c | 4 + drivers/ide/ide-floppy.h | 2 + drivers/ide/ide-floppy_ioctl.c | 35 +++ drivers/ide/ide-gd.c | 14 + drivers/ide/ide-ioctls.c | 47 ++- drivers/ide/ide-tape.c | 14 + drivers/scsi/aic94xx/aic94xx_init.c| 3 + drivers/scsi/ch.c | 9 +- drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 3 + drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 3 + drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 3 + drivers/scsi/ipr.c | 3 + drivers/scsi/isci/init.c | 3 + drivers/scsi/mvsas/mv_init.c | 3 + drivers/scsi/pm8001/pm8001_init.c | 3 + drivers/scsi/scsi_ioctl.c | 54 +++- drivers/scsi/sd.c | 50 ++- drivers/scsi/sg.c | 169 +- drivers/scsi/sr.c | 53 +++- drivers/scsi/st.c | 51 +--
Re: [Xen-devel] [PATCH] x86: store cr4 during suspend/resume
On 11/12/2019 19:36, Andrew Cooper wrote: > Absolutely nothing remaining in suspend.c should be spilled. It can all > be (re)caluclated from the same information source as the AP boot path, > and the result will be strictly smaller in RAM, and more robust. And at a cursory glance, the logic in suspend.c doesn't correctly handle the ler=1 case. This can trivially be fixed by: diff --git a/xen/arch/x86/acpi/suspend.c b/xen/arch/x86/acpi/suspend.c index c9dea67bf3..50dbb14528 100644 --- a/xen/arch/x86/acpi/suspend.c +++ b/xen/arch/x86/acpi/suspend.c @@ -15,8 +15,6 @@ #include #include -static unsigned long saved_lstar, saved_cstar; -static unsigned long saved_sysenter_esp, saved_sysenter_eip; static unsigned long saved_fs_base, saved_gs_base, saved_kernel_gs_base; static uint64_t saved_xcr0; @@ -24,15 +22,6 @@ void save_rest_processor_state(void) { saved_fs_base = rdfsbase(); saved_gs_base = rdgsbase(); - rdmsrl(MSR_SHADOW_GS_BASE, saved_kernel_gs_base); - rdmsrl(MSR_CSTAR, saved_cstar); - rdmsrl(MSR_LSTAR, saved_lstar); - - if ( cpu_has_sep ) - { - rdmsrl(MSR_IA32_SYSENTER_ESP, saved_sysenter_esp); - rdmsrl(MSR_IA32_SYSENTER_EIP, saved_sysenter_eip); - } if ( cpu_has_xsave ) saved_xcr0 = get_xcr0(); @@ -42,25 +31,12 @@ void save_rest_processor_state(void) void restore_rest_processor_state(void) { load_system_tables(); - - /* Recover syscall MSRs */ - wrmsrl(MSR_LSTAR, saved_lstar); - wrmsrl(MSR_CSTAR, saved_cstar); - wrmsrl(MSR_STAR, XEN_MSR_STAR); - wrmsrl(MSR_SYSCALL_MASK, XEN_SYSCALL_MASK); + percpu_traps_init(); wrfsbase(saved_fs_base); wrgsbase(saved_gs_base); wrmsrl(MSR_SHADOW_GS_BASE, saved_kernel_gs_base); - if ( cpu_has_sep ) - { - /* Recover sysenter MSRs */ - wrmsrl(MSR_IA32_SYSENTER_ESP, saved_sysenter_esp); - wrmsrl(MSR_IA32_SYSENTER_EIP, saved_sysenter_eip); - wrmsr(MSR_IA32_SYSENTER_CS, __HYPERVISOR_CS, 0); - } - if ( cpu_has_xsave && !set_xcr0(saved_xcr0) ) BUG(); for starters. The fs/gs/shadow values are guest-only state so can just be discarded. xcr0 isn't needed but this code is latently(soon-to-be?) buggy by not handling XSS at the same time. Both should follow the BSP logic for evaluating the system default. That lets us delete save_rest_processor_state() entirely. In the assembly side of things, ss and cr3 obviously don't need to be spilled. ss is always 0, and we're already on the correct pagetable. We're also probably on the correct cr0. The caller-clobbered GPRs can just be discarded, and that gets our stashed state down to almost nothing. I trust I have made my point blindingly obvious. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] xen/arm: remove physical timer offset
On 12/5/2019 3:28 PM, Julien Grall wrote: > Hi, > > On 05/12/2019 19:17, Jeff Kubascik wrote: >> On 12/3/2019 1:04 PM, Julien Grall wrote: >>> Hi, >>> >>> On 26/11/2019 21:13, Jeff Kubascik wrote: The physical timer traps apply an offset so that time starts at 0 for the guest. However, this offset is not currently applied to the physical counter. Per the ARMv8 Reference Manual (ARM DDI 0487E.a), section D11.2.4 Timers, the "Offset" between the counter and timer should be zero for a physical timer. This removes the offset to make the timer and counter consistent. Furthermore, section D11.2.4 specifies that the values in the TimerValue view of the timers are signed in standard two's complement form. When writing to the TimerValue register, it should be signed extended as described by the equation CompareValue = (Counter[63:0] + SignExtend(TimerValue))[63:0] >>> >>> I am a bit confused, is it a new bug introduced by the change or >>> previously existing? If the latter, then I think this should be modified >>> in a separate patch. >> >> This would be a previously existing bug - a quirk in the timer design that >> wasn't emulated correctly before. I can break this out into a separate patch. > > It would be great if you can split it. Thank you! > > [...] > > @@ -185,7 +184,7 @@ static bool vtimer_cntp_ctl(struct cpu_user_regs *regs, uint32_t *r, bool read) if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) { set_timer(>arch.phys_timer.timer, - v->arch.phys_timer.cval + v->domain->arch.phys_timer_base.offset); + ticks_to_ns(v->arch.phys_timer.cval - boot_count)); >>> >>> cval may be smaller than boot_count. In that case, we will set the timer >>> to expire a very long time. This is not the expected behavior from the >>> guest. >>> >>> Instead, we should either use 0 to create the timer or call >>> phys_timer_expired directly. >> >> I disagree - if you set cval to a value smaller than boot_count, you are >> setting >> cval to a value less than the physical counter value. This would result in >> the >> timer having a long expiration time. > > boot_count refers to when Xen began to boot, not the start of the > physical counter. If you look at the condition to fire the timer (see > below), then it means the timer will fire right now because the physical > counter is past CompareValue (cval). > > TimerConditionMet = (((Counter[63:0] – Offset[63:0])[63:0] - > CompareValue[63:0]) >= 0) This makes sense now - I wasn't considering the greater than equals condition. > We only subtract boot_count here as the timer subsystem expects a > relative number of nanoseconds from when Xen booted. Makes sense. >> >> However, set_timer expects a signed 64 bit value in ns. The conversion of >> cval >> (unsigned 64 bit) from ticks to ns is going to overflow this. I'm not sure >> what >> would be the best way to work around this limitation. At the very least, I >> think >> we should print a warning message. > > A warning message in emulation is definitely not the right solution. If > a user asks something that is valid from the spec PoV then we should > implement it correctly. The more that I don't think boot_count store > what you expect (see above). > > But we definitely can't allow the caller of ticks_to_ns() to pass a > negative value as argument because (cval - boot_count) may be over 2^63 > for instance if the user requests a timer to be set in a million of year > (I didn't do the math!). Assuming 100MHz timer frequency, the math works out to be about 5,850 years, give or take. I'm assuming we don't need to worry about rollover conditions? > Cheers, > > -- > Julien Grall > Sincerely, Jeff Kubascik ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86: store cr4 during suspend/resume
On 10/12/2019 10:07, Jan Beulich wrote: > On 10.12.2019 10:59, Andrew Cooper wrote: >> On 09/12/2019 18:06, Roger Pau Monne wrote: >>> Currently cr4 is not cached before suspension, and mmu_cr4_features is >>> used in order to restore the expected cr4 value. This is correct so >>> far because the tasklet that executes the suspend/resume code is >>> running in the idle vCPU context. >>> >>> In order to make the code less fragile, explicitly save the current >>> cr4 value before suspension, so that it can be restored afterwards. >>> This ensures that the cr4 value cached in the cpu_info doesn't get out >>> of sync after resume from suspension. >>> >>> Suggested-by: Jan Beulich >>> Signed-off-by: Roger Pau Monné >> Why? There is nothing fragile here. Suspend/resume is always in idle >> context and loads of other logic already depends on this. >> >> I've been slowly stripping out redundant saved state like this. > Where it's clearly redundant, this is fine. But I don't think it's > sufficiently clear here There is a reason I made it explicitly crystal clear with c/s 87e7b4d5b > , and going back to what was there before > is imo generally less error prone than going to some fixed state. It is demonstrably more error prone, which is why I'm slowly killing it. Stashing state wastes unnecessary space, and adds an error case where state is either stashed incorrectly, or gets modified before restore, and we'll blindly use. Two examples of real bugs caused by this are c/s 0c30171cb and 4ee0ad72d Absolutely nothing remaining in suspend.c should be spilled. It can all be (re)caluclated from the same information source as the AP boot path, and the result will be strictly smaller in RAM, and more robust. > Furthermore I was hoping we could eventually do away with > mmu_cr4_features. How do you plan on doing this? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [ovmf test] 144718: regressions - FAIL
flight 144718 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/144718/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64 6 xen-buildfail REGR. vs. 144637 build-i386-xsm6 xen-buildfail REGR. vs. 144637 build-amd64-xsm 6 xen-buildfail REGR. vs. 144637 build-i3866 xen-buildfail REGR. vs. 144637 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf 2fe25a74d6fee3c2ac0b930f7f3596cb432e766e baseline version: ovmf 804666c86e7b6f04fe5c5cfdb13199c19e0e99b0 Last test of basis 144637 2019-12-09 09:09:49 Z2 days Failing since144646 2019-12-10 01:39:53 Z1 days 10 attempts Testing same since 144713 2019-12-11 12:09:19 Z0 days2 attempts People who touched revisions under test: Antoine Coeur Ard Biesheuvel Bob Feng Jiewen Yao Michael Kubacki Philippe Mathieu-Daude Steven Shi jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. commit 2fe25a74d6fee3c2ac0b930f7f3596cb432e766e Author: Ard Biesheuvel Date: Tue Mar 5 14:32:48 2019 +0100 ArmPkg/MmCommunicationDxe: relay architected PI events to MM context PI defines a few architected events that have significance in the MM context as well as in the non-secure DXE context. So register notify handlers for these events, and relay them into the standalone MM world. Signed-off-by: Ard Biesheuvel Reviewed-by: Jiewen Yao Reviewed-by: Achin Gupta commit d3add11e87dace180387562d6f1951f2bffbd3d9 Author: Michael Kubacki Date: Wed Nov 20 17:31:24 2019 -0800 MdeModulePkg PeiCore: Improve comment semantics This patch clarifies wording in several PeiCore comments to improve reading comprehension. Cc: Dandan Bi Cc: Liming Gao Cc: Jian J Wang Cc: Hao A Wu Signed-off-by: Michael Kubacki Reviewed-by: Liming Gao Reviewed-by: Jian J Wang commit d39d1260c615b716675f67f5c4e1f4f52df01dad Author: Michael Kubacki Date: Wed Nov 20 17:10:48 2019 -0800 MdeModulePkg PeiCore: Fix typos Cc: Dandan Bi Cc: Liming Gao Cc: Jian J Wang Cc: Hao A Wu Signed-off-by: Michael Kubacki Reviewed-by: Liming Gao Reviewed-by: Philippe Mathieu-Daude Reviewed-by: Jian J Wang commit 97eedf5dfbaffde33210fd88066247cf0b7d3325 Author: Antoine Coeur Date: Wed Dec 4 12:14:53 2019 +0800 IntelFsp2WrapperPkg: Fix various typos Fix various typos in comments and documentation. Cc: Chasel Chiu Cc: Nate DeSimone Cc: Star Zeng Reviewed-by: Philippe Mathieu-Daude Signed-off-by: Philippe Mathieu-Daude Reviewed-by: Nate DeSimone Reviewed-by: Chasel Chiu Reviewed-by: Star Zeng commit 7e55cf6b48dcd43de46d008b2f12caaad2554503 Author: Jiewen Yao Date: Sat Dec 7 21:41:10 2019 +0800 SecurityPkg/Tcg2Smm: Measure the table before patch. REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1940 According to TCG PFP specification: the ACPI table must be measured prior to any modification, and the measurement must
[Xen-devel] [xen-4.13-testing test] 144708: regressions - FAIL
flight 144708 xen-4.13-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/144708/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-shadow 18 guest-localmigrate/x10 fail REGR. vs. 144673 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10 fail like 144673 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail like 144673 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass version targeted for testing: xen fd31193058be072131316c2e0c19ab8643e676c6 baseline version: xen b0f0bbca95bd532212fb1956f3e23d1ab13a53cf Last test of basis 144673 2019-12-10 19:07:50 Z0 days Testing same since 144708 2019-12-11 11:38:22 Z0 days1 attempts People who touched
[Xen-devel] [PATCH v7 1/3] xenbus/backend: Add memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this commit adds a memory reclaim callback to 'xenbus_driver'. If a memory pressure is detected, 'xenbus' requests every backend driver to volunarily release its memory. Note that it would be able to improve the callback facility for more sophisticated handlings of general pressures. For example, it would be possible to monitor the memory consumption of each device and issue the release requests to only devices which causing the pressure. Also, the callback could be extended to handle not only memory, but general resources. Nevertheless, this version of the implementation defers such sophisticated goals as a future work. Reviewed-by: Juergen Gross Signed-off-by: SeongJae Park --- drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++ include/xen/xenbus.h | 1 + 2 files changed, 33 insertions(+) diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c index b0bed4faf44c..7e78ebef7c54 100644 --- a/drivers/xen/xenbus/xenbus_probe_backend.c +++ b/drivers/xen/xenbus/xenbus_probe_backend.c @@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block *notifier, return NOTIFY_DONE; } +static int backend_reclaim_memory(struct device *dev, void *data) +{ + const struct xenbus_driver *drv; + + if (!dev->driver) + return 0; + drv = to_xenbus_driver(dev->driver); + if (drv && drv->reclaim_memory) + drv->reclaim_memory(to_xenbus_device(dev)); + return 0; +} + +/* + * Returns 0 always because we are using shrinker to only detect memory + * pressure. + */ +static unsigned long backend_shrink_memory_count(struct shrinker *shrinker, + struct shrink_control *sc) +{ + bus_for_each_dev(_backend.bus, NULL, NULL, + backend_reclaim_memory); + return 0; +} + +static struct shrinker backend_memory_shrinker = { + .count_objects = backend_shrink_memory_count, + .seeks = DEFAULT_SEEKS, +}; + static int __init xenbus_probe_backend_init(void) { static struct notifier_block xenstore_notifier = { @@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void) register_xenstore_notifier(_notifier); + if (register_shrinker(_memory_shrinker)) + pr_warn("shrinker registration failed\n"); + return 0; } subsys_initcall(xenbus_probe_backend_init); diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 869c816d5f8c..c861cfb6f720 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -104,6 +104,7 @@ struct xenbus_driver { struct device_driver driver; int (*read_otherend_details)(struct xenbus_device *dev); int (*is_ready)(struct xenbus_device *dev); + void (*reclaim_memory)(struct xenbus_device *dev); }; static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v7 0/2] xenbus/backend: Add a memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this patchset adds a memory reclaim callback to 'xenbus_driver' (patch 1) and use it to mitigate the problem in 'xen-blkback' (patch 2). The third patch is a trivial cleanup of variable names. Base Version This patch is based on v5.4. A complete tree is also available at my public git repo: https://github.com/sjp38/linux/tree/blkback_squeezing_v7 Patch History - Changes from v6 (https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/) - Remove more unnecessary prefixes (suggested by Roger Pau Monné) - Constify a variable (suggested by Roger Pau Monné) - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné) - More wordsmith of the commit message (suggested by Roger Pau Monné) Changes from v5 (https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/) - Wordsmith the commit messages (suggested by Roger Pau Monné) - Change the reclaim callback return type (suggested by Roger Pau Monné) - Change the type of the blkback squeeze duration variable (suggested by Roger Pau Monné) - Add a patch for removal of unnecessary static variable name prefixes (suggested by Roger Pau Monné) - Fix checkpatch.pl warnings Changes from v4 (https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/) - Remove domain id parameter from the callback (suggested by Juergen Gross) - Rename xen-blkback module parameter (suggested by Stefan Nuernburger) Changes from v3 (https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/) - Add general callback in xen_driver and use it (suggested by Juergen Gross) Changes from v2 (https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com) - Rename the module parameter and variables for brevity (aggressive shrinking -> squeezing) Changes from v1 (https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/) - Adjust the description to not use the term, `arbitrarily` (suggested by Paul Durrant) - Specify time unit of the duration in the parameter description, (suggested by Maximilian Heyne) - Change default aggressive shrinking duration from 1ms to 10ms - Merge two patches into one single patch SeongJae Park (2): xenbus/backend: Add memory pressure handler callback xen/blkback: Squeeze page pools if a memory pressure is detected drivers/block/xen-blkback/blkback.c | 23 +++-- drivers/block/xen-blkback/common.h| 1 + drivers/block/xen-blkback/xenbus.c| 3 ++- drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++ include/xen/xenbus.h | 1 + 5 files changed, 56 insertions(+), 3 deletions(-) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v7 3/3] xen/blkback: Remove unnecessary static variable name prefixes
A few of static variables in blkback have 'xen_blkif_' prefix, though it is unnecessary for static variables. This commit removes such prefixes. Reviewed-by: Roger Pau Monné Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/blkback.c | 37 + 1 file changed, 17 insertions(+), 20 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 98823d150905..f41c698dd854 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -62,8 +62,8 @@ * IO workloads. */ -static int xen_blkif_max_buffer_pages = 1024; -module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644); +static int max_buffer_pages = 1024; +module_param_named(max_buffer_pages, max_buffer_pages, int, 0644); MODULE_PARM_DESC(max_buffer_pages, "Maximum number of free pages to keep in each block backend buffer"); @@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages, * algorithm. */ -static int xen_blkif_max_pgrants = 1056; -module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644); +static int max_pgrants = 1056; +module_param_named(max_persistent_grants, max_pgrants, int, 0644); MODULE_PARM_DESC(max_persistent_grants, "Maximum number of grants to map persistently"); @@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants, * use. The time is in seconds, 0 means indefinitely long. */ -static unsigned int xen_blkif_pgrant_timeout = 60; -module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout, +static unsigned int pgrant_timeout = 60; +module_param_named(persistent_grant_unused_seconds, pgrant_timeout, uint, 0644); MODULE_PARM_DESC(persistent_grant_unused_seconds, "Time in seconds an unused persistent grant is allowed to " @@ -137,9 +137,8 @@ module_param(log_stats, int, 0644); static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) { - return xen_blkif_pgrant_timeout && - (jiffies - persistent_gnt->last_used >= - HZ * xen_blkif_pgrant_timeout); + return pgrant_timeout && (jiffies - persistent_gnt->last_used >= + HZ * pgrant_timeout); } /* Once a memory pressure is detected, squeeze free page pools for a while. */ @@ -249,7 +248,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring, struct persistent_gnt *this; struct xen_blkif *blkif = ring->blkif; - if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) { + if (ring->persistent_gnt_c >= max_pgrants) { if (!blkif->vbd.overflow_max_grants) blkif->vbd.overflow_max_grants = 1; return -EBUSY; @@ -412,14 +411,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring) goto out; } - if (ring->persistent_gnt_c < xen_blkif_max_pgrants || - (ring->persistent_gnt_c == xen_blkif_max_pgrants && + if (ring->persistent_gnt_c < max_pgrants || + (ring->persistent_gnt_c == max_pgrants && !ring->blkif->vbd.overflow_max_grants)) { num_clean = 0; } else { - num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN; - num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + - num_clean; + num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN; + num_clean = ring->persistent_gnt_c - max_pgrants + num_clean; num_clean = min(ring->persistent_gnt_c, num_clean); pr_debug("Going to purge at least %u persistent grants\n", num_clean); @@ -614,8 +612,7 @@ static void print_stats(struct xen_blkif_ring *ring) current->comm, ring->st_oo_req, ring->st_rd_req, ring->st_wr_req, ring->st_f_req, ring->st_ds_req, -ring->persistent_gnt_c, -xen_blkif_max_pgrants); +ring->persistent_gnt_c, max_pgrants); ring->st_print = jiffies + msecs_to_jiffies(10 * 1000); ring->st_rd_req = 0; ring->st_wr_req = 0; @@ -675,7 +672,7 @@ int xen_blkif_schedule(void *arg) if (time_before(jiffies, buffer_squeeze_end)) shrink_free_pagepool(ring, 0); else - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + shrink_free_pagepool(ring, max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); @@ -902,7 +899,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring, continue; } if (use_persistent_gnts && - ring->persistent_gnt_c < xen_blkif_max_pgrants) { + ring->persistent_gnt_c < max_pgrants) {
[Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected
Each `blkif` has a free pages pool for the grant mapping. The size of the pool starts from zero and be increased on demand while processing the I/O requests. If current I/O requests handling is finished or 100 milliseconds has passed since last I/O requests handling, it checks and shrinks the pool to not exceed the size limit, `max_buffer_pages`. Therefore, host administrators can cause memory pressure in blkback by attaching a large number of block devices and inducing I/O. Such problematic situations can be avoided by limiting the maximum number of devices that can be attached, but finding the optimal limit is not so easy. Improper set of the limit can results in the memory pressure or a resource underutilization. This commit avoids such problematic situations by squeezing the pools (returns every free page in the pool to the system) for a while (users can set this duration via a module parameter) if a memory pressure is detected. Discussions === The `blkback`'s original shrinking mechanism returns only pages in the pool, which are not currently be used by `blkback`, to the system. In other words, the pages that are not mapped with granted pages. Because this commit is changing only the shrink limit but still uses the same freeing mechanism it does not touch pages which are currently mapping grants. Once a memory pressure is detected, this commit keeps the squeezing limit for a user-specified time duration. The duration should be neither too long nor too short. If it is too long, the squeezing incurring overhead can reduce the I/O performance. If it is too short, `blkback` will not free enough pages to reduce the memory pressure. This commit sets the value as `10 milliseconds` by default because it is a short time in terms of I/O while it is a long time in terms of memory operations. Also, as the original shrinking mechanism works for at least every 100 milliseconds, this could be a somewhat reasonable choice. I also tested other durations (refer to the below section for more details) and confirmed that 10 milliseconds is the one that works best with the test. That said, the proper duration depends on actual configurations and workloads. That's why this commit allows users to set the duration as a module parameter. Memory Pressure Test To show how this commit fixes the memory pressure situation well, I configured a test environment on a xen-running virtualization system. On the `blkfront` running guest instances, I attach a large number of network-backed volume devices and induce I/O to those. Meanwhile, I measure the number of pages that swapped in (pswpin) and out (pswpout) on the `blkback` running guest. The test ran twice, once for the `blkback` before this commit and once for that after this commit. As shown below, this commit has dramatically reduced the memory pressure: pswpin pswpout before 76,672 185,799 after 2123,325 Optimal Aggressive Shrinking Duration - To find a best squeezing duration, I repeated the test with three different durations (1ms, 10ms, and 100ms). The results are as below: durationpswpin pswpout 1 852 6,424 10 212 3,325 100 203 3,340 As expected, the memory pressure has decreased as the duration is increased, but the reduction stopped from the `10ms`. Based on this results, I chose the default duration as 10ms. Performance Overhead Test = This commit could incur I/O performance degradation under severe memory pressure because the squeezing will require more page allocations per I/O. To show the overhead, I artificially made a worst-case squeezing situation and measured the I/O performance of a `blkfront` running guest. For the artificial squeezing, I set the `blkback.max_buffer_pages` using the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. In this test, I set the value to `1024` and `0`. The `1024` is the default value. Setting the value as `0` is same to a situation doing the squeezing always (worst-case). For the I/O performance measurement, I run a simple `dd` command 5 times as below and collect the 'MB/s' results. $ for i in {1..5}; do dd if=/dev/zero of=file \ bs=4k count=$((256*512)); sync; done If the underlying block device is slow enough, the squeezing overhead could be hidden. For the reason, I do this test for both a slow block device and a fast block device. I use a popular cloud block storage service, ebs[1] as a slow device and the ramdisk block device[2] for the fast device. The results are as below. 'max_pgs' represents the value of the `blkback.max_buffer_pages` parameter. On the slow block device max_pgs Min Max Median AvgStddev 0 38.7 45.8 38.7 40.12 3.1752165 1024 38.7 45.8
[Xen-devel] [xen-unstable-smoke test] 144719: tolerable all pass - PUSHED
flight 144719 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/144719/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen b4f042236ae0bb6725b3e8dd40af5a2466a6f971 baseline version: xen 80268f63640f5c3a9a4e1c688a62e35448fde9e2 Last test of basis 144711 2019-12-11 12:02:23 Z0 days Testing same since 144719 2019-12-11 15:00:35 Z0 days1 attempts People who touched revisions under test: Andrew Cooper George Dunlap Jan Beulich Julien Grall Kevin Tian Wei Liu jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 80268f6364..b4f042236a b4f042236ae0bb6725b3e8dd40af5a2466a6f971 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] x86/mm: factor out the code for shattering an l3 PTE
On 11.12.2019 17:34, Xia, Hongyan wrote: > On Wed, 2019-12-11 at 16:29 +0100, Jan Beulich wrote: >>> +} >>> + >>> +if ( locking ) >>> +spin_lock(_pgdir_lock); >>> +if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) && >>> + (l3e_get_flags(*pl3e) & _PAGE_PSE) ) >>> +{ >>> +l3e_write_atomic(pl3e, >>> +l3e_from_paddr((paddr_t)virt_to_maddr(l2t), >>> __PAGE_HYPERVISOR)); >> >> Why the cast? (I'm sorry if this was there on v3 already and I >> didn't spot it. And if this remains the only thing to adjust, >> then I guess this could be taken care of while committing.) > > Sadly there is no l3e_from_maddr or virt_to_paddr to call directly. Of > course, paddr_t and maddr have the same underlying type (unsigned > long), so it works without a cast. I just added the cast to make it > explicit that these two are not exactly the same. Yes, there continues to be a naming disconnect. But no, this is not a reason to add a cast. Casts should be used as sparingly as possible, since they tend to hide problems. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] x86/mm: factor out the code for shattering an l3 PTE
On 11.12.2019 17:27, Xia, Hongyan wrote: > On Wed, 2019-12-11 at 16:32 +0100, Jan Beulich wrote: >> On 11.12.2019 11:58, Hongyan Xia wrote: >>> @@ -5578,27 +5597,8 @@ int modify_xen_mappings(unsigned long s, >>> unsigned long e, unsigned int nf) >>> } >>> >>> /* PAGE1GB: shatter the superpage and fall through. */ >>> -pl2e = alloc_xen_pagetable(); >>> -if ( !pl2e ) >>> +if ( shatter_l3e(pl3e, 0, locking) ) >>> return -ENOMEM; >> >> Hmm, I didn't expect I'd need to comment on this again: As per >> my v2 reply, you should hand on the return value from the >> function, not make up your own. This is so that in case the >> function gains another error path with a different error code, >> it wouldn't become indistinguishable to callers further up. > > I was basically thinking about the conversation we had that ENOMEM is > probably the only error value map_pages_to_xen would return ever, and > it is unlikely to gain another return value in the future, so initially > I just let shatter return -1 and the caller return -ENOMEM. There is no > problem for me if we want to change it to handle different error > values. The alternative to your prior 0 / -1 returning would have been to have the function return bool. In this case "inventing" an error code here would be fine. The 0 / -1 approach would introduce another instance of what we're trying to get rid of elsewhere. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH-for-4.13] build: fix tools/configure in case only python3 exists
Calling ./configure with python3 being there but no python, tools/configure will fail. Fix that by defaulting to python and falling back to python3 or python2. While at it fix the use of non portable "type -p" by replacing it by AC_PATH_PROG(). Signed-off-by: Juergen Gross --- tools/configure.ac | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/configure.ac b/tools/configure.ac index a8d8ce5ffe..8d86c42de8 100644 --- a/tools/configure.ac +++ b/tools/configure.ac @@ -337,8 +337,9 @@ case "$host_os" in freebsd*) ;; *) AX_PATH_PROG_OR_FAIL([BASH], [bash]);; esac -AS_IF([test -z "$PYTHON"], [PYTHON="python"]) -AS_IF([echo "$PYTHON" | grep -q "^/"], [], [PYTHON=`type -p "$PYTHON"`]) +AS_IF([test -z "$PYTHON"], [AC_CHECK_PROGS([PYTHON], [python python3 python2], err)]) +AS_IF([test "$PYTHON" = "err"], [AC_MSG_ERROR([No python interpreter found])]) +AS_IF([echo "$PYTHON" | grep -q "^/"], [], [AC_PATH_PROG([PYTHON], [$PYTHON])]) PYTHONPATH=$PYTHON PYTHON=`basename $PYTHONPATH` -- 2.16.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] XSA-255 and Arm
Hi, On 10/12/2019 10:42, George Dunlap wrote: On 12/9/19 1:13 PM, Julien Grall wrote: Hi all, I was looking at the Grant Table code over the week-end and noticed thart XSA-255 [1] introduced some unintended consequences on Arm. Since the XSA, gnttab_map_frame() will remove the previous mapping (if any) because mapping to the new GFN. As on Arm we don't have an M2P, the GFN is stored per frame in the grant-table code. This will never get cleared during unmapping (e.g. XENMEM_remove_from_physmap) and therefore we may end up to remove a mapping from someone different (Arm does not check the MFN is the correct one before removing mapping). Sorry Julien, I don't know enough about the ARM grant mapping code to know what this is about. Can you point me to the relevant files / functions / structures, and maybe sketch out a problematic behavior? If you look at the implementation of gnttab_map_frame() (common/grant_table.c), it will use gnttab_get_frame_gfn(...) to know whether the grant-table frame was already mapped in the Guest P2M. If it is already mapped, then we will try to remove the current mapping before doing the new one. On Arm, gnttab_get_frame_gfn() are using an internal array because to we don't have an internal array because we don't have an M2P. There is a sister function to set the frame (see gnttab_set_frame_gfn()). However, this is only called when mapping the frame and not when the unmap is done XENMEM_remove_from_physmap. From my understanding, the current code (i.e remove the old mapping) is trying to workaround the fact we don't take reference when doing a mapping in the page-tables. At the default of not handling reference for all the mappings, we could handle grant frame in a similar way to foreign mapping. This would remove the restriction on the number of mappings and we could update the internal array at the same time. The only concern with this appraoch is we would have to walk through the array to find the GFN. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] x86/mm: factor out the code for shattering an l3 PTE
On Wed, 2019-12-11 at 16:29 +0100, Jan Beulich wrote: > > +} > > + > > +if ( locking ) > > +spin_lock(_pgdir_lock); > > +if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) && > > + (l3e_get_flags(*pl3e) & _PAGE_PSE) ) > > +{ > > +l3e_write_atomic(pl3e, > > +l3e_from_paddr((paddr_t)virt_to_maddr(l2t), > > __PAGE_HYPERVISOR)); > > Why the cast? (I'm sorry if this was there on v3 already and I > didn't spot it. And if this remains the only thing to adjust, > then I guess this could be taken care of while committing.) > > Jan Sadly there is no l3e_from_maddr or virt_to_paddr to call directly. Of course, paddr_t and maddr have the same underlying type (unsigned long), so it works without a cast. I just added the cast to make it explicit that these two are not exactly the same. Hongyan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 2/9] x86: introduce a new set of APIs to manage Xen page tables
Hi, On 04/12/2019 17:10, Hongyan Xia wrote: From: Wei Liu We are going to switch to using domheap page for page tables. A new set of APIs is introduced to allocate, map, unmap and free pages for page tables. The allocation and deallocation work on mfn_t but not page_info, because they are required to work even before frame table is set up. Implement the old functions with the new ones. We will rewrite, site by site, other mm functions that manipulate page tables to use the new APIs. Note these new APIs still use xenheap page underneath and no actual map and unmap is done so that we don't break xen half way. They will be switched to use domheap and dynamic mappings when usage of old APIs is eliminated. No functional change intended in this patch. Signed-off-by: Wei Liu Signed-off-by: Hongyan Xia --- Changed since v3: - const qualify unmap_xen_pagetable_new(). - remove redundant parentheses. --- xen/arch/x86/mm.c| 39 ++- xen/include/asm-x86/mm.h | 11 +++ 2 files changed, 45 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 7d4dd80a85..ca362ad638 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -119,6 +119,7 @@ #include #include #include +#include #include #include #include @@ -5020,22 +5021,50 @@ int mmcfg_intercept_write( } void *alloc_xen_pagetable(void) +{ +mfn_t mfn; + +mfn = alloc_xen_pagetable_new(); +ASSERT(!mfn_eq(mfn, INVALID_MFN)); The current version of alloc_xen_pagetable() may return NULL. So I can't see why this would not happen with the new implementation (see more below). Furthermore, AFAIK, mfn_to_virt() is not able to deal with INVALID_MFN. While the ASSERT will catch such error in debug build, non-debug build will end up to undefined behavior (allocation failure is a real possibility). Regardless the discussion on whether the ASSERT() is warrant, I think you want to have: if ( mfn_eq(mfn, INVALID_MFN) ) return NULL; I am half tempted to suggest to put that in map_xen_pagetable_new() for hardening. + +return map_xen_pagetable_new(mfn); +} + +void free_xen_pagetable(void *v) +{ +if ( system_state != SYS_STATE_early_boot ) +free_xen_pagetable_new(virt_to_mfn(v)); +} + +mfn_t alloc_xen_pagetable_new(void) { if ( system_state != SYS_STATE_early_boot ) { void *ptr = alloc_xenheap_page(); BUG_ON(!hardware_domain && !ptr); This BUG_ON() only catch memory exhaustion before the Hardware Domain (aka Dom0) is created. Afterwards, the pointer may be NULL. As ... -return ptr; +return virt_to_mfn(ptr); virt_to_mfn() requires a valid MFN, you will end up to undefined behavior. } -return mfn_to_virt(mfn_x(alloc_boot_pages(1, 1))); +return alloc_boot_pages(1, 1); } -void free_xen_pagetable(void *v) +void *map_xen_pagetable_new(mfn_t mfn) { -if ( system_state != SYS_STATE_early_boot ) -free_xenheap_page(v); +return mfn_to_virt(mfn_x(mfn)); +} + +/* v can point to an entry within a table or be NULL */ +void unmap_xen_pagetable_new(const void *v) +{ +/* XXX still using xenheap page, no need to do anything. */ +} + +/* mfn can be INVALID_MFN */ +void free_xen_pagetable_new(mfn_t mfn) +{ +if ( system_state != SYS_STATE_early_boot && !mfn_eq(mfn, INVALID_MFN) ) +free_xenheap_page(mfn_to_virt(mfn_x(mfn))); } static DEFINE_SPINLOCK(map_pgdir_lock); diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h index 9d2b833579..76593fe9e7 100644 --- a/xen/include/asm-x86/mm.h +++ b/xen/include/asm-x86/mm.h @@ -582,6 +582,17 @@ void *do_page_walk(struct vcpu *v, unsigned long addr); /* Allocator functions for Xen pagetables. */ void *alloc_xen_pagetable(void); void free_xen_pagetable(void *v); +mfn_t alloc_xen_pagetable_new(void); +void *map_xen_pagetable_new(mfn_t mfn); +void unmap_xen_pagetable_new(const void *v); +void free_xen_pagetable_new(mfn_t mfn); + +#define UNMAP_XEN_PAGETABLE_NEW(ptr)\ +do {\ +unmap_xen_pagetable_new(ptr); \ +(ptr) = NULL; \ +} while (0) + l1_pgentry_t *virt_to_xen_l1e(unsigned long v); int __sync_local_execstate(void); -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] x86/mm: factor out the code for shattering an l3 PTE
On Wed, 2019-12-11 at 16:32 +0100, Jan Beulich wrote: > On 11.12.2019 11:58, Hongyan Xia wrote: > > @@ -5578,27 +5597,8 @@ int modify_xen_mappings(unsigned long s, > > unsigned long e, unsigned int nf) > > } > > > > /* PAGE1GB: shatter the superpage and fall through. */ > > -pl2e = alloc_xen_pagetable(); > > -if ( !pl2e ) > > +if ( shatter_l3e(pl3e, 0, locking) ) > > return -ENOMEM; > > Hmm, I didn't expect I'd need to comment on this again: As per > my v2 reply, you should hand on the return value from the > function, not make up your own. This is so that in case the > function gains another error path with a different error code, > it wouldn't become indistinguishable to callers further up. > I was basically thinking about the conversation we had that ENOMEM is probably the only error value map_pages_to_xen would return ever, and it is unlikely to gain another return value in the future, so initially I just let shatter return -1 and the caller return -ENOMEM. There is no problem for me if we want to change it to handle different error values. Hongyan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] IOMMU: make DMA containment of quarantined devices optional
On Wed, Dec 11, 2019 at 01:53:00PM +0100, Jan Beulich wrote: > Containing still in flight DMA was introduced to work around certain > devices / systems hanging hard upon hitting an IOMMU fault. Passing > through (such) devices (on such systems) is inherently insecure (as > guests could easily arrange for IOMMU faults to occur). Defaulting to > a mode where admins may not even become aware of issues with devices can > be considered undesirable. Therefore convert this mode of operation to > an optional one, not one enabled by default. > > This involves resurrecting code commit ea38867831da ("x86 / iommu: set > up a scratch page in the quarantine domain") did remove, in a slightly > extended fashion. Here, instead of reintroducing a pretty pointless use > of "goto" in domain_context_unmap(), and instead of making the function > (at least temporarily) inconsistent, take the opportunity and replace > the other similarly pointless "goto" as well. > > In order to key the re-instated bypasses off of there (not) being a root > page table this further requires moving the allocate_domain_resources() > invocation from reassign_device() to amd_iommu_setup_domain_device() (or > else reassign_device() would allocate a root page table anyway); this is > benign to the second caller of the latter function. > > Signed-off-by: Jan Beulich > --- > I'm happy to take better suggestions to replace "full". > > --- a/docs/misc/xen-command-line.pandoc > +++ b/docs/misc/xen-command-line.pandoc > @@ -1232,7 +1232,7 @@ detection of systems known to misbehave > > Default: `new` unless directed-EOI is supported > > ### iommu > -= List of [ , verbose, debug, force, required, quarantine, > += List of [ , verbose, debug, force, required, quarantine[=full], > sharept, intremap, intpost, crash-disable, > snoop, qinval, igfx, amd-iommu-perdev-intremap, > dom0-{passthrough,strict} ] > @@ -1270,11 +1270,13 @@ boolean (e.g. `iommu=no`) can override t > will prevent Xen from booting if IOMMUs aren't discovered and enabled > successfully. > > -* The `quarantine` boolean can be used to control Xen's behavior when > -de-assigning devices from guests. If enabled (the default), Xen always > -quarantines such devices; they must be explicitly assigned back to Dom0 > -before they can be used there again. If disabled, Xen will only > -quarantine devices the toolstack hass arranged for getting quarantined. > +* The `quarantine` option can be used to control Xen's behavior when > +de-assigning devices from guests. If set to true (the default), Xen > +always quarantines such devices; they must be explicitly assigned back > +to Dom0 before they can be used there again. If set to "full", still > +active DMA will additionally be directed to a "sink" page. If set to > +false, Xen will only quarantine devices the toolstack has arranged for > +getting quarantined. > > * The `sharept` boolean controls whether the IOMMU pagetables are shared > with the CPU-side HAP pagetables, or allocated separately. Sharing > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c > @@ -85,18 +85,36 @@ int get_dma_requestor_id(uint16_t seg, u > return req_id; > } > > -static void amd_iommu_setup_domain_device( > +static int __must_check allocate_domain_resources(struct domain_iommu *hd) > +{ > +int rc; > + > +spin_lock(>arch.mapping_lock); > +rc = amd_iommu_alloc_root(hd); > +spin_unlock(>arch.mapping_lock); > + > +return rc; > +} > + > +static int __must_check amd_iommu_setup_domain_device( > struct domain *domain, struct amd_iommu *iommu, > uint8_t devfn, struct pci_dev *pdev) > { > struct amd_iommu_dte *table, *dte; > unsigned long flags; > -int req_id, valid = 1; > +int req_id, valid = 1, rc; > u8 bus = pdev->bus; > -const struct domain_iommu *hd = dom_iommu(domain); > +struct domain_iommu *hd = dom_iommu(domain); > + > +/* dom_io is used as a sentinel for quarantined devices */ > +if ( domain == dom_io && !hd->arch.root_table ) This condition (and it's Intel counterpart) would be better in a macro IMO, so that vendor code regardless of the implementation can use the same macro (and to avoid having to add the same comment in all instances), ie: IS_DEVICE_QUARANTINED or some such would be fine IMO. > +return 0; > + > +BUG_ON(!hd->arch.paging_mode || !iommu->dev_table.buffer); > > -BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode || > -!iommu->dev_table.buffer ); > +rc = allocate_domain_resources(hd); > +if ( rc ) > +return rc; > > if ( iommu_hwdom_passthrough && is_hardware_domain(domain) ) > valid = 0; > @@ -151,6 +169,8 @@ static void amd_iommu_setup_domain_devic > > amd_iommu_flush_iotlb(devfn, pdev, INV_IOMMU_ALL_PAGES_ADDRESS, 0); >
Re: [Xen-devel] [PATCH v1] xen-pciback: optionally allow interrupt enable flag writes
On 03.12.2019 06:41, Marek Marczykowski-Górecki wrote: > QEMU running in a stubdom needs to be able to set INTX_DISABLE, and the > MSI(-X) enable flags in the PCI config space. This adds an attribute > 'allow_interrupt_control' which when set for a PCI device allows writes > to this flag(s). The toolstack will need to set this for stubdoms. > When enabled, guest (stubdomain) will be allowed to set relevant enable > flags, but only one at a time - i.e. it refuses to enable more than one > of INTx, MSI, MSI-X at a time. > > This functionality is needed only for config space access done by device > model (stubdomain) serving a HVM with the actual PCI device. It is not > necessary and unsafe to enable direct access to those bits for PV domain > with the device attached. For PV domains, there are separate protocol > messages (XEN_PCI_OP_{enable,disable}_{msi,msix}) for this purpose. > Those ops in addition to setting enable bits, also configure MSI(-X) in > dom0 kernel - which is undesirable for PCI passthrough to HVM guests. > > This should not introduce any new security issues since a malicious > guest (or stubdom) can already generate MSIs through other ways, see > [1] page 8. True, albeit this doesn't cover INTX_DISABLE. > Additionally, when qemu runs in dom0, it already have direct > access to those bits. True again, but any bug here (as in: too wide exposure) is a qemu bug (possibly security issue). The statement also isn't really correct for de-privileged qemu, I think (but I also think PCI pass-through doesn't work in that mode at all yet). On the whole this looks to be an acceptable approach to me. But I'm not the maintainer, so my opinion may not count much. Some issues with the implementation itself were already pointed out. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 2/2] x86/mm: factor out the code for shattering an l2 PTE
On 11.12.2019 11:58, Hongyan Xia wrote: > map_pages_to_xen and modify_xen_mappings are performing almost exactly > the same operations when shattering an l2 PTE, the only difference > being whether we want to flush. > > Signed-off-by: Hongyan Xia As before comments on patch 1 apply here as well. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] x86/mm: factor out the code for shattering an l3 PTE
On 11.12.2019 11:58, Hongyan Xia wrote: > @@ -5578,27 +5597,8 @@ int modify_xen_mappings(unsigned long s, unsigned long > e, unsigned int nf) > } > > /* PAGE1GB: shatter the superpage and fall through. */ > -pl2e = alloc_xen_pagetable(); > -if ( !pl2e ) > +if ( shatter_l3e(pl3e, 0, locking) ) > return -ENOMEM; Hmm, I didn't expect I'd need to comment on this again: As per my v2 reply, you should hand on the return value from the function, not make up your own. This is so that in case the function gains another error path with a different error code, it wouldn't become indistinguishable to callers further up. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 4/4] xen-blkback: support dynamic unbind/bind
By simply re-attaching to shared rings during connect_ring() rather than assuming they are freshly allocated (i.e assuming the counters are zero) it is possible for vbd instances to be unbound and re-bound from and to (respectively) a running guest. This has been tested by running: while true; do fio --name=randwrite --ioengine=libaio --iodepth=16 \ --rw=randwrite --bs=4k --direct=1 --size=1G --verify=crc32; done in a PV guest whilst running: while true; do echo vbd-$DOMID-$VBD >unbind; echo unbound; sleep 5; echo vbd-$DOMID-$VBD >bind; echo bound; sleep 3; done in dom0 from /sys/bus/xen-backend/drivers/vbd to continuously unbind and re-bind its system disk image. This is a highly useful feature for a backend module as it allows it to be unloaded and re-loaded (i.e. updated) without requiring domUs to be halted. This was also tested by running: while true; do echo vbd-$DOMID-$VBD >unbind; echo unbound; sleep 5; rmmod xen-blkback; echo unloaded; sleep 1; modprobe xen-blkback; echo bound; cd $(pwd); sleep 3; done in dom0 whilst running the same loop as above in the (single) PV guest. Some (less stressful) testing has also been done using a Windows HVM guest with the latest 9.0 PV drivers installed. Signed-off-by: Paul Durrant --- Cc: Konrad Rzeszutek Wilk Cc: "Roger Pau Monné" Cc: Jens Axboe Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini v3: - Constify sring_common and re-work error handling in xen_blkif_map() v2: - Apply a sanity check to the value of rsp_prod and fail the re-attach if it is implausible - Set allow_rebind to prevent ring from being closed on unbind - Update test workload from dd to fio (with verification) --- drivers/block/xen-blkback/xenbus.c | 56 -- 1 file changed, 38 insertions(+), 18 deletions(-) diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index 59d576d27ca7..0d4097bdff3f 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -190,6 +190,9 @@ static int xen_blkif_map(struct xen_blkif_ring *ring, grant_ref_t *gref, { int err; struct xen_blkif *blkif = ring->blkif; + const struct blkif_common_sring *sring_common; + RING_IDX rsp_prod, req_prod; + unsigned int size; /* Already connected through? */ if (ring->irq) @@ -200,46 +203,62 @@ static int xen_blkif_map(struct xen_blkif_ring *ring, grant_ref_t *gref, if (err < 0) return err; + sring_common = (struct blkif_common_sring *)ring->blk_ring; + rsp_prod = READ_ONCE(sring_common->rsp_prod); + req_prod = READ_ONCE(sring_common->req_prod); + switch (blkif->blk_protocol) { case BLKIF_PROTOCOL_NATIVE: { - struct blkif_sring *sring; - sring = (struct blkif_sring *)ring->blk_ring; - BACK_RING_INIT(>blk_rings.native, sring, - XEN_PAGE_SIZE * nr_grefs); + struct blkif_sring *sring_native = + (struct blkif_sring *)ring->blk_ring; + + BACK_RING_ATTACH(>blk_rings.native, sring_native, +rsp_prod, XEN_PAGE_SIZE * nr_grefs); + size = __RING_SIZE(sring_native, XEN_PAGE_SIZE * nr_grefs); break; } case BLKIF_PROTOCOL_X86_32: { - struct blkif_x86_32_sring *sring_x86_32; - sring_x86_32 = (struct blkif_x86_32_sring *)ring->blk_ring; - BACK_RING_INIT(>blk_rings.x86_32, sring_x86_32, - XEN_PAGE_SIZE * nr_grefs); + struct blkif_x86_32_sring *sring_x86_32 = + (struct blkif_x86_32_sring *)ring->blk_ring; + + BACK_RING_ATTACH(>blk_rings.x86_32, sring_x86_32, +rsp_prod, XEN_PAGE_SIZE * nr_grefs); + size = __RING_SIZE(sring_x86_32, XEN_PAGE_SIZE * nr_grefs); break; } case BLKIF_PROTOCOL_X86_64: { - struct blkif_x86_64_sring *sring_x86_64; - sring_x86_64 = (struct blkif_x86_64_sring *)ring->blk_ring; - BACK_RING_INIT(>blk_rings.x86_64, sring_x86_64, - XEN_PAGE_SIZE * nr_grefs); + struct blkif_x86_64_sring *sring_x86_64 = + (struct blkif_x86_64_sring *)ring->blk_ring; + + BACK_RING_ATTACH(>blk_rings.x86_64, sring_x86_64, +rsp_prod, XEN_PAGE_SIZE * nr_grefs); + size = __RING_SIZE(sring_x86_64, XEN_PAGE_SIZE * nr_grefs); break; } default: BUG(); } + err = -EIO; + if (req_prod - rsp_prod > size) + goto fail; + err = bind_interdomain_evtchn_to_irqhandler(blkif->domid, evtchn,
[Xen-devel] [PATCH v3 3/4] xen/interface: re-define FRONT/BACK_RING_ATTACH()
Currently these macros are defined to re-initialize a front/back ring (respectively) to values read from the shared ring in such a way that any requests/responses that are added to the shared ring whilst the front/back is detached will be skipped over. This, in general, is not a desirable semantic since most frontend implementations will eventually block waiting for a response which would either never appear or never be processed. Since the macros are currently unused, take this opportunity to re-define them to re-initialize a front/back ring using specified values. This also allows FRONT/BACK_RING_INIT() to be re-defined in terms of FRONT/BACK_RING_ATTACH() using a specified value of 0. NOTE: BACK_RING_ATTACH() will be used directly in a subsequent patch. Signed-off-by: Paul Durrant --- Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini A patch to add the FRONT/BACK_RING_ATTACH() macros to the canonical ring.h in xen.git will be sent once the definitions have been agreed. v2: - change definitions to take explicit initial index values --- include/xen/interface/io/ring.h | 29 + 1 file changed, 9 insertions(+), 20 deletions(-) diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h index 3f40501fc60b..2af7a1cd6658 100644 --- a/include/xen/interface/io/ring.h +++ b/include/xen/interface/io/ring.h @@ -125,35 +125,24 @@ struct __name##_back_ring { \ memset((_s)->pad, 0, sizeof((_s)->pad)); \ } while(0) -#define FRONT_RING_INIT(_r, _s, __size) do { \ -(_r)->req_prod_pvt = 0;\ -(_r)->rsp_cons = 0; \ +#define FRONT_RING_ATTACH(_r, _s, _i, __size) do { \ +(_r)->req_prod_pvt = (_i); \ +(_r)->rsp_cons = (_i); \ (_r)->nr_ents = __RING_SIZE(_s, __size); \ (_r)->sring = (_s); \ } while (0) -#define BACK_RING_INIT(_r, _s, __size) do {\ -(_r)->rsp_prod_pvt = 0;\ -(_r)->req_cons = 0; \ -(_r)->nr_ents = __RING_SIZE(_s, __size); \ -(_r)->sring = (_s); \ -} while (0) +#define FRONT_RING_INIT(_r, _s, __size) FRONT_RING_ATTACH(_r, _s, 0, __size) -/* Initialize to existing shared indexes -- for recovery */ -#define FRONT_RING_ATTACH(_r, _s, __size) do { \ -(_r)->sring = (_s); \ -(_r)->req_prod_pvt = (_s)->req_prod; \ -(_r)->rsp_cons = (_s)->rsp_prod; \ +#define BACK_RING_ATTACH(_r, _s, _i, __size) do { \ +(_r)->rsp_prod_pvt = (_i); \ +(_r)->req_cons = (_i); \ (_r)->nr_ents = __RING_SIZE(_s, __size); \ -} while (0) - -#define BACK_RING_ATTACH(_r, _s, __size) do { \ (_r)->sring = (_s); \ -(_r)->rsp_prod_pvt = (_s)->rsp_prod; \ -(_r)->req_cons = (_s)->req_prod; \ -(_r)->nr_ents = __RING_SIZE(_s, __size); \ } while (0) +#define BACK_RING_INIT(_r, _s, __size) BACK_RING_ATTACH(_r, _s, 0, __size) + /* How big is this ring? */ #define RING_SIZE(_r) \ ((_r)->nr_ents) -- 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 1/4] xenbus: move xenbus_dev_shutdown() into frontend code...
...and make it static xenbus_dev_shutdown() is seemingly intended to cause clean shutdown of PV frontends when a guest is rebooted. Indeed the function waits for a conpletion which is only set by a call to xenbus_frontend_closed(). This patch removes the shutdown() method from backends and moves xenbus_dev_shutdown() from xenbus_probe.c into xenbus_probe_frontend.c, renaming it appropriately and making it static. NOTE: In the case where the backend is running in a driver domain, the toolstack should have already terminated any frontends that may be using it (since Xen does not support re-startable PV driver domains) so xenbus_dev_shutdown() should never be called. Signed-off-by: Paul Durrant Reviewed-by: Juergen Gross --- Cc: Boris Ostrovsky Cc: Stefano Stabellini --- drivers/xen/xenbus/xenbus.h| 2 -- drivers/xen/xenbus/xenbus_probe.c | 23 - drivers/xen/xenbus/xenbus_probe_backend.c | 1 - drivers/xen/xenbus/xenbus_probe_frontend.c | 24 +- 4 files changed, 23 insertions(+), 27 deletions(-) diff --git a/drivers/xen/xenbus/xenbus.h b/drivers/xen/xenbus/xenbus.h index d75a2385b37c..5f5b8a7d5b80 100644 --- a/drivers/xen/xenbus/xenbus.h +++ b/drivers/xen/xenbus/xenbus.h @@ -116,8 +116,6 @@ int xenbus_probe_devices(struct xen_bus_type *bus); void xenbus_dev_changed(const char *node, struct xen_bus_type *bus); -void xenbus_dev_shutdown(struct device *_dev); - int xenbus_dev_suspend(struct device *dev); int xenbus_dev_resume(struct device *dev); int xenbus_dev_cancel(struct device *dev); diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index c21be6e9d38a..5aa29396c9e3 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -281,29 +281,6 @@ int xenbus_dev_remove(struct device *_dev) } EXPORT_SYMBOL_GPL(xenbus_dev_remove); -void xenbus_dev_shutdown(struct device *_dev) -{ - struct xenbus_device *dev = to_xenbus_device(_dev); - unsigned long timeout = 5*HZ; - - DPRINTK("%s", dev->nodename); - - get_device(>dev); - if (dev->state != XenbusStateConnected) { - pr_info("%s: %s: %s != Connected, skipping\n", - __func__, dev->nodename, xenbus_strstate(dev->state)); - goto out; - } - xenbus_switch_state(dev, XenbusStateClosing); - timeout = wait_for_completion_timeout(>down, timeout); - if (!timeout) - pr_info("%s: %s timeout closing device\n", - __func__, dev->nodename); - out: - put_device(>dev); -} -EXPORT_SYMBOL_GPL(xenbus_dev_shutdown); - int xenbus_register_driver_common(struct xenbus_driver *drv, struct xen_bus_type *bus, struct module *owner, const char *mod_name) diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c index b0bed4faf44c..14876faff3b0 100644 --- a/drivers/xen/xenbus/xenbus_probe_backend.c +++ b/drivers/xen/xenbus/xenbus_probe_backend.c @@ -198,7 +198,6 @@ static struct xen_bus_type xenbus_backend = { .uevent = xenbus_uevent_backend, .probe = xenbus_dev_probe, .remove = xenbus_dev_remove, - .shutdown = xenbus_dev_shutdown, .dev_groups = xenbus_dev_groups, }, }; diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c index a7d90a719cea..8a1650bbe18f 100644 --- a/drivers/xen/xenbus/xenbus_probe_frontend.c +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c @@ -126,6 +126,28 @@ static int xenbus_frontend_dev_probe(struct device *dev) return xenbus_dev_probe(dev); } +static void xenbus_frontend_dev_shutdown(struct device *_dev) +{ + struct xenbus_device *dev = to_xenbus_device(_dev); + unsigned long timeout = 5*HZ; + + DPRINTK("%s", dev->nodename); + + get_device(>dev); + if (dev->state != XenbusStateConnected) { + pr_info("%s: %s: %s != Connected, skipping\n", + __func__, dev->nodename, xenbus_strstate(dev->state)); + goto out; + } + xenbus_switch_state(dev, XenbusStateClosing); + timeout = wait_for_completion_timeout(>down, timeout); + if (!timeout) + pr_info("%s: %s timeout closing device\n", + __func__, dev->nodename); + out: + put_device(>dev); +} + static const struct dev_pm_ops xenbus_pm_ops = { .suspend= xenbus_dev_suspend, .resume = xenbus_frontend_dev_resume, @@ -146,7 +168,7 @@ static struct xen_bus_type xenbus_frontend = { .uevent = xenbus_uevent_frontend, .probe = xenbus_frontend_dev_probe, .remove = xenbus_dev_remove, -
[Xen-devel] [PATCH v3 2/4] xenbus: limit when state is forced to closed
If a driver probe() fails then leave the xenstore state alone. There is no reason to modify it as the failure may be due to transient resource allocation issues and hence a subsequent probe() may succeed. If the driver supports re-binding then only force state to closed during remove() only in the case when the toolstack may need to clean up. This can be detected by checking whether the state in xenstore has been set to closing prior to device removal. NOTE: Re-bind support is indicated by new boolean in struct xenbus_driver, which defaults to false. Subsequent patches will add support to some backend drivers. Signed-off-by: Paul Durrant --- Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini v2: - Introduce the 'allow_rebind' flag - Expand the commit comment --- drivers/xen/xenbus/xenbus_probe.c | 12 ++-- include/xen/xenbus.h | 1 + 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index 5aa29396c9e3..378486b79f96 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -255,7 +255,6 @@ int xenbus_dev_probe(struct device *_dev) module_put(drv->driver.owner); fail: xenbus_dev_error(dev, err, "xenbus_dev_probe on %s", dev->nodename); - xenbus_switch_state(dev, XenbusStateClosed); return err; } EXPORT_SYMBOL_GPL(xenbus_dev_probe); @@ -276,7 +275,16 @@ int xenbus_dev_remove(struct device *_dev) free_otherend_details(dev); - xenbus_switch_state(dev, XenbusStateClosed); + /* +* If the toolstack has forced the device state to closing then set +* the state to closed now to allow it to be cleaned up. +* Similarly, if the driver does not support re-bind, set the +* closed. +*/ + if (!drv->allow_rebind || + xenbus_read_driver_state(dev->nodename) == XenbusStateClosing) + xenbus_switch_state(dev, XenbusStateClosed); + return 0; } EXPORT_SYMBOL_GPL(xenbus_dev_remove); diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 869c816d5f8c..24228a102141 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -93,6 +93,7 @@ struct xenbus_device_id struct xenbus_driver { const char *name; /* defaults to ids[0].devicetype */ const struct xenbus_device_id *ids; + bool allow_rebind; /* avoid setting xenstore closed during remove */ int (*probe)(struct xenbus_device *dev, const struct xenbus_device_id *id); void (*otherend_changed)(struct xenbus_device *dev, -- 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 0/4] xen-blkback: support live update
Patch #1 is clean-up for an apparent mis-feature. Paul Durrant (4): xenbus: move xenbus_dev_shutdown() into frontend code... xenbus: limit when state is forced to closed xen/interface: re-define FRONT/BACK_RING_ATTACH() xen-blkback: support dynamic unbind/bind drivers/block/xen-blkback/xenbus.c | 56 +++--- drivers/xen/xenbus/xenbus.h| 2 - drivers/xen/xenbus/xenbus_probe.c | 35 -- drivers/xen/xenbus/xenbus_probe_backend.c | 1 - drivers/xen/xenbus/xenbus_probe_frontend.c | 24 +- include/xen/interface/io/ring.h| 29 --- include/xen/xenbus.h | 1 + 7 files changed, 81 insertions(+), 67 deletions(-) --- v3: - Only patch #4 modified Cc: Boris Ostrovsky Cc: Jens Axboe Cc: Juergen Gross Cc: Konrad Rzeszutek Wilk Cc: "Roger Pau Monné" Cc: Stefano Stabellini -- 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] x86/mm: factor out the code for shattering an l3 PTE
On 11.12.2019 11:58, Hongyan Xia wrote: > map_pages_to_xen and modify_xen_mappings are performing almost exactly > the same operations when shattering an l3 PTE, the only difference > being whether we want to flush. > > Signed-off-by: Hongyan Xia > > --- > Changes in v3: > - style and indentation changes. > - return -ENOMEM instead of -1. > > Changes in v2: > - improve asm. > - re-read pl3e from memory when taking the lock. > - move the allocation of l2t inside the shatter function. > --- > xen/arch/x86/mm.c | 98 +++ > 1 file changed, 49 insertions(+), 49 deletions(-) > > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c > index 7d4dd80a85..97f11b6016 100644 > --- a/xen/arch/x86/mm.c > +++ b/xen/arch/x86/mm.c > @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v) > flush_area_local((const void *)v, f) : \ > flush_area_all((const void *)v, f)) > > +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. > */ > +static int shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking) > +{ > +unsigned int i; > +l3_pgentry_t ol3e = *pl3e; > +l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e)); > +l2_pgentry_t *l2t = alloc_xen_pagetable(); > + > +if ( !l2t ) > +return -ENOMEM; > + > +for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ ) > +{ > +l2e_write(l2t + i, l2e); > +l2e = l2e_from_intpte( > + l2e_get_intpte(l2e) + (PAGE_SIZE << PAGETABLE_ORDER)); Andrew - iirc you had suggested this (in some different form, but to the same effect) to improve code generation. If you're convinced that the downside of the loss of type safety is worth the win in generated code, I'm not going to stand in the way here, but it'll then need to be you to ack these two patches in their eventually final shape. > +} > + > +if ( locking ) > +spin_lock(_pgdir_lock); > +if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) && > + (l3e_get_flags(*pl3e) & _PAGE_PSE) ) > +{ > +l3e_write_atomic(pl3e, > +l3e_from_paddr((paddr_t)virt_to_maddr(l2t), __PAGE_HYPERVISOR)); Why the cast? (I'm sorry if this was there on v3 already and I didn't spot it. And if this remains the only thing to adjust, then I guess this could be taken care of while committing.) Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 4/4] xen-blkback: support dynamic unbind/bind
> -Original Message- > > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen- > > blkback/xenbus.c > > > index e8c5c54e1d26..13d09630b237 100644 > > > --- a/drivers/block/xen-blkback/xenbus.c > > > +++ b/drivers/block/xen-blkback/xenbus.c > > > @@ -181,6 +181,8 @@ static int xen_blkif_map(struct xen_blkif_ring > > *ring, grant_ref_t *gref, > > > { > > > int err; > > > struct xen_blkif *blkif = ring->blkif; > > > + struct blkif_common_sring *sring_common; > > > + RING_IDX rsp_prod, req_prod; > > > > > > /* Already connected through? */ > > > if (ring->irq) > > > @@ -191,46 +193,66 @@ static int xen_blkif_map(struct xen_blkif_ring > > *ring, grant_ref_t *gref, > > > if (err < 0) > > > return err; > > > > > > + sring_common = (struct blkif_common_sring *)ring->blk_ring; > > > + rsp_prod = READ_ONCE(sring_common->rsp_prod); > > > + req_prod = READ_ONCE(sring_common->req_prod); > > > + > > > switch (blkif->blk_protocol) { > > > case BLKIF_PROTOCOL_NATIVE: > > > { > > > - struct blkif_sring *sring; > > > - sring = (struct blkif_sring *)ring->blk_ring; > > > - BACK_RING_INIT(>blk_rings.native, sring, > > > -XEN_PAGE_SIZE * nr_grefs); > > > + struct blkif_sring *sring_native = > > > + (struct blkif_sring *)ring->blk_ring; > > > > I think you can constify both sring_native and sring_common (and the > > other instances below). > > Yes, I can do that. I don't think the macros would mind. > Spoke to soon. They do mind, of course, because the sring pointer in the front/back ring is not (and should not) be const. I can const sring_common but no others. Paul ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [ovmf test] 144713: regressions - FAIL
flight 144713 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/144713/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64 6 xen-buildfail REGR. vs. 144637 build-i386-xsm6 xen-buildfail REGR. vs. 144637 build-amd64-xsm 6 xen-buildfail REGR. vs. 144637 build-i3866 xen-buildfail REGR. vs. 144637 Tests which did not succeed, but are not blocking: build-i386-libvirt1 build-check(1) blocked n/a build-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf 2fe25a74d6fee3c2ac0b930f7f3596cb432e766e baseline version: ovmf 804666c86e7b6f04fe5c5cfdb13199c19e0e99b0 Last test of basis 144637 2019-12-09 09:09:49 Z2 days Failing since144646 2019-12-10 01:39:53 Z1 days9 attempts Testing same since 144713 2019-12-11 12:09:19 Z0 days1 attempts People who touched revisions under test: Antoine Coeur Ard Biesheuvel Bob Feng Jiewen Yao Michael Kubacki Philippe Mathieu-Daude Steven Shi jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. commit 2fe25a74d6fee3c2ac0b930f7f3596cb432e766e Author: Ard Biesheuvel Date: Tue Mar 5 14:32:48 2019 +0100 ArmPkg/MmCommunicationDxe: relay architected PI events to MM context PI defines a few architected events that have significance in the MM context as well as in the non-secure DXE context. So register notify handlers for these events, and relay them into the standalone MM world. Signed-off-by: Ard Biesheuvel Reviewed-by: Jiewen Yao Reviewed-by: Achin Gupta commit d3add11e87dace180387562d6f1951f2bffbd3d9 Author: Michael Kubacki Date: Wed Nov 20 17:31:24 2019 -0800 MdeModulePkg PeiCore: Improve comment semantics This patch clarifies wording in several PeiCore comments to improve reading comprehension. Cc: Dandan Bi Cc: Liming Gao Cc: Jian J Wang Cc: Hao A Wu Signed-off-by: Michael Kubacki Reviewed-by: Liming Gao Reviewed-by: Jian J Wang commit d39d1260c615b716675f67f5c4e1f4f52df01dad Author: Michael Kubacki Date: Wed Nov 20 17:10:48 2019 -0800 MdeModulePkg PeiCore: Fix typos Cc: Dandan Bi Cc: Liming Gao Cc: Jian J Wang Cc: Hao A Wu Signed-off-by: Michael Kubacki Reviewed-by: Liming Gao Reviewed-by: Philippe Mathieu-Daude Reviewed-by: Jian J Wang commit 97eedf5dfbaffde33210fd88066247cf0b7d3325 Author: Antoine Coeur Date: Wed Dec 4 12:14:53 2019 +0800 IntelFsp2WrapperPkg: Fix various typos Fix various typos in comments and documentation. Cc: Chasel Chiu Cc: Nate DeSimone Cc: Star Zeng Reviewed-by: Philippe Mathieu-Daude Signed-off-by: Philippe Mathieu-Daude Reviewed-by: Nate DeSimone Reviewed-by: Chasel Chiu Reviewed-by: Star Zeng commit 7e55cf6b48dcd43de46d008b2f12caaad2554503 Author: Jiewen Yao Date: Sat Dec 7 21:41:10 2019 +0800 SecurityPkg/Tcg2Smm: Measure the table before patch. REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1940 According to TCG PFP specification: the ACPI table must be measured prior to any modification, and the measurement must
[Xen-devel] [xen-unstable-smoke test] 144711: tolerable all pass - PUSHED
flight 144711 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/144711/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 80268f63640f5c3a9a4e1c688a62e35448fde9e2 baseline version: xen 272c18435e93cbf749c096a9552ab5ef0d79a4ca Last test of basis 144672 2019-12-10 18:00:40 Z0 days Testing same since 144711 2019-12-11 12:02:23 Z0 days1 attempts People who touched revisions under test: Jan Beulich Juergen Gross Wei Liu jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 272c18435e..80268f6364 80268f63640f5c3a9a4e1c688a62e35448fde9e2 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen-blkback: prevent premature module unload
> -Original Message- > From: Roger Pau Monné > Sent: 11 December 2019 13:55 > To: Durrant, Paul ; Juergen Gross > Cc: xen-devel@lists.xenproject.org; linux-bl...@vger.kernel.org; linux- > ker...@vger.kernel.org; Konrad Rzeszutek Wilk ; > Jens Axboe > Subject: Re: [PATCH] xen-blkback: prevent premature module unload > > On Wed, Dec 11, 2019 at 01:27:42PM +, Durrant, Paul wrote: > > > -Original Message- > > > From: Roger Pau Monné > > > Sent: 11 December 2019 11:29 > > > To: Durrant, Paul > > > Cc: xen-devel@lists.xenproject.org; linux-bl...@vger.kernel.org; > linux- > > > ker...@vger.kernel.org; Konrad Rzeszutek Wilk > ; > > > Jens Axboe > > > Subject: Re: [PATCH] xen-blkback: prevent premature module unload > > > > > > On Tue, Dec 10, 2019 at 02:53:05PM +, Paul Durrant wrote: > > > > Objects allocated by xen_blkif_alloc come from the 'blkif_cache' > kmem > > > > cache. This cache is destoyed when xen-blkif is unloaded so it is > > > > necessary to wait for the deferred free routine used for such > objects to > > > > complete. This necessity was missed in commit 14855954f636 "xen- > blkback: > > > > allow module to be cleanly unloaded". This patch fixes the problem > by > > > > taking/releasing extra module references in xen_blkif_alloc/free() > > > > respectively. > > > > > > > > Signed-off-by: Paul Durrant > > > > > > Reviewed-by: Roger Pau Monné > > > > > > One nit below. > > > > > > > --- > > > > Cc: Konrad Rzeszutek Wilk > > > > Cc: "Roger Pau Monné" > > > > Cc: Jens Axboe > > > > --- > > > > drivers/block/xen-blkback/xenbus.c | 10 ++ > > > > 1 file changed, 10 insertions(+) > > > > > > > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen- > > > blkback/xenbus.c > > > > index e8c5c54e1d26..59d576d27ca7 100644 > > > > --- a/drivers/block/xen-blkback/xenbus.c > > > > +++ b/drivers/block/xen-blkback/xenbus.c > > > > @@ -171,6 +171,15 @@ static struct xen_blkif > *xen_blkif_alloc(domid_t > > > domid) > > > > blkif->domid = domid; > > > > atomic_set(>refcnt, 1); > > > > init_completion(>drain_complete); > > > > + > > > > + /* > > > > +* Because freeing back to the cache may be deferred, it is > not > > > > +* safe to unload the module (and hence destroy the cache) > until > > > > +* this has completed. To prevent premature unloading, take an > > > > +* extra module reference here and release only when the > object > > > > +* has been free back to the cache. > > > ^ freed > > > > Oh yes. Can this be done on commit, or would you like me to send a v2? > > Adjusting on commit would be fine for me, but it's up to Juergen since > he is the one that will pick this up. IIRC the module unload patches > didn't go through the block subsystem. True. I forgot manually add Juergen cc list. Paul > > Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen-blkback: prevent premature module unload
On 11.12.19 14:55, Roger Pau Monné wrote: On Wed, Dec 11, 2019 at 01:27:42PM +, Durrant, Paul wrote: -Original Message- From: Roger Pau Monné Sent: 11 December 2019 11:29 To: Durrant, Paul Cc: xen-devel@lists.xenproject.org; linux-bl...@vger.kernel.org; linux- ker...@vger.kernel.org; Konrad Rzeszutek Wilk ; Jens Axboe Subject: Re: [PATCH] xen-blkback: prevent premature module unload On Tue, Dec 10, 2019 at 02:53:05PM +, Paul Durrant wrote: Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem cache. This cache is destoyed when xen-blkif is unloaded so it is necessary to wait for the deferred free routine used for such objects to complete. This necessity was missed in commit 14855954f636 "xen-blkback: allow module to be cleanly unloaded". This patch fixes the problem by taking/releasing extra module references in xen_blkif_alloc/free() respectively. Signed-off-by: Paul Durrant Reviewed-by: Roger Pau Monné One nit below. --- Cc: Konrad Rzeszutek Wilk Cc: "Roger Pau Monné" Cc: Jens Axboe --- drivers/block/xen-blkback/xenbus.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen- blkback/xenbus.c index e8c5c54e1d26..59d576d27ca7 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -171,6 +171,15 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid) blkif->domid = domid; atomic_set(>refcnt, 1); init_completion(>drain_complete); + + /* +* Because freeing back to the cache may be deferred, it is not +* safe to unload the module (and hence destroy the cache) until +* this has completed. To prevent premature unloading, take an +* extra module reference here and release only when the object +* has been free back to the cache. ^ freed Oh yes. Can this be done on commit, or would you like me to send a v2? Adjusting on commit would be fine for me, but it's up to Juergen since he is the one that will pick this up. IIRC the module unload patches didn't go through the block subsystem. Oh, right. Yes, will fix this when committing. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen-blkback: prevent premature module unload
On Wed, Dec 11, 2019 at 01:27:42PM +, Durrant, Paul wrote: > > -Original Message- > > From: Roger Pau Monné > > Sent: 11 December 2019 11:29 > > To: Durrant, Paul > > Cc: xen-devel@lists.xenproject.org; linux-bl...@vger.kernel.org; linux- > > ker...@vger.kernel.org; Konrad Rzeszutek Wilk ; > > Jens Axboe > > Subject: Re: [PATCH] xen-blkback: prevent premature module unload > > > > On Tue, Dec 10, 2019 at 02:53:05PM +, Paul Durrant wrote: > > > Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem > > > cache. This cache is destoyed when xen-blkif is unloaded so it is > > > necessary to wait for the deferred free routine used for such objects to > > > complete. This necessity was missed in commit 14855954f636 "xen-blkback: > > > allow module to be cleanly unloaded". This patch fixes the problem by > > > taking/releasing extra module references in xen_blkif_alloc/free() > > > respectively. > > > > > > Signed-off-by: Paul Durrant > > > > Reviewed-by: Roger Pau Monné > > > > One nit below. > > > > > --- > > > Cc: Konrad Rzeszutek Wilk > > > Cc: "Roger Pau Monné" > > > Cc: Jens Axboe > > > --- > > > drivers/block/xen-blkback/xenbus.c | 10 ++ > > > 1 file changed, 10 insertions(+) > > > > > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen- > > blkback/xenbus.c > > > index e8c5c54e1d26..59d576d27ca7 100644 > > > --- a/drivers/block/xen-blkback/xenbus.c > > > +++ b/drivers/block/xen-blkback/xenbus.c > > > @@ -171,6 +171,15 @@ static struct xen_blkif *xen_blkif_alloc(domid_t > > domid) > > > blkif->domid = domid; > > > atomic_set(>refcnt, 1); > > > init_completion(>drain_complete); > > > + > > > + /* > > > + * Because freeing back to the cache may be deferred, it is not > > > + * safe to unload the module (and hence destroy the cache) until > > > + * this has completed. To prevent premature unloading, take an > > > + * extra module reference here and release only when the object > > > + * has been free back to the cache. > > ^ freed > > Oh yes. Can this be done on commit, or would you like me to send a v2? Adjusting on commit would be fine for me, but it's up to Juergen since he is the one that will pick this up. IIRC the module unload patches didn't go through the block subsystem. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is forced to closed
> -Original Message- > From: Jürgen Groß > Sent: 11 December 2019 10:21 > To: Durrant, Paul ; Roger Pau Monné > > Cc: xen-devel@lists.xenproject.org; linux-ker...@vger.kernel.org; Stefano > Stabellini ; Boris Ostrovsky > > Subject: Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is forced > to closed > > On 11.12.19 11:14, Durrant, Paul wrote: > >> -Original Message- > >> From: Roger Pau Monné > >> Sent: 11 December 2019 10:06 > >> To: Durrant, Paul > >> Cc: xen-devel@lists.xenproject.org; linux-ker...@vger.kernel.org; > Juergen > >> Gross ; Stefano Stabellini ; > >> Boris Ostrovsky > >> Subject: Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is > forced > >> to closed > >> > >> On Tue, Dec 10, 2019 at 11:33:45AM +, Paul Durrant wrote: > >>> If a driver probe() fails then leave the xenstore state alone. There > is > >> no > >>> reason to modify it as the failure may be due to transient resource > >>> allocation issues and hence a subsequent probe() may succeed. > >>> > >>> If the driver supports re-binding then only force state to closed > during > >>> remove() only in the case when the toolstack may need to clean up. > This > >> can > >>> be detected by checking whether the state in xenstore has been set to > >>> closing prior to device removal. > >>> > >>> NOTE: Re-bind support is indicated by new boolean in struct > >> xenbus_driver, > >>>which defaults to false. Subsequent patches will add support to > >>>some backend drivers. > >> > >> My intention was to specify whether you want to close the > >> backends on unbind in sysfs, so that an user can decide at runtime, > >> rather than having a hardcoded value in the driver. > >> > >> Anyway, I'm less sure whether such runtime tunable is useful at all, > >> so let's leave it out and can always be added afterwards. At the end > >> of day a user wrongly doing a rmmod blkback can always recover > >> gracefully by loading blkback again with your proposed approach to > >> leave connections open on module removal. > >> > >> Sorry for the extra work. > >> > > > > Does this mean you don't think the extra driver flag is necessary any > more? NB: now that xenbus actually takes module references you can't > accidentally rmmod any more :-) > > I'd like it to be kept, please. > Ok. I'll leave this patch alone then. Paul > Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen-blkback: prevent premature module unload
> -Original Message- > From: Roger Pau Monné > Sent: 11 December 2019 11:29 > To: Durrant, Paul > Cc: xen-devel@lists.xenproject.org; linux-bl...@vger.kernel.org; linux- > ker...@vger.kernel.org; Konrad Rzeszutek Wilk ; > Jens Axboe > Subject: Re: [PATCH] xen-blkback: prevent premature module unload > > On Tue, Dec 10, 2019 at 02:53:05PM +, Paul Durrant wrote: > > Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem > > cache. This cache is destoyed when xen-blkif is unloaded so it is > > necessary to wait for the deferred free routine used for such objects to > > complete. This necessity was missed in commit 14855954f636 "xen-blkback: > > allow module to be cleanly unloaded". This patch fixes the problem by > > taking/releasing extra module references in xen_blkif_alloc/free() > > respectively. > > > > Signed-off-by: Paul Durrant > > Reviewed-by: Roger Pau Monné > > One nit below. > > > --- > > Cc: Konrad Rzeszutek Wilk > > Cc: "Roger Pau Monné" > > Cc: Jens Axboe > > --- > > drivers/block/xen-blkback/xenbus.c | 10 ++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen- > blkback/xenbus.c > > index e8c5c54e1d26..59d576d27ca7 100644 > > --- a/drivers/block/xen-blkback/xenbus.c > > +++ b/drivers/block/xen-blkback/xenbus.c > > @@ -171,6 +171,15 @@ static struct xen_blkif *xen_blkif_alloc(domid_t > domid) > > blkif->domid = domid; > > atomic_set(>refcnt, 1); > > init_completion(>drain_complete); > > + > > + /* > > +* Because freeing back to the cache may be deferred, it is not > > +* safe to unload the module (and hence destroy the cache) until > > +* this has completed. To prevent premature unloading, take an > > +* extra module reference here and release only when the object > > +* has been free back to the cache. > ^ freed Oh yes. Can this be done on commit, or would you like me to send a v2? Paul > > +*/ > > + __module_get(THIS_MODULE); > > INIT_WORK(>free_work, xen_blkif_deferred_free); > > > > return blkif; > > @@ -320,6 +329,7 @@ static void xen_blkif_free(struct xen_blkif *blkif) > > > > /* Make sure everything is drained before shutting down */ > > kmem_cache_free(xen_blkif_cachep, blkif); > > + module_put(THIS_MODULE); > > } > > > > int __init xen_blkif_interface_init(void) > > -- > > 2.20.1 > > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen Security Advisory 311 v4 (CVE-2019-19577) - Bugs in dynamic height handling for AMD IOMMU pagetables
On 11/12/2019 13:09, Xen.org security team wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > Xen Security Advisory CVE-2019-19577 / XSA-311 >version 4 > > Bugs in dynamic height handling for AMD IOMMU pagetables > > > CREDITS > === > > This issue was discovered by Sander Eikelenboom, along with Andrew Cooper of > Citrix. Ahh this was why Jan's two patches were skipped, I was about to inquire if it would be picked up in the future in some form. -- Sander ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] IOMMU: make DMA containment of quarantined devices optional
Containing still in flight DMA was introduced to work around certain devices / systems hanging hard upon hitting an IOMMU fault. Passing through (such) devices (on such systems) is inherently insecure (as guests could easily arrange for IOMMU faults to occur). Defaulting to a mode where admins may not even become aware of issues with devices can be considered undesirable. Therefore convert this mode of operation to an optional one, not one enabled by default. This involves resurrecting code commit ea38867831da ("x86 / iommu: set up a scratch page in the quarantine domain") did remove, in a slightly extended fashion. Here, instead of reintroducing a pretty pointless use of "goto" in domain_context_unmap(), and instead of making the function (at least temporarily) inconsistent, take the opportunity and replace the other similarly pointless "goto" as well. In order to key the re-instated bypasses off of there (not) being a root page table this further requires moving the allocate_domain_resources() invocation from reassign_device() to amd_iommu_setup_domain_device() (or else reassign_device() would allocate a root page table anyway); this is benign to the second caller of the latter function. Signed-off-by: Jan Beulich --- I'm happy to take better suggestions to replace "full". --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -1232,7 +1232,7 @@ detection of systems known to misbehave > Default: `new` unless directed-EOI is supported ### iommu -= List of [ , verbose, debug, force, required, quarantine, += List of [ , verbose, debug, force, required, quarantine[=full], sharept, intremap, intpost, crash-disable, snoop, qinval, igfx, amd-iommu-perdev-intremap, dom0-{passthrough,strict} ] @@ -1270,11 +1270,13 @@ boolean (e.g. `iommu=no`) can override t will prevent Xen from booting if IOMMUs aren't discovered and enabled successfully. -* The `quarantine` boolean can be used to control Xen's behavior when -de-assigning devices from guests. If enabled (the default), Xen always -quarantines such devices; they must be explicitly assigned back to Dom0 -before they can be used there again. If disabled, Xen will only -quarantine devices the toolstack hass arranged for getting quarantined. +* The `quarantine` option can be used to control Xen's behavior when +de-assigning devices from guests. If set to true (the default), Xen +always quarantines such devices; they must be explicitly assigned back +to Dom0 before they can be used there again. If set to "full", still +active DMA will additionally be directed to a "sink" page. If set to +false, Xen will only quarantine devices the toolstack has arranged for +getting quarantined. * The `sharept` boolean controls whether the IOMMU pagetables are shared with the CPU-side HAP pagetables, or allocated separately. Sharing --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -85,18 +85,36 @@ int get_dma_requestor_id(uint16_t seg, u return req_id; } -static void amd_iommu_setup_domain_device( +static int __must_check allocate_domain_resources(struct domain_iommu *hd) +{ +int rc; + +spin_lock(>arch.mapping_lock); +rc = amd_iommu_alloc_root(hd); +spin_unlock(>arch.mapping_lock); + +return rc; +} + +static int __must_check amd_iommu_setup_domain_device( struct domain *domain, struct amd_iommu *iommu, uint8_t devfn, struct pci_dev *pdev) { struct amd_iommu_dte *table, *dte; unsigned long flags; -int req_id, valid = 1; +int req_id, valid = 1, rc; u8 bus = pdev->bus; -const struct domain_iommu *hd = dom_iommu(domain); +struct domain_iommu *hd = dom_iommu(domain); + +/* dom_io is used as a sentinel for quarantined devices */ +if ( domain == dom_io && !hd->arch.root_table ) +return 0; + +BUG_ON(!hd->arch.paging_mode || !iommu->dev_table.buffer); -BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode || -!iommu->dev_table.buffer ); +rc = allocate_domain_resources(hd); +if ( rc ) +return rc; if ( iommu_hwdom_passthrough && is_hardware_domain(domain) ) valid = 0; @@ -151,6 +169,8 @@ static void amd_iommu_setup_domain_devic amd_iommu_flush_iotlb(devfn, pdev, INV_IOMMU_ALL_PAGES_ADDRESS, 0); } + +return 0; } int __init acpi_ivrs_init(void) @@ -220,17 +240,6 @@ int amd_iommu_alloc_root(struct domain_i return 0; } -static int __must_check allocate_domain_resources(struct domain_iommu *hd) -{ -int rc; - -spin_lock(>arch.mapping_lock); -rc = amd_iommu_alloc_root(hd); -spin_unlock(>arch.mapping_lock); - -return rc; -} - int amd_iommu_get_paging_mode(unsigned long entries) { int level = 1; @@ -287,6 +296,10 @@ static void amd_iommu_disable_domain_dev int req_id; u8 bus =
Re: [Xen-devel] [PATCH v6 1/3] xenbus/backend: Add memory pressure handler callback
On 11.12.19 12:46, Roger Pau Monné wrote: On Wed, Dec 11, 2019 at 04:26:57AM +, SeongJae Park wrote: + return 0; } subsys_initcall(xenbus_probe_backend_init); diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 869c816d5f8c..196260017666 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -104,6 +104,7 @@ struct xenbus_driver { struct device_driver driver; int (*read_otherend_details)(struct xenbus_device *dev); int (*is_ready)(struct xenbus_device *dev); + void (*reclaim)(struct xenbus_device *dev); reclaim_memory (if Juergen agrees). I do agree. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v6 1/3] xenbus/backend: Add memory pressure handler callback
On Wed, 11 Dec 2019 12:46:51 +0100 "Roger Pau Monné" wrote: > > Granting pages consumes backend system memory. In systems configured > > with insufficient spare memory for those pages, it can cause a memory > > pressure situation. However, finding the optimal amount of the spare > ^ s/the// > > memory is challenging for large systems having dynamic resource > > utilization patterns. Also, such a static configuration might lack > > flexibility. > > > > To mitigate such problems, this commit adds a memory reclaim callback to > > 'xenbus_driver'. If a memory pressure is detected, 'xenbus' requests >^ s/a// > > every backend driver to volunarily release its memory. > > > > Note that it would be able to improve the callback facility for more > ^ possible > > sophisticated handlings of general pressures. For example, it would be > ^ handling of resource starvation. > > possible to monitor the memory consumption of each device and issue the > > release requests to only devices which causing the pressure. Also, the > > callback could be extended to handle not only memory, but general > > resources. Nevertheless, this version of the implementation defers such > > sophisticated goals as a future work. > > > > Reviewed-by: Juergen Gross > > Signed-off-by: SeongJae Park > > --- > > drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++ > > include/xen/xenbus.h | 1 + > > 2 files changed, 33 insertions(+) > > > > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c > > b/drivers/xen/xenbus/xenbus_probe_backend.c > > index b0bed4faf44c..aedbe2198de5 100644 > > --- a/drivers/xen/xenbus/xenbus_probe_backend.c > > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c > > @@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct > > notifier_block *notifier, > > return NOTIFY_DONE; > > } > > > > +static int xenbus_backend_reclaim(struct device *dev, void *data) > > No need for the xenbus_ prefix since it's a static function, ie: > backend_reclaim_memory should be fine IMO. Agreed, will change the name in the next version. > > > +{ > > + struct xenbus_driver *drv; > > I've asked for this variable to be constified in v5, is it not > possible to make it const? Sorry, my mistake... I was difinitely too hurry. > > > + > > + if (!dev->driver) > > + return 0; > > + drv = to_xenbus_driver(dev->driver); > > + if (drv && drv->reclaim) > > + drv->reclaim(to_xenbus_device(dev)); > > + return 0; > > +} > > + > > +/* > > + * Returns 0 always because we are using shrinker to only detect memory > > + * pressure. > > + */ > > +static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker, > > + struct shrink_control *sc) > > +{ > > + bus_for_each_dev(_backend.bus, NULL, NULL, > > + xenbus_backend_reclaim); > > + return 0; > > +} > > + > > +static struct shrinker xenbus_backend_shrinker = { > > I would drop the xenbus prefix, and I think it's not possible to > constify this due to register_shrinker expecting a non-const > parameter? Yes, constifying it results in another compile warning. Will drop the prefix. > > > + .count_objects = xenbus_backend_shrink_count, > > + .seeks = DEFAULT_SEEKS, > > +}; > > + > > static int __init xenbus_probe_backend_init(void) > > { > > static struct notifier_block xenstore_notifier = { > > @@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void) > > > > register_xenstore_notifier(_notifier); > > > > + if (register_shrinker(_backend_shrinker)) > > + pr_warn("shrinker registration failed\n"); > > Can you add a xenbus prefix to the error message? Or else it's hard to > know which subsystem is complaining when you see such message on the > log. ie: "xenbus: shrinker ..." Because we have #define `pr_fmt(fmt) KBUILD_MODNAME ": " fmt` in the beginning of the file, the message will have a proper prefix. > > > + > > return 0; > > } > > subsys_initcall(xenbus_probe_backend_init); > > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h > > index 869c816d5f8c..196260017666 100644 > > --- a/include/xen/xenbus.h > > +++ b/include/xen/xenbus.h > > @@ -104,6 +104,7 @@ struct xenbus_driver { > > struct device_driver driver; > > int (*read_otherend_details)(struct xenbus_device *dev); > > int (*is_ready)(struct xenbus_device *dev); > > + void (*reclaim)(struct xenbus_device *dev); > > reclaim_memory (if Juergen agrees). Okay. Thanks, SeongJae Park > > Thanks, Roger. > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] Xen Security Advisory 308 v3 (CVE-2019-19583) - VMX: VMentry failure with debug exceptions and blocked states
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Xen Security Advisory CVE-2019-19583 / XSA-308 version 3 VMX: VMentry failure with debug exceptions and blocked states UPDATES IN VERSION 3 Public release. Updated metadata to add 4.13, update StableRef's ISSUE DESCRIPTION = Please see XSA-260 for background on the MovSS shadow: http://xenbits.xen.org/xsa/advisory-260.html Please see XSA-156 for background on the need for #DB interception: http://xenbits.xen.org/xsa/advisory-156.html The VMX VMEntry checks does not like the exact combination of state which occurs when #DB in intercepted, Single Stepping is active, and blocked by STI/MovSS is active, despite this being a legitimate state to be in. The resulting VMEntry failure is fatal to the guest. IMPACT == HVM/PVH guest userspace code may be able to crash the guest, resulting in a guest Denial of Service. VULNERABLE SYSTEMS == All versions of Xen are affected. Only systems supporting VMX hardware virtual extensions (Intel, Cyrix or Zhaoxin CPUs) are affected. Arm and AMD systems are unaffected. Only HVM/PVH guests are affected. PV guests cannot leverage the vulnerability. MITIGATION == Running only PV guests will avoid this vulnerability. Running HVM guests on only AMD hardware will also avoid this vulnerability. CREDITS === This issue was discovered by Håkon Alstadheim and diagnosed as a security issue by Andrew Cooper of Citrix. RESOLUTION == Applying the attached patch resolves this issue. xsa308.patch xen-unstable, Xen 4.13.x .. Xen 4.8.x $ sha256sum xsa308* 4aa06d21478d9debb12388ff14d8abc31982e18895db40d0cec78fcc9fe68ef2 xsa308.meta 7e782b09b16f7534c8db52042f7bb3bd730d108571c8b10af184ae0b02fdae9d xsa308.patch $ DEPLOYMENT DURING EMBARGO = Deployment of the patches and/or mitigations described above (or others which are substantially similar) is permitted during the embargo, even on public-facing systems with untrusted guest users and administrators. But: Distribution of updated software is prohibited (except to other members of the predisclosure list). Predisclosure list members who wish to deploy significantly different patches and/or mitigations, please contact the Xen Project Security Team. -BEGIN PGP SIGNATURE- iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAl3w3FsMHHBncEB4ZW4u b3JnAAoJEIP+FMlX6CvZWHwIAIfuiZE/IyxMwTAkZL3EugBnlxxHodoBuj6imn+n c9DvMk3TCi3vSgvZQtVpP0eNuuLN5285hVyI95lRE0LTmtRLc7jATktStRTgGkua znW8U1sqkVRWJcVuN4uAM2zIY60pMZnFjZxdJW12+wpcA13LInE1cDWnlRv+cdD9 7DtVkGUWXjfbcm3KXGZw8YpKvTgVp983VpywR/1lzXZ+MexWzKuEco8fZFayw0ne 3nT/23Y1ofjCflNFjc7HoeJZl+zy493J/rqHS8yYI3d4vTdIfjue3rZ/X6305el9 zjCG5zXygrWVAoKGWVnPZweX1jw8rd6BlsPTqQb53UH94zc= =yTxW -END PGP SIGNATURE- xsa308.meta Description: Binary data xsa308.patch Description: Binary data ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] Xen Security Advisory 310 v3 (CVE-2019-19580) - Further issues with restartable PV type change operations
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Xen Security Advisory CVE-2019-19580 / XSA-310 version 3 Further issues with restartable PV type change operations UPDATES IN VERSION 3 Public release. Updated metadata to add 4.13, update StableRef's ISSUE DESCRIPTION = XSA-299 addressed several critical issues in restartable PV type change operations. Despite extensive testing and auditing, some corner cases were missed. IMPACT == A malicious PV guest administrator may be able to escalate their privilege to that of the host. VULNERABLE SYSTEMS == All security-supported versions of Xen are vulnerable. Only x86 systems are affected. Arm systems are not affected. Only x86 PV guests can leverage the vulnerability. x86 HVM and PVH guests cannot leverage the vulnerability. Note that these attacks require very precise timing, which may be difficult to exploit in practice. MITIGATION == Running only HVM or PVH guests will avoid this vulnerability. Running PV guests in "shim" mode will also avoid this vulnerability. CREDITS === This issue was discovered by Sarah Newman at prgmr.com. RESOLUTION == Applying the appropriate attached patch resolves this issue. xsa310/*.patch xen-unstable, Xen 4.13 - 4.10 xsa310-4.9/*.patch Xen 4.9 - 4.8 $ sha256sum xsa310* xsa310*/* 2208e40c71aa521ae487782bd751963ce696be451d10a179fcecdff7a0065369 xsa310.meta 8e75f0fb5fe890a661c8d46ec622131bc650f1a95b170b99569b50dd2224616c xsa310-4.9/0001-x86-mm-Set-old_guest_table-when-destroying-vcpu-page.patch 3da404a0c088936ed92377ccef1fa6fdeb23900358ca9284e3488e8e1dcb5dd2 xsa310-4.9/0002-x86-mm-alloc-free_lN_table-Retain-partial_flags-on-E.patch cd1a77c2f767474dcfbd1e6282ad3219ce2abcac2021b040120d40b52fc76bc8 xsa310-4.9/0003-x86-mm-relinquish_memory-Grab-an-extra-type-ref-when.patch 44c670a1b1b8164202766d52fb741e62c104118525eb7a3e56f4b232bcb8be3f xsa310/0001-x86-mm-Set-old_guest_table-when-destroying-vcpu-page.patch 173dc0ffb4c572c8493bd9d5f3309b113e51888bdc9e462c78933f5c85f69b7a xsa310/0002-x86-mm-alloc-free_lN_table-Retain-partial_flags-on-E.patch 1833fbfc2cdea9b37f161b09df947dffdd8db5e60a2f3512913de0e0c0d4b3ef xsa310/0003-x86-mm-relinquish_memory-Grab-an-extra-type-ref-when.patch $ DEPLOYMENT DURING EMBARGO = Deployment of the patches and/or mitigations described above (or others which are substantially similar) is permitted during the embargo, even on public-facing systems with untrusted guest users and administrators. But: Distribution of updated software is prohibited (except to other members of the predisclosure list). Predisclosure list members who wish to deploy significantly different patches and/or mitigations, please contact the Xen Project Security Team. (Note: this during-embargo deployment notice is retained in post-embargo publicly released Xen Project advisories, even though it is then no longer applicable. This is to enable the community to have oversight of the Xen Project Security Team's decisionmaking.) For more information about permissible uses of embargoed information, consult the Xen Project community's agreed Security Policy: http://www.xenproject.org/security-policy.html -BEGIN PGP SIGNATURE- iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAl3w3F0MHHBncEB4ZW4u b3JnAAoJEIP+FMlX6CvZ1noH/i6Sb3F6ZiaSl460OvdCRKd9lZm3ONunOH4IHuc6 +Q/G0G4b48UYfK/8FSAAjldv8tPOA5+j3GAFr2JgVtTWjP7tZyzSs0tDvn37sZrZ D3l0AeOHxLCuSRxnoRDtpKiuJv71DrnYEfCDdc6R4DTZuciOWYpYq6PQTac5bLZX 8G5nR+33SvzdIpncvONa0Xqm1+Cgy8yOOQQJHeQvN7GJfVvs6AHepU5zuP2Ez42W ReNA6o13xwiI8LGKvf8cV7s74JklIxR9gzkv4bBtMKInUY2loSIbKpI8E9GsVa3n VOJ2kwKgGgszewBoVyJdGYY1ZlXeIdPjOj7+575bsRnDlGo= =f2/B -END PGP SIGNATURE- xsa310.meta Description: Binary data xsa310-4.9/0001-x86-mm-Set-old_guest_table-when-destroying-vcpu-page.patch Description: Binary data xsa310-4.9/0002-x86-mm-alloc-free_lN_table-Retain-partial_flags-on-E.patch Description: Binary data xsa310-4.9/0003-x86-mm-relinquish_memory-Grab-an-extra-type-ref-when.patch Description: Binary data xsa310/0001-x86-mm-Set-old_guest_table-when-destroying-vcpu-page.patch Description: Binary data xsa310/0002-x86-mm-alloc-free_lN_table-Retain-partial_flags-on-E.patch Description: Binary data xsa310/0003-x86-mm-relinquish_memory-Grab-an-extra-type-ref-when.patch Description: Binary data ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] Xen Security Advisory 309 v3 (CVE-2019-19578) - Linear pagetable use / entry miscounts
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Xen Security Advisory CVE-2019-19578 / XSA-309 version 3 Linear pagetable use / entry miscounts UPDATES IN VERSION 3 Public release. Updated metadata to add 4.13, update StableRef's ISSUE DESCRIPTION = "Linear pagetables" is a technique which involves either pointing a pagetable at itself, or to another pagetable of the same or higher level. Xen has limited support for linear pagetables: A page may either point to itself, or point to another pagetable of the same level (i.e., L2 to L2, L3 to L3, and so on). XSA-240 introduced an additional restriction that limited the "depth" of such chains by allowing pages to either *point to* other pages of the same level, or *be pointed to* by other pages of the same level, but not both. To implement this, we keep track of the number of outstanding times a page points to or is pointed to another page table, to prevent both from happening at the same time. Unfortunately, the original commit introducing this reset this count when resuming validation of a partially-validated pagetable, incorrectly dropping some "linear_pt_entry" counts. If an attacker could engineer such a situation to occur, they might be able to make loops or other arbitrary chains of linear pagetables, as described in XSA-240. IMPACT == A malicious or buggy PV guest may cause the hypervisor to crash, resulting in Denial of Service (DoS) affecting the entire host. Privilege escalation and information leaks cannot be excluded. VULNERABLE SYSTEMS == All versions of Xen are vulnerable. Only x86 systems are affected. Arm systems are not affected. Only x86 PV guests can leverage the vulnerability. x86 HVM and PVH guests cannot leverage the vulnerability. Only systems which have enabled linear pagetables are vulnerable. Systems which have disabled linear pagetables, either by selecting CONFIG_PV_LINEAR_PT=n when building the hypervisor, or adding pv-linear-pt=false on the command-line, are not vulnerable. MITIGATION == If you don't have any guests which need linear pagetables, you can disable the feature by adding pv-linear-pt=false to your Xen command-line. NetBSD is known to use linear pagetables; Linux and MiniOS are known not to use linear pagetables. CREDITS === This issue was discovered by Manuel Bouyer and diagnosed as a security issue by Jan Beulich of SUSE. RESOLUTION == Applying the appropriate attached patch resolves this issue. xsa309.patch xen-unstable, Xen 4.13 - Xen 4.8 $ sha256sum xsa309* ddd00dfbc85bada4e4cee8a51b989e3138cc47c58992657054246bc95c8ae34d xsa309.meta 0e4b75f4416624de698f3ed619c28418917ab0a5c9663c1641804e1d0a0dec1b xsa309.patch $ DEPLOYMENT DURING EMBARGO = Deployment of the patches described above (or others which are substantially similar) is permitted during the embargo, even on public-facing systems with untrusted guest users and administrators. But: Distribution of updated software is prohibited (except to other members of the predisclosure list). Deployment of the `pv-linear-pt=false` mitigation is NOT permitted (except where all the affected systems and VMs are administered and used only by organisations which are members of the Xen Project Security Issues Predisclosure List). Specifically, deployment on public cloud systems is NOT permitted. This is because someone may notice the feature going away, and armed with the knowledge of where the issue is, re-discover it. Deployment of the mitigation is permitted only AFTER the embargo ends. Predisclosure list members who wish to deploy significantly different patches and/or mitigations, please contact the Xen Project Security Team. (Note: this during-embargo deployment notice is retained in post-embargo publicly released Xen Project advisories, even though it is then no longer applicable. This is to enable the community to have oversight of the Xen Project Security Team's decisionmaking.) For more information about permissible uses of embargoed information, consult the Xen Project community's agreed Security Policy: http://www.xenproject.org/security-policy.html -BEGIN PGP SIGNATURE- iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAl3w3FwMHHBncEB4ZW4u b3JnAAoJEIP+FMlX6CvZt+QIAL4wU2XUXRQZFk4uS9m4EYV3tlzOidJVcAOvr4pC x9O0rCRrUTnXvaqDj/X7fqPC4e/uHy4yPgg2gnRqb4y/jXJexPBkY/fsZJ64JdWJ Fo+0a9CK8IrlzhXFcxVff49kUC3Vv/X2FMa5mY07wfg3ww2qyh9rUiKSFEX4B8vV 6lfMbFZNyOiO2vm1RnQzUCRnUeHnLXmR22BIvwLX6496qoI/ubHDBOK8NX0RU81e N1wdKlOlfmX1SuXfYzKPcdulmKLHnxiVgxG5FAsaQ5At3luA0+WEn5scoBXG99uB e6EkbmDpLabceQufMPR7Bvad3uVSzg3qLe/NvW4bd4Fvzb0= =Td+m -END PGP SIGNATURE- xsa309.meta Description: Binary data xsa309.patch Description: Binary data ___ Xen-devel mailing list Xen-devel@lists.xenproject.org
[Xen-devel] Xen Security Advisory 311 v4 (CVE-2019-19577) - Bugs in dynamic height handling for AMD IOMMU pagetables
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Xen Security Advisory CVE-2019-19577 / XSA-311 version 4 Bugs in dynamic height handling for AMD IOMMU pagetables UPDATES IN VERSION 4 Public release. Re-base 4.12 patch onto latest stable tree commits. Updated metadata to add 4.13, update StableRef's ISSUE DESCRIPTION = When running on AMD systems with an IOMMU, Xen attempted to dynamically adapt the number of levels of pagetables (the pagetable height) in the IOMMU according to the guest's address space size. The code to select and update the height had several bugs. Notably, the update was done without taking a lock which is necessary for safe operation. IMPACT == A malicious guest administrator can cause Xen to access data structures while they are being modified, causing Xen to crash. Privilege escalation is thought to be very difficult but cannot be ruled out. Additionally, there is a potential memory leak of 4kb per guest boot, under memory pressure. VULNERABLE SYSTEMS == Only Xen on AMD CPUs is vulnerable. Xen running on Intel CPUs is not vulnerable. ARM systems are not vulnerable. Only systems where guests are given direct access to physical devices are vulnerable. Systems which do not use PCI pass-through are not vulnerable. Only HVM guests can exploit the vulnerability. PV and PVH guests cannot. All versions of Xen with IOMMU support are vulnerable. MITIGATION == In some configurations, use of passthrough can be replaced with a higher-level protocol such as Xen PV block or network devices. There is no other mitigation. CREDITS === This issue was discovered by Sander Eikelenboom, along with Andrew Cooper of Citrix. RESOLUTION == Applying the appropriate (set of) attached patch(es) resolves this issue. xsa311.patch xen-unstable, Xen 4.13.x xsa311-4.12.patch Xen 4.12.x xsa311-4.11.patch Xen 4.11.x xsa311-4.10-*.patchXen 4.10.x xsa311-4.9-*.patch Xen 4.9.x xsa311-4.8-*.patch Xen 4.8.x $ sha256sum xsa311* ea929752043b5d4659cb605314887441daa33ee6450e755d6f077e57fc7abf9e xsa311.meta 732975f33b6d893b984540c4c748eb5cdf1cf81bd565e41b57795458cae3ccad xsa311.patch 27e30da9360eec850f6e7d8f2ea465d2f00a5a5a45c43042e4c18786c6c9338f xsa311-4.8-1.patch 6e2372eb18f3ca25093445a93bcdf674ed2d7d3012e8611911ea2b9ca8d58bd4 xsa311-4.8-2.patch c73bee7aa8fac02d0982b4fb21de053918f80cc0158bd5bfca68e3dc994759be xsa311-4.9-1.patch e89f5c381bd6a8fa8c5f63a829b586fdbefefe311c0f1084d2baeea3e933da66 xsa311-4.9-2.patch c73bee7aa8fac02d0982b4fb21de053918f80cc0158bd5bfca68e3dc994759be xsa311-4.10-1.patch 189a51048ad88efd855e6e78a307fff68e0c139225ce528c253558d266fffe02 xsa311-4.10-2.patch 1aaf26d1c231c8b5dd00900c00c18bf884d23b9568c9746866d92f39daf1c02f xsa311-4.11.patch 5f43fa4628f6d1a8f6f903e662226a09524b8c354e06e1a6039837db656c0218 xsa311-4.12.patch $ DEPLOYMENT DURING EMBARGO = Deployment of the patches and/or mitigations described above (or others which are substantially similar) is permitted during the embargo, even on public-facing systems with untrusted guest users and administrators. But: Distribution of updated software is prohibited (except to other members of the predisclosure list). Predisclosure list members who wish to deploy significantly different patches and/or mitigations, please contact the Xen Project Security Team. (Note: this during-embargo deployment notice is retained in post-embargo publicly released Xen Project advisories, even though it is then no longer applicable. This is to enable the community to have oversight of the Xen Project Security Team's decisionmaking.) For more information about permissible uses of embargoed information, consult the Xen Project community's agreed Security Policy: http://www.xenproject.org/security-policy.html -BEGIN PGP SIGNATURE- iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAl3w3F8MHHBncEB4ZW4u b3JnAAoJEIP+FMlX6CvZgF0IAIOtY9LMbRkBWgc16lOs+MTDOC7h4fYqofjQetFN wAJ2Q3w2QXN+Zt54L8dmc6+Zzvn9Do4AJeMvfCzFxuw2OaMBwcwI9DcEbZ+CvYsa hiXf9xKBBEfCu8PjisRnBqKuyqrLQdBSad9vXcGOVloXiFzJ1wbKnSMBNig9ZTi2 us3c9MeUTnf95W/KTQNe2Gu8KQiogzzBUUifdB6YU0MNNhL60OzfSwgautD9XHfA +NcRogDnf6KgAs6VKgHSDxyVWbvnaWvKWGF2M2QXwXHjqCH/ox87OIIgZ/HSodXB e07vCaweCG4GgWDGQN5K3+9Cu1B6+t0RYzPYmuhPDy/kWF0= =RJ0B -END PGP SIGNATURE- xsa311.meta Description: Binary data xsa311.patch Description: Binary data xsa311-4.8-1.patch Description: Binary data xsa311-4.8-2.patch Description: Binary data xsa311-4.9-1.patch Description: Binary data xsa311-4.9-2.patch Description: Binary data xsa311-4.10-1.patch Description: Binary data xsa311-4.10-2.patch Description: Binary data xsa311-4.11.patch Description: Binary data xsa311-4.12.patch Description: Binary data ___ Xen-devel mailing list Xen-devel@lists.xenproject.org
[Xen-devel] Xen Security Advisory 307 v3 (CVE-2019-19581, CVE-2019-19582) - find_next_bit() issues
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Xen Security Advisory CVE-2019-19581,CVE-2019-19582 / XSA-307 version 3 find_next_bit() issues UPDATES IN VERSION 3 Public release. Updated metadata to add 4.13, update StableRef's ISSUE DESCRIPTION = In a number of places bitmaps are being used by the hypervisor to track certain state. Iteration over all bits involves functions which may misbehave in certain corner cases: - - On 32-bit Arm accesses to bitmaps with bit a count which is a multiple of 32, an out of bounds access may occur. (CVE-2019-19581) - - On x86 accesses to bitmaps with a compile time known size of 64 may incur undefined behavior, which may in particular result in infinite loops. (CVE-2019-19582) IMPACT == A malicious guest may cause a hypervisor crash or hang, resulting in a Denial of Service (DoS). VULNERABLE SYSTEMS == All versions of Xen are vulnerable. 32-bit Arm systems are vulnerable. x86 systems with 64 or more nodes are vulnerable. We are unaware of any such systems that Xen would run on. 64-bit Arm systems as well as x86 systems with less than 64 nodes are not vulnerable. MITIGATION == There is no known mitigation for 32-bit Arm systems. For x86 systems the issue can be avoided by suppressing the use of NUMA information provided by firmware, via the "numa=off" command line option. RESOLUTION == Applying the attached patch resolves this issue. xsa307.patch xen-unstable, Xen 4.13.x ... 4.8.x $ sha256sum xsa307* e589e96a0b3ec66f1d2d6393b82fab13ed18fd9fb112044a12263336b8499c68 xsa307.meta 7df052768cc05329bc44bf724897227885da8bb2cde9ff01d0ba2a34611bde97 xsa307.patch $ DEPLOYMENT DURING EMBARGO = Deployment of the patches and/or mitigations described above (or others which are substantially similar) is permitted during the embargo, even on public-facing systems with untrusted guest users and administrators. But: Distribution of updated software is prohibited (except to other members of the predisclosure list). Predisclosure list members who wish to deploy significantly different patches and/or mitigations, please contact the Xen Project Security Team. (Note: this during-embargo deployment notice is retained in post-embargo publicly released Xen Project advisories, even though it is then no longer applicable. This is to enable the community to have oversight of the Xen Project Security Team's decisionmaking.) For more information about permissible uses of embargoed information, consult the Xen Project community's agreed Security Policy: http://www.xenproject.org/security-policy.html -BEGIN PGP SIGNATURE- iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAl3w24gMHHBncEB4ZW4u b3JnAAoJEIP+FMlX6CvZxokH/2bGTmGUZP0tyc+oDHjlrr3+FarhoJnRTl4EoqJS hzsa5OkcqzcEgrQ+7VL7dLW3AboT2zcx2RQ9HyxCz61BfDY1XF8EDDr6chJiNofN J7OGirNzSBHFFQJOc2KFG8al+1F8WzzKP3UMbqNBrqB07/tQc5lttdbA/t5Tnp9c xreCAkkBscDk1LFR8HiUA3YeykiHQtF09O+VnxXO2AD/Dpo8e+K6AmJkCZ4+ysNP JKMc13vQ3UKjMmYzgbuNCIswNu1Wy3EnNZMf2zvGIhuw6iN6vSJJgoz0OSPUb4yY kXEe1dlgseSbMxXEqj4IyZ69pEw6Ijj+H6PybQo/IOie7q0= =7XWU -END PGP SIGNATURE- xsa307.meta Description: Binary data xsa307.patch Description: Binary data ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 4/6] xen/hypercall: Cope with -ERESTART on more hypercall paths
Hi, On 09/12/2019 17:37, Andrew Cooper wrote: On 08/12/2019 12:57, Julien Grall wrote: Hi Andrew, On 05/12/2019 22:30, Andrew Cooper wrote: These hypercalls each use continue_hypercall_on_cpu(), whose API is about to switch to use -ERESTART. Update the soon-to-be affected paths to cope, folding existing contination logic where applicable. In addition: * For platform op and sysctl, insert a cpu_relax() into what is otherwise a tight spinlock loop, and make the continuation logic common at the epilogue. * Contrary to the comment in the code, kexec_exec() does return in the KEXEC_REBOOT case, needs to pass ret back to the caller. It is not entirely trivial to me that KEXEC_REBOOT refers to KEXEC_DEFAULT_TYPE. The more that if you look at the kexec_reboot() helper, it will not return (see BUG()). What may return is continue_hypercall_on_cpu(). So would it make sense to use KEXEC_DEFAULT_TYPE? I'm not sure why I capitalised it, but no - using KEXEC_DEFAULT_TYPE is worse. A casual reader is far more likely to understand kexec_reboot() in this context. But kexec_reboot() cannot return (see BUG()) so a reader may not understand why you suggest that it will return. So I think this want to be reworded. [...] @@ -816,6 +819,13 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op) out: spin_unlock(_lock); + if ( ret == -ERESTART ) + { + create_continuation: Shall we indent create_continuation the same way as out? They have different scopes, and while it may look weird, this is in accordance with our style. [...] @@ -1263,13 +1263,25 @@ static int do_kexec_op_internal(unsigned long op, long do_kexec_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) uarg) { - return do_kexec_op_internal(op, uarg, 0); + int ret = do_kexec_op_internal(op, uarg, 0); Shouldn't it be long (or unsigned long) here? Otherwise, the return of hypercall_create_continuation() may be truncated. If you're concerned about truncation via this pattern, then there are other areas of the code to be worred about. I knew you would mention the other places :). However, there is nothing to truncate. The return value of hypercall_create_continuation() is the primary hypercall number, i.e. __HYPERVISOR_kexec_op in this case. Make sense. And I guess the signed bit will be propagated so if do_kexec_op() return -EINVAL (32-bit), then it would still be -EINVAL (64-bit). Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v6 3/3] xen/blkback: Remove unnecessary static variable name prefixes
On Wed, Dec 11, 2019 at 04:27:33AM +, SeongJae Park wrote: > A few of static variables in blkback have 'xen_blkif_' prefix, though it > is unnecessary for static variables. This commit removes such prefixes. > > Signed-off-by: SeongJae Park Thanks. Reviewed-by: Roger Pau Monné ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 1/2] xenbus/backend: Add memory pressure handler callback
On Wed, 11 Dec 2019 11:51:12 +0100 "Roger Pau Monné" wrote: > > On Tue, 10 Dec 2019 11:16:35 +0100 "Roger Pau Monné" > > wrote: > > > > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h > > > > index 869c816d5f8c..cdb075e4182f 100644 > > > > --- a/include/xen/xenbus.h > > > > +++ b/include/xen/xenbus.h > > > > @@ -104,6 +104,7 @@ struct xenbus_driver { > > > > struct device_driver driver; > > > > int (*read_otherend_details)(struct xenbus_device *dev); > > > > int (*is_ready)(struct xenbus_device *dev); > > > > + unsigned (*reclaim)(struct xenbus_device *dev); > > > > > > ... hence I wonder why it's returning an unsigned when it's just > > > ignored. > > > > > > IMO it should return an int to signal errors, and the return should be > > > ignored. > > > > I first thought similarly and set the callback to return something. > > However, > > as this callback is called to simply notify the memory pressure and ask the > > driver to free its memory as many as possible, I couldn't easily imagine > > what > > kind of errors that need to be handled by its caller can occur in the > > callback, > > especially because current blkback's callback implementation has no such > > error. > > So, if you and others agree, I would like to simply set the return type to > > 'void' for now and defer the error handling to a future change. > > Yes, I also wondered the same, but seeing you returned an integer I > assumed there was interest in returning some kind of value. If there's > nothing to return let's just make it void. > > > > > > > Also, I think it would preferable for this function to take an extra > > > parameter to describe the resource the driver should attempt to free > > > (ie: memory or interrupts for example). I'm however not able to find > > > any existing Linux type to describe such resources. > > > > Yes, such extention would be the right direction. However, because there > > is no > > existing Linux type to describe the type of resources to reclaim as you also > > mentioned, there could be many different opinions about its implementation > > detail. In my opinion, it could be also possible to simply add another > > callback for another resource type. That said, because currently we have an > > use case and an implementation for the memory pressure only, I would like to > > let it as is for now and defer the extension as a future work, if you and > > others have no objection. > > Ack, can I please ask the callback to be named reclaim_memory or some > such then? Yes, I will change the name. Thanks, SeongJae Park > > Thanks, Roger. > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 2/2] xen/blkback: Squeeze page pools if a memory pressure is detected
On Wed, 11 Dec 2019 12:14:44 +0100 "Roger Pau Monné" wrote: > > I see that you have already sent v6, for future iterations can you > please wait until the conversation on the previous version has been > settled? > > I'm still replying to your replies to v5, and hence you should hold off > sending v6 until we get some kind of conclusion/agreement. Sorry, I was inpatient. > > On Wed, Dec 11, 2019 at 05:08:12AM +0100, SeongJae Park wrote: > > On Tue, 10 Dec 2019 12:04:32 +0100 "Roger Pau Monné" > > wrote: > > > > > > Each `blkif` has a free pages pool for the grant mapping. The size of > > > > the pool starts from zero and be increased on demand while processing > > > > the I/O requests. If current I/O requests handling is finished or 100 > > > > milliseconds has passed since last I/O requests handling, it checks and > > > > shrinks the pool to not exceed the size limit, `max_buffer_pages`. > > > > > > > > Therefore, `blkfront` running guests can cause a memory pressure in the > > > > `blkback` running guest by attaching a large number of block devices and > > > > inducing I/O. > > > > > > Hm, I don't think this is actually true. blkfront cannot attach an > > > arbitrary number of devices, blkfront is just a frontend for a device > > > that's instantiated by the Xen toolstack, so it's the toolstack the one > > > that controls the amount of PV block devices. > > > > Right, the problem can occur only if it is mis-configured so that the > > frontend > > running guests can attach a large number of devices which is enough to cause > > the memory pressure. I tried to explain it in below paragraph, but seems > > above > > paragraph is a little bit confusing. I will wordsmith the sentence in the > > next > > version. > > I would word it along these lines: > > "Host administrators can cause memory pressure in blkback by attaching > a large number of block devices and inducing I/O." Hmm, much better :) > > > > > > > > System administrators can avoid such problematic > > > > situations by limiting the maximum number of devices each guest can > > > > attach. However, finding the optimal limit is not so easy. Improper > > > > set of the limit can results in the memory pressure or a resource > > > > underutilization. This commit avoids such problematic situations by > > > > squeezing the pools (returns every free page in the pool to the system) > > > > for a while (users can set this duration via a module parameter) if a > > > > memory pressure is detected. > > > > > > > > Discussions > > > > === > > > > > > > > The `blkback`'s original shrinking mechanism returns only pages in the > > > > pool, which are not currently be used by `blkback`, to the system. In > > > > other words, the pages are not mapped with foreign pages. Because this > > > ^ that ^ granted > > > > commit is changing only the shrink limit but uses the mechanism as is, > > > > this commit does not introduce improper mappings related security > > > > issues. > > > > > > That last sentence is hard to parse. I think something like: > > > > > > "Because this commit is changing only the shrink limit but still uses the > > > same freeing mechanism it does not touch pages which are currently > > > mapping grants." > > > > > > > > > > > Once a memory pressure is detected, this commit keeps the squeezing > > > > limit for a user-specified time duration. The duration should be > > > > neither too long nor too short. If it is too long, the squeezing > > > > incurring overhead can reduce the I/O performance. If it is too short, > > > > `blkback` will not free enough pages to reduce the memory pressure. > > > > This commit sets the value as `10 milliseconds` by default because it is > > > > a short time in terms of I/O while it is a long time in terms of memory > > > > operations. Also, as the original shrinking mechanism works for at > > > > least every 100 milliseconds, this could be a somewhat reasonable > > > > choice. I also tested other durations (refer to the below section for > > > > more details) and confirmed that 10 milliseconds is the one that works > > > > best with the test. That said, the proper duration depends on actual > > > > configurations and workloads. That's why this commit is allowing users > > > ^ allows > > > > to set it as their optimal value via the module parameter. > > > > > > ... to set the duration as a module parameter. > > > > Thank you for great suggestions, I will apply those. > > > > > > > > > > > > > Memory Pressure Test > > > > > > > > > > > > To show how this commit fixes the memory pressure situation well, I > > > > configured a test environment on a xen-running virtualization system. > > > > On the `blkfront` running guest instances, I attach a large number of > > > > network-backed volume devices and induce I/O to those. Meanwhile, I > > > > measure the
[Xen-devel] [ovmf test] 144706: regressions - FAIL
flight 144706 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/144706/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64 6 xen-buildfail REGR. vs. 144637 build-i386-xsm6 xen-buildfail REGR. vs. 144637 build-amd64-xsm 6 xen-buildfail REGR. vs. 144637 build-i3866 xen-buildfail REGR. vs. 144637 Tests which did not succeed, but are not blocking: test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a build-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf d3add11e87dace180387562d6f1951f2bffbd3d9 baseline version: ovmf 804666c86e7b6f04fe5c5cfdb13199c19e0e99b0 Last test of basis 144637 2019-12-09 09:09:49 Z2 days Failing since144646 2019-12-10 01:39:53 Z1 days8 attempts Testing same since 144703 2019-12-11 10:09:05 Z0 days2 attempts People who touched revisions under test: Antoine Coeur Bob Feng Jiewen Yao Michael Kubacki Philippe Mathieu-Daude Steven Shi jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. commit d3add11e87dace180387562d6f1951f2bffbd3d9 Author: Michael Kubacki Date: Wed Nov 20 17:31:24 2019 -0800 MdeModulePkg PeiCore: Improve comment semantics This patch clarifies wording in several PeiCore comments to improve reading comprehension. Cc: Dandan Bi Cc: Liming Gao Cc: Jian J Wang Cc: Hao A Wu Signed-off-by: Michael Kubacki Reviewed-by: Liming Gao Reviewed-by: Jian J Wang commit d39d1260c615b716675f67f5c4e1f4f52df01dad Author: Michael Kubacki Date: Wed Nov 20 17:10:48 2019 -0800 MdeModulePkg PeiCore: Fix typos Cc: Dandan Bi Cc: Liming Gao Cc: Jian J Wang Cc: Hao A Wu Signed-off-by: Michael Kubacki Reviewed-by: Liming Gao Reviewed-by: Philippe Mathieu-Daude Reviewed-by: Jian J Wang commit 97eedf5dfbaffde33210fd88066247cf0b7d3325 Author: Antoine Coeur Date: Wed Dec 4 12:14:53 2019 +0800 IntelFsp2WrapperPkg: Fix various typos Fix various typos in comments and documentation. Cc: Chasel Chiu Cc: Nate DeSimone Cc: Star Zeng Reviewed-by: Philippe Mathieu-Daude Signed-off-by: Philippe Mathieu-Daude Reviewed-by: Nate DeSimone Reviewed-by: Chasel Chiu Reviewed-by: Star Zeng commit 7e55cf6b48dcd43de46d008b2f12caaad2554503 Author: Jiewen Yao Date: Sat Dec 7 21:41:10 2019 +0800 SecurityPkg/Tcg2Smm: Measure the table before patch. REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1940 According to TCG PFP specification: the ACPI table must be measured prior to any modification, and the measurement must be same cross every boot cycle. There is a fix 3a63c17ebc853cbb27d190729d01e27f68e65b94 for the HID data. However that is not enough. The LAML/LASA and PCD configuration change may also cause similar problem. We need measure the table before any update. Cc: Jian J Wang Cc: Chao Zhang Signed-off-by: Jiewen Yao Reviewed-by: Chao Zhang commit a80032dc44a1071a34f4415a7c5cef5170ee6159 Author: Steven Shi Date: Tue Nov 19 16:22:08 2019 +0800 BaseTools: Remove
Re: [Xen-devel] [PATCH v6 1/3] xenbus/backend: Add memory pressure handler callback
On Wed, Dec 11, 2019 at 04:26:57AM +, SeongJae Park wrote: > Granting pages consumes backend system memory. In systems configured > with insufficient spare memory for those pages, it can cause a memory > pressure situation. However, finding the optimal amount of the spare ^ s/the// > memory is challenging for large systems having dynamic resource > utilization patterns. Also, such a static configuration might lack > flexibility. > > To mitigate such problems, this commit adds a memory reclaim callback to > 'xenbus_driver'. If a memory pressure is detected, 'xenbus' requests ^ s/a// > every backend driver to volunarily release its memory. > > Note that it would be able to improve the callback facility for more ^ possible > sophisticated handlings of general pressures. For example, it would be ^ handling of resource starvation. > possible to monitor the memory consumption of each device and issue the > release requests to only devices which causing the pressure. Also, the > callback could be extended to handle not only memory, but general > resources. Nevertheless, this version of the implementation defers such > sophisticated goals as a future work. > > Reviewed-by: Juergen Gross > Signed-off-by: SeongJae Park > --- > drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++ > include/xen/xenbus.h | 1 + > 2 files changed, 33 insertions(+) > > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c > b/drivers/xen/xenbus/xenbus_probe_backend.c > index b0bed4faf44c..aedbe2198de5 100644 > --- a/drivers/xen/xenbus/xenbus_probe_backend.c > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c > @@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block > *notifier, > return NOTIFY_DONE; > } > > +static int xenbus_backend_reclaim(struct device *dev, void *data) No need for the xenbus_ prefix since it's a static function, ie: backend_reclaim_memory should be fine IMO. > +{ > + struct xenbus_driver *drv; I've asked for this variable to be constified in v5, is it not possible to make it const? > + > + if (!dev->driver) > + return 0; > + drv = to_xenbus_driver(dev->driver); > + if (drv && drv->reclaim) > + drv->reclaim(to_xenbus_device(dev)); > + return 0; > +} > + > +/* > + * Returns 0 always because we are using shrinker to only detect memory > + * pressure. > + */ > +static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker, > + struct shrink_control *sc) > +{ > + bus_for_each_dev(_backend.bus, NULL, NULL, > + xenbus_backend_reclaim); > + return 0; > +} > + > +static struct shrinker xenbus_backend_shrinker = { I would drop the xenbus prefix, and I think it's not possible to constify this due to register_shrinker expecting a non-const parameter? > + .count_objects = xenbus_backend_shrink_count, > + .seeks = DEFAULT_SEEKS, > +}; > + > static int __init xenbus_probe_backend_init(void) > { > static struct notifier_block xenstore_notifier = { > @@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void) > > register_xenstore_notifier(_notifier); > > + if (register_shrinker(_backend_shrinker)) > + pr_warn("shrinker registration failed\n"); Can you add a xenbus prefix to the error message? Or else it's hard to know which subsystem is complaining when you see such message on the log. ie: "xenbus: shrinker ..." > + > return 0; > } > subsys_initcall(xenbus_probe_backend_init); > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h > index 869c816d5f8c..196260017666 100644 > --- a/include/xen/xenbus.h > +++ b/include/xen/xenbus.h > @@ -104,6 +104,7 @@ struct xenbus_driver { > struct device_driver driver; > int (*read_otherend_details)(struct xenbus_device *dev); > int (*is_ready)(struct xenbus_device *dev); > + void (*reclaim)(struct xenbus_device *dev); reclaim_memory (if Juergen agrees). Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/mm: factor out the code for shattering an l3 PTE
Hi Hongyan, On 11/12/2019 11:28, Xia, Hongyan wrote: On Wed, 2019-12-11 at 11:10 +, Julien Grall wrote: Hi Hongyan, ... While this involves more instructions, how often do we expect the code to be called? Cheers, I don't expect this to be called very often in the current Xen. Although with direct map removal, a lot of the memory allocations (mostly xenheap allocations) will be mapped and unmapped on-demand and there is a much higher change of merging/shattering. Thank you for the explanation. In order to merge/shatter, you need the buddy allocator to ensure all the xenheap are allocated contiguously. I am pretty unconvinced there are a lot of page allocated via xenheap. So the chance to have contiguous xenheap allocation is very limited. But the merging/shattering can be counterproductive. An example short memory allocation (we have a few places like that): xmalloc(...); do something xfree(...); We would end up to merge and then a few ms later shatter again. So it feels to me, that merging is probably not worth it (I am planning to discuss with Andrew today about it). However, the series moved all PTEs from xenheap to domheap, and we might see other things moved to domheap in the future, so we might not have many things left on xenheap anyway. Typesafe is an important part of making our code base more secure than basic C (such as not mixing type). In this case, I think if we start to merge/shatter a lot, then we have a bigger problem and we may want to consider to remove it (see above) So it feels the optimization is not worth it. Note that I am not maintaining this code, so the final call is on Andrew and Jan. Cheers, -- ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-next 2/7] x86: fix up hyperv-tlfs.h
On Tue, Dec 10, 2019 at 04:35:41PM +0100, Jan Beulich wrote: > On 25.10.2019 11:16, Wei Liu wrote: > > Do the following: > > 1. include xen/types.h and xen/bitops.h > > 2. fix up invocations of BIT macro > > Is it truly BIT(..., UL) in _all_ cases, and not BIT(..., U) in some? In Linux BIT is #define BIT(nr) (UL(1) << nr) so yes Linux developers did mean UL everywhere. I haven't checked them one by one per TLFS's definitions though. Wei. > > > Signed-off-by: Wei Liu > > --- > > This can be squashed into previous patch if preferred. > > Afaic - yes please. > > Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable test] 144686: regressions - FAIL
flight 144686 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/144686/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 12 guest-start/debianhvm.repeat fail REGR. vs. 144668 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-rtds 16 guest-localmigrate fail REGR. vs. 144668 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 144668 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144668 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 144668 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 144668 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144668 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 144668 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 144668 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144668 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 144668 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144668 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass version targeted for testing: xen 272c18435e93cbf749c096a9552ab5ef0d79a4ca baseline version: xen 4935a5433db28077fe6313f920bbedcd54516cec Last test of basis 144668 2019-12-10 15:37:57 Z0 days Testing same since 144686 2019-12-11 01:51:21 Z0 days1
Re: [Xen-devel] [PATCH for-next 5/7] x86: use running_on_hypervisor to gate hypervisor_setup
On Tue, Dec 10, 2019 at 05:17:28PM +0100, Jan Beulich wrote: > On 25.10.2019 11:16, Wei Liu wrote: > > --- a/xen/arch/x86/setup.c > > +++ b/xen/arch/x86/setup.c > > @@ -1577,7 +1577,7 @@ void __init noreturn __start_xen(unsigned long mbi_p) > > max_cpus = nr_cpu_ids; > > } > > > > -if ( xen_guest ) > > +if ( running_on_hypervisor ) > > hypervisor_setup(); > > This code is using hypervisor_name already, so I guess the patch > has become unnecessary? Yes. I basically squashed this patch into my previous series while reworking that. This patch here is not needed anymore. Wei. > > Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen-blkback: prevent premature module unload
On Tue, Dec 10, 2019 at 02:53:05PM +, Paul Durrant wrote: > Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem > cache. This cache is destoyed when xen-blkif is unloaded so it is > necessary to wait for the deferred free routine used for such objects to > complete. This necessity was missed in commit 14855954f636 "xen-blkback: > allow module to be cleanly unloaded". This patch fixes the problem by > taking/releasing extra module references in xen_blkif_alloc/free() > respectively. > > Signed-off-by: Paul Durrant Reviewed-by: Roger Pau Monné One nit below. > --- > Cc: Konrad Rzeszutek Wilk > Cc: "Roger Pau Monné" > Cc: Jens Axboe > --- > drivers/block/xen-blkback/xenbus.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/block/xen-blkback/xenbus.c > b/drivers/block/xen-blkback/xenbus.c > index e8c5c54e1d26..59d576d27ca7 100644 > --- a/drivers/block/xen-blkback/xenbus.c > +++ b/drivers/block/xen-blkback/xenbus.c > @@ -171,6 +171,15 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid) > blkif->domid = domid; > atomic_set(>refcnt, 1); > init_completion(>drain_complete); > + > + /* > + * Because freeing back to the cache may be deferred, it is not > + * safe to unload the module (and hence destroy the cache) until > + * this has completed. To prevent premature unloading, take an > + * extra module reference here and release only when the object > + * has been free back to the cache. ^ freed > + */ > + __module_get(THIS_MODULE); > INIT_WORK(>free_work, xen_blkif_deferred_free); > > return blkif; > @@ -320,6 +329,7 @@ static void xen_blkif_free(struct xen_blkif *blkif) > > /* Make sure everything is drained before shutting down */ > kmem_cache_free(xen_blkif_cachep, blkif); > + module_put(THIS_MODULE); > } > > int __init xen_blkif_interface_init(void) > -- > 2.20.1 > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-next 1/7] x86: import hyperv-tlfs.h from Linux
On Wed, Dec 11, 2019 at 11:22:49AM +, Durrant, Paul wrote: > > -Original Message- > > From: Wei Liu > > Sent: 11 December 2019 11:15 > > To: Jan Beulich > > Cc: Durrant, Paul ; Wei Liu ; Wei Liu > > ; Paul Durrant ; Andrew Cooper > > ; Michael Kelley ; Xen > > Development List ; Roger Pau Monné > > > > Subject: Re: [PATCH for-next 1/7] x86: import hyperv-tlfs.h from Linux > > > > On Tue, Dec 10, 2019 at 04:43:30PM +0100, Jan Beulich wrote: > > > On 10.12.2019 16:37, Durrant, Paul wrote: > > > >> -Original Message- > > > >> From: Xen-devel On Behalf Of > > Jan > > > >> Beulich > > > >> Sent: 10 December 2019 15:34 > > > >> To: Wei Liu > > > >> Cc: Wei Liu ; Paul Durrant ; > > Andrew > > > >> Cooper ; Michael Kelley > > > >> ; Xen Development List > > >> de...@lists.xenproject.org>; Roger Pau Monné > > > >> Subject: Re: [Xen-devel] [PATCH for-next 1/7] x86: import hyperv- > > tlfs.h > > > >> from Linux > > > >> > > > >> On 25.10.2019 11:16, Wei Liu wrote: > > > >>> Taken from Linux commit b2d8b167e15bb5ec2691d1119c025630a247f649. > > > >>> > > > >>> This is a pristine copy from Linux. It is not used yet and probably > > > >>> doesn't compile. Changes to make it work will come later. > > > >>> > > > >>> Signed-off-by: Wei Liu > > > >> > > > >> This coming from Linux and assuming at least a fair part of it is > > > >> going to be used, in principle > > > >> Acked-by: Jan Beulich > > > >> > > > >> However, there are many seemingly unnecessary uses of __packed > > > >> here, which I'd rather not see go in at all (i.e. not be dropped > > > >> later on, and then potentially missing some). I find ... > > > >> > > > >>> +typedef struct _HV_REFERENCE_TSC_PAGE { > > > >>> + __u32 tsc_sequence; > > > >>> + __u32 res1; > > > >>> + __u64 tsc_scale; > > > >>> + __s64 tsc_offset; > > > >>> +} __packed HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE; > > > >> > > > > > > > > You realise there's a definition of this in the viridian code already, > > right? > > > > > > It looked familiar, but it didn't occur to me to point this out. > > > Yes, there looks to be room for deduplication... > > > > > > > I had a plan to make viridian code use this copy directly. > > > > I have no objection to that, but I think it ought to be done as part of this > series so that we don't end up with long-term duplication. Sure. That shouldn't be too hard. Wei. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/mm: factor out the code for shattering an l3 PTE
On Wed, 2019-12-11 at 11:10 +, Julien Grall wrote: > Hi Hongyan, > ... > > While this involves more instructions, how often do we expect the > code > to be called? > > Cheers, > I don't expect this to be called very often in the current Xen. Although with direct map removal, a lot of the memory allocations (mostly xenheap allocations) will be mapped and unmapped on-demand and there is a much higher change of merging/shattering. However, the series moved all PTEs from xenheap to domheap, and we might see other things moved to domheap in the future, so we might not have many things left on xenheap anyway. Hongyan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-next 1/7] x86: import hyperv-tlfs.h from Linux
> -Original Message- > From: Wei Liu > Sent: 11 December 2019 11:15 > To: Jan Beulich > Cc: Durrant, Paul ; Wei Liu ; Wei Liu > ; Paul Durrant ; Andrew Cooper > ; Michael Kelley ; Xen > Development List ; Roger Pau Monné > > Subject: Re: [PATCH for-next 1/7] x86: import hyperv-tlfs.h from Linux > > On Tue, Dec 10, 2019 at 04:43:30PM +0100, Jan Beulich wrote: > > On 10.12.2019 16:37, Durrant, Paul wrote: > > >> -Original Message- > > >> From: Xen-devel On Behalf Of > Jan > > >> Beulich > > >> Sent: 10 December 2019 15:34 > > >> To: Wei Liu > > >> Cc: Wei Liu ; Paul Durrant ; > Andrew > > >> Cooper ; Michael Kelley > > >> ; Xen Development List > >> de...@lists.xenproject.org>; Roger Pau Monné > > >> Subject: Re: [Xen-devel] [PATCH for-next 1/7] x86: import hyperv- > tlfs.h > > >> from Linux > > >> > > >> On 25.10.2019 11:16, Wei Liu wrote: > > >>> Taken from Linux commit b2d8b167e15bb5ec2691d1119c025630a247f649. > > >>> > > >>> This is a pristine copy from Linux. It is not used yet and probably > > >>> doesn't compile. Changes to make it work will come later. > > >>> > > >>> Signed-off-by: Wei Liu > > >> > > >> This coming from Linux and assuming at least a fair part of it is > > >> going to be used, in principle > > >> Acked-by: Jan Beulich > > >> > > >> However, there are many seemingly unnecessary uses of __packed > > >> here, which I'd rather not see go in at all (i.e. not be dropped > > >> later on, and then potentially missing some). I find ... > > >> > > >>> +typedef struct _HV_REFERENCE_TSC_PAGE { > > >>> + __u32 tsc_sequence; > > >>> + __u32 res1; > > >>> + __u64 tsc_scale; > > >>> + __s64 tsc_offset; > > >>> +} __packed HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE; > > >> > > > > > > You realise there's a definition of this in the viridian code already, > right? > > > > It looked familiar, but it didn't occur to me to point this out. > > Yes, there looks to be room for deduplication... > > > > I had a plan to make viridian code use this copy directly. > I have no objection to that, but I think it ought to be done as part of this series so that we don't end up with long-term duplication. Paul > > > Actually, Wei, one more thing I was curious about - what is "tlfs" > > an acronym of? > > It means "Top-Level Function Specification". > > (I wish Xen had something similar) > > Wei. > > > > > Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 2/2] xen/blkback: Squeeze page pools if a memory pressure is detected
Hello, I see that you have already sent v6, for future iterations can you please wait until the conversation on the previous version has been settled? I'm still replying to your replies to v5, and hence you should hold off sending v6 until we get some kind of conclusion/agreement. On Wed, Dec 11, 2019 at 05:08:12AM +0100, SeongJae Park wrote: > On Tue, 10 Dec 2019 12:04:32 +0100 "Roger Pau Monné" > wrote: > > > > Each `blkif` has a free pages pool for the grant mapping. The size of > > > the pool starts from zero and be increased on demand while processing > > > the I/O requests. If current I/O requests handling is finished or 100 > > > milliseconds has passed since last I/O requests handling, it checks and > > > shrinks the pool to not exceed the size limit, `max_buffer_pages`. > > > > > > Therefore, `blkfront` running guests can cause a memory pressure in the > > > `blkback` running guest by attaching a large number of block devices and > > > inducing I/O. > > > > Hm, I don't think this is actually true. blkfront cannot attach an > > arbitrary number of devices, blkfront is just a frontend for a device > > that's instantiated by the Xen toolstack, so it's the toolstack the one > > that controls the amount of PV block devices. > > Right, the problem can occur only if it is mis-configured so that the frontend > running guests can attach a large number of devices which is enough to cause > the memory pressure. I tried to explain it in below paragraph, but seems > above > paragraph is a little bit confusing. I will wordsmith the sentence in the > next > version. I would word it along these lines: "Host administrators can cause memory pressure in blkback by attaching a large number of block devices and inducing I/O." > > > > > System administrators can avoid such problematic > > > situations by limiting the maximum number of devices each guest can > > > attach. However, finding the optimal limit is not so easy. Improper > > > set of the limit can results in the memory pressure or a resource > > > underutilization. This commit avoids such problematic situations by > > > squeezing the pools (returns every free page in the pool to the system) > > > for a while (users can set this duration via a module parameter) if a > > > memory pressure is detected. > > > > > > Discussions > > > === > > > > > > The `blkback`'s original shrinking mechanism returns only pages in the > > > pool, which are not currently be used by `blkback`, to the system. In > > > other words, the pages are not mapped with foreign pages. Because this > > ^ that ^ granted > > > commit is changing only the shrink limit but uses the mechanism as is, > > > this commit does not introduce improper mappings related security > > > issues. > > > > That last sentence is hard to parse. I think something like: > > > > "Because this commit is changing only the shrink limit but still uses the > > same freeing mechanism it does not touch pages which are currently > > mapping grants." > > > > > > > > Once a memory pressure is detected, this commit keeps the squeezing > > > limit for a user-specified time duration. The duration should be > > > neither too long nor too short. If it is too long, the squeezing > > > incurring overhead can reduce the I/O performance. If it is too short, > > > `blkback` will not free enough pages to reduce the memory pressure. > > > This commit sets the value as `10 milliseconds` by default because it is > > > a short time in terms of I/O while it is a long time in terms of memory > > > operations. Also, as the original shrinking mechanism works for at > > > least every 100 milliseconds, this could be a somewhat reasonable > > > choice. I also tested other durations (refer to the below section for > > > more details) and confirmed that 10 milliseconds is the one that works > > > best with the test. That said, the proper duration depends on actual > > > configurations and workloads. That's why this commit is allowing users > > ^ allows > > > to set it as their optimal value via the module parameter. > > > > ... to set the duration as a module parameter. > > Thank you for great suggestions, I will apply those. > > > > > > > > > Memory Pressure Test > > > > > > > > > To show how this commit fixes the memory pressure situation well, I > > > configured a test environment on a xen-running virtualization system. > > > On the `blkfront` running guest instances, I attach a large number of > > > network-backed volume devices and induce I/O to those. Meanwhile, I > > > measure the number of pages that swapped in and out on the `blkback` > > > running guest. The test ran twice, once for the `blkback` before this > > > commit and once for that after this commit. As shown below, this commit > > > has dramatically reduced the memory pressure: > > > > > > pswpin
Re: [Xen-devel] [PATCH for-next 1/7] x86: import hyperv-tlfs.h from Linux
On Tue, Dec 10, 2019 at 04:43:30PM +0100, Jan Beulich wrote: > On 10.12.2019 16:37, Durrant, Paul wrote: > >> -Original Message- > >> From: Xen-devel On Behalf Of Jan > >> Beulich > >> Sent: 10 December 2019 15:34 > >> To: Wei Liu > >> Cc: Wei Liu ; Paul Durrant ; Andrew > >> Cooper ; Michael Kelley > >> ; Xen Development List >> de...@lists.xenproject.org>; Roger Pau Monné > >> Subject: Re: [Xen-devel] [PATCH for-next 1/7] x86: import hyperv-tlfs.h > >> from Linux > >> > >> On 25.10.2019 11:16, Wei Liu wrote: > >>> Taken from Linux commit b2d8b167e15bb5ec2691d1119c025630a247f649. > >>> > >>> This is a pristine copy from Linux. It is not used yet and probably > >>> doesn't compile. Changes to make it work will come later. > >>> > >>> Signed-off-by: Wei Liu > >> > >> This coming from Linux and assuming at least a fair part of it is > >> going to be used, in principle > >> Acked-by: Jan Beulich > >> > >> However, there are many seemingly unnecessary uses of __packed > >> here, which I'd rather not see go in at all (i.e. not be dropped > >> later on, and then potentially missing some). I find ... > >> > >>> +typedef struct _HV_REFERENCE_TSC_PAGE { > >>> + __u32 tsc_sequence; > >>> + __u32 res1; > >>> + __u64 tsc_scale; > >>> + __s64 tsc_offset; > >>> +} __packed HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE; > >> > > > > You realise there's a definition of this in the viridian code already, > > right? > > It looked familiar, but it didn't occur to me to point this out. > Yes, there looks to be room for deduplication... > I had a plan to make viridian code use this copy directly. > Actually, Wei, one more thing I was curious about - what is "tlfs" > an acronym of? It means "Top-Level Function Specification". (I wish Xen had something similar) Wei. > > Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/mm: factor out the code for shattering an l3 PTE
Hi Hongyan, On 11/12/2019 10:28, Xia, Hongyan wrote: On Tue, 2019-12-10 at 16:20 +0100, Jan Beulich wrote: ol2e = l2e_from_intpte( l2e_get_intpte(ol2e) + (PAGE_SIZE << PAGETABLE_ORDER)); Of course, as mentioned before, I'm not overly happy to see type safety lost in case like this one, where it's not needed like e.g. further up to convert from L3 to L2 entry. Okay, so I did a comparison between the efficiency of the assembly under a release build. The old "type-safe" way requires 16 instructions to prepare the first l2e, and each iteration of the inner loop of populating l2t requires 7 instructions. The new type-unsafe way requires 6 to prepare the first l2e, and each iteration of populating l2t takes 5 instructions. So the difference of populating l2t is 3600 vs. 2566 instructions, which is not very small. While this involves more instructions, how often do we expect the code to be called? Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/mm: factor out the code for shattering an l3 PTE
On Tue, 2019-12-10 at 16:20 +0100, Jan Beulich wrote: > On 09.12.2019 12:48, Hongyan Xia wrote: > > --- a/xen/arch/x86/mm.c > > +++ b/xen/arch/x86/mm.c > > @@ -5151,6 +5151,51 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long > > v) > > flush_area_local((const void *)v, f) : \ > > flush_area_all((const void *)v, f)) > > > > +/* Shatter an l3 entry and populate l2. If virt is passed in, also > > do flush. */ > > +static int shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, > > bool locking) > > +{ > > +unsigned int i; > > +l3_pgentry_t ol3e; > > +l2_pgentry_t ol2e, *l2t = alloc_xen_pagetable(); > > + > > +if ( l2t == NULL ) > > Nowadays we seem to prefer !l2t in cases like this one. > > > +return -1; > > -ENOMEM please (and then handed on by the caller). > > > +ol3e = *pl3e; > > This could be the variable's initializer. > > > +ol2e = l2e_from_intpte(l3e_get_intpte(ol3e)); > > There's nothing "old" about this L2 entry, so its name would better > be just "l2e" I think. > > > +for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ ) > > +{ > > +l2e_write(l2t + i, ol2e); > > +ol2e = l2e_from_intpte( > > +l2e_get_intpte(ol2e) + (1 << (PAGETABLE_ORDER + > > PAGE_SHIFT))); > > Indentation looks odd here (also further down). If the first argument > of a function call doesn't fit on the line and would also be ugly to > split across lines, what we do is indent it the usual 4 characters > from the function invocation, i.e. in this case > > ol2e = l2e_from_intpte( >l2e_get_intpte(ol2e) + (1 << (PAGETABLE_ORDER + > PAGE_SHIFT))); > > and then slightly shorter > > ol2e = l2e_from_intpte( >l2e_get_intpte(ol2e) + (PAGE_SIZE << > PAGETABLE_ORDER)); > > Of course, as mentioned before, I'm not overly happy to see type > safety lost in case like this one, where it's not needed like e.g. > further up to convert from L3 to L2 entry. > > > +} > > +if ( locking ) > > +spin_lock(_pgdir_lock); > > +if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) && > > + (l3e_get_flags(*pl3e) & _PAGE_PSE) ) > > +{ > > +l3e_write_atomic(pl3e, > > +l3e_from_paddr((paddr_t)virt_to_maddr(l2t), > > __PAGE_HYPERVISOR)); > > +l2t = NULL; > > +} > > +if ( locking ) > > +spin_unlock(_pgdir_lock); > > +if ( virt ) > > +{ > > +unsigned int flush_flags = > > +FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER); > > + > > +if ( (l3e_get_flags(ol3e) & _PAGE_GLOBAL) ) > > Unnecessary pair of parentheses (which also wasn't there in the > original code). > > > +flush_flags |= FLUSH_TLB_GLOBAL; > > Too deep indentation. > > > +flush_area(virt, flush_flags); > > +} > > +if ( l2t ) > > +free_xen_pagetable(l2t); > > + > > +return 0; > > +} > > Also please add blank lines between > - L2 population and lock acquire, > - lock release and TLB flush, > - TLB flush and free. > > Jan Issues fixed in v3. I have not touched the type safety part. If we think this is really important we can revert to what it was before, although from the quick study I did in my previous email, there is a performance difference. Hongyan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Setting up a monthly Xen Project dinner/pub-meeting
And this time with the correct list. Really despairing with Outlook which keeps adding random old contacts to my address book and puts them in front of frequently used entries (sigh) From: Lars Kurth Date: Wednesday, 11 December 2019 at 10:31 To: "xen-de...@lists.xensource.com" , "mirageos-de...@lists.xenproject.org" , "win-pv-de...@lists.xenproject.org" , "minios-de...@lists.xenproject.org" Subject: Setting up a monthly Xen Project dinner/pub-meeting Hi all, with quite a few people working on Xen Project across different companies and organisations these days, I was wondering whether we should set up a regular monthly get-together. I would like to get a sense as to 1. Who would be willing to turn up – need to get a sense of numbers, because we need to see whether it is necessary to book a table 2. What day would be best and what week of the month 3. Whether we would always choose the same venue – which I guess partly depends on the answer to a). If the core group attending is larger than 8 people, we probably need to book a table, which is easier if we choose the same venue If the answer to c) is NO, we should probably have a local coordinator and/or establish what venues we do like to go through well in advance Feel free to voice your opinion Regards Lars ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 0/2] Refactor super page shattering
map_pages_to_xen and modify_xen_mappings use almost exactly the same page shattering logic, and the code is mingled with other PTE manipulations so it is less obvious that the intention is page shattering. Factor out the functions to make them reusable and to make the intention more obvious. Of course, there is not much difference between the shattering logic of each level, so we could further turn the per-level functions into a single macro, although this is not that simple since we have per-level functions and macros all over the place and there are slight differences between levels. Keep it per-level for now. tree: https://xenbits.xen.org/git-http/people/hx242/xen.git shatter-refactor --- Changes in v3: - style and indentation fixes. Changes in v2: - rebase. - improve asm code. - avoid stale values when taking the lock. - move allocation of PTE tables inside the shatter function. Hongyan Xia (2): x86/mm: factor out the code for shattering an l3 PTE x86/mm: factor out the code for shattering an l2 PTE xen/arch/x86/mm.c | 194 +++--- 1 file changed, 98 insertions(+), 96 deletions(-) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 1/2] x86/mm: factor out the code for shattering an l3 PTE
map_pages_to_xen and modify_xen_mappings are performing almost exactly the same operations when shattering an l3 PTE, the only difference being whether we want to flush. Signed-off-by: Hongyan Xia --- Changes in v3: - style and indentation changes. - return -ENOMEM instead of -1. Changes in v2: - improve asm. - re-read pl3e from memory when taking the lock. - move the allocation of l2t inside the shatter function. --- xen/arch/x86/mm.c | 98 +++ 1 file changed, 49 insertions(+), 49 deletions(-) diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 7d4dd80a85..97f11b6016 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v) flush_area_local((const void *)v, f) : \ flush_area_all((const void *)v, f)) +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */ +static int shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking) +{ +unsigned int i; +l3_pgentry_t ol3e = *pl3e; +l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e)); +l2_pgentry_t *l2t = alloc_xen_pagetable(); + +if ( !l2t ) +return -ENOMEM; + +for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ ) +{ +l2e_write(l2t + i, l2e); +l2e = l2e_from_intpte( + l2e_get_intpte(l2e) + (PAGE_SIZE << PAGETABLE_ORDER)); +} + +if ( locking ) +spin_lock(_pgdir_lock); +if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) && + (l3e_get_flags(*pl3e) & _PAGE_PSE) ) +{ +l3e_write_atomic(pl3e, +l3e_from_paddr((paddr_t)virt_to_maddr(l2t), __PAGE_HYPERVISOR)); +l2t = NULL; +} +if ( locking ) +spin_unlock(_pgdir_lock); + +if ( virt ) +{ +unsigned int flush_flags = +FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER); + +if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL ) +flush_flags |= FLUSH_TLB_GLOBAL; +flush_area(virt, flush_flags); +} + +if ( l2t ) +free_xen_pagetable(l2t); + +return 0; +} + int map_pages_to_xen( unsigned long virt, mfn_t mfn, @@ -5244,9 +5290,6 @@ int map_pages_to_xen( if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) && (l3e_get_flags(ol3e) & _PAGE_PSE) ) { -unsigned int flush_flags = -FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER); - /* Skip this PTE if there is no change. */ if ( ((l3e_get_pfn(ol3e) & ~(L2_PAGETABLE_ENTRIES * L1_PAGETABLE_ENTRIES - 1)) + @@ -5267,33 +5310,9 @@ int map_pages_to_xen( continue; } -pl2e = alloc_xen_pagetable(); -if ( pl2e == NULL ) +/* Pass virt to indicate we need to flush. */ +if ( shatter_l3e(pl3e, virt, locking) ) return -ENOMEM; - -for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ ) -l2e_write(pl2e + i, - l2e_from_pfn(l3e_get_pfn(ol3e) + - (i << PAGETABLE_ORDER), - l3e_get_flags(ol3e))); - -if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL ) -flush_flags |= FLUSH_TLB_GLOBAL; - -if ( locking ) -spin_lock(_pgdir_lock); -if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) && - (l3e_get_flags(*pl3e) & _PAGE_PSE) ) -{ -l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e), -__PAGE_HYPERVISOR)); -pl2e = NULL; -} -if ( locking ) -spin_unlock(_pgdir_lock); -flush_area(virt, flush_flags); -if ( pl2e ) -free_xen_pagetable(pl2e); } pl2e = virt_to_xen_l2e(virt); @@ -5578,27 +5597,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf) } /* PAGE1GB: shatter the superpage and fall through. */ -pl2e = alloc_xen_pagetable(); -if ( !pl2e ) +if ( shatter_l3e(pl3e, 0, locking) ) return -ENOMEM; -for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ ) -l2e_write(pl2e + i, - l2e_from_pfn(l3e_get_pfn(*pl3e) + - (i << PAGETABLE_ORDER), - l3e_get_flags(*pl3e))); -if ( locking ) -spin_lock(_pgdir_lock); -if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) && - (l3e_get_flags(*pl3e) & _PAGE_PSE) ) -{ -l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e), -__PAGE_HYPERVISOR)); -pl2e =
[Xen-devel] [PATCH v3 2/2] x86/mm: factor out the code for shattering an l2 PTE
map_pages_to_xen and modify_xen_mappings are performing almost exactly the same operations when shattering an l2 PTE, the only difference being whether we want to flush. Signed-off-by: Hongyan Xia --- Changes in v3: - style and indentation changes. - return -ENOMEM instead of -1. Changes in v2: - improve asm. - re-read pl2e from memory when taking the lock. - move the allocation of l1t inside the shatter function. --- xen/arch/x86/mm.c | 96 --- 1 file changed, 49 insertions(+), 47 deletions(-) diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 97f11b6016..e5ba6b52fb 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v) flush_area_local((const void *)v, f) : \ flush_area_all((const void *)v, f)) +/* Shatter an l2 entry and populate l1. If virt is passed in, also do flush. */ +static int shatter_l2e(l2_pgentry_t *pl2e, unsigned long virt, bool locking) +{ +unsigned int i; +l2_pgentry_t ol2e = *pl2e; +l1_pgentry_t l1e = l1e_from_paddr(l2e_get_paddr(ol2e), + lNf_to_l1f(l2e_get_flags(ol2e))); +l1_pgentry_t *l1t = alloc_xen_pagetable(); + +if ( !l1t ) +return -ENOMEM; + +for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ ) +{ +l1e_write(l1t + i, l1e); +l1e = l1e_from_intpte(l1e_get_intpte(l1e) + PAGE_SIZE); +} + +if ( locking ) +spin_lock(_pgdir_lock); +if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) && + (l2e_get_flags(*pl2e) & _PAGE_PSE) ) +{ +l2e_write_atomic(pl2e, +l2e_from_paddr((paddr_t)virt_to_maddr(l1t), __PAGE_HYPERVISOR)); +l1t = NULL; +} +if ( locking ) +spin_unlock(_pgdir_lock); + +if ( virt ) +{ +unsigned int flush_flags = +FLUSH_TLB | FLUSH_ORDER(PAGETABLE_ORDER); + +if ( l2e_get_flags(ol2e) & _PAGE_GLOBAL ) +flush_flags |= FLUSH_TLB_GLOBAL; +flush_area(virt, flush_flags); +} + +if ( l1t ) +free_xen_pagetable(l1t); + +return 0; +} + /* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */ static int shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking) { @@ -5364,9 +5410,6 @@ int map_pages_to_xen( } else if ( l2e_get_flags(*pl2e) & _PAGE_PSE ) { -unsigned int flush_flags = -FLUSH_TLB | FLUSH_ORDER(PAGETABLE_ORDER); - /* Skip this PTE if there is no change. */ if ( (((l2e_get_pfn(*pl2e) & ~(L1_PAGETABLE_ENTRIES - 1)) + l1_table_offset(virt)) == mfn_x(mfn)) && @@ -5385,32 +5428,9 @@ int map_pages_to_xen( goto check_l3; } -pl1e = alloc_xen_pagetable(); -if ( pl1e == NULL ) +/* Pass virt to indicate we need to flush. */ +if ( shatter_l2e(pl2e, virt, locking) ) return -ENOMEM; - -for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ ) -l1e_write([i], - l1e_from_pfn(l2e_get_pfn(*pl2e) + i, - lNf_to_l1f(l2e_get_flags(*pl2e; - -if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL ) -flush_flags |= FLUSH_TLB_GLOBAL; - -if ( locking ) -spin_lock(_pgdir_lock); -if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) && - (l2e_get_flags(*pl2e) & _PAGE_PSE) ) -{ -l2e_write_atomic(pl2e, l2e_from_mfn(virt_to_mfn(pl1e), -__PAGE_HYPERVISOR)); -pl1e = NULL; -} -if ( locking ) -spin_unlock(_pgdir_lock); -flush_area(virt, flush_flags); -if ( pl1e ) -free_xen_pagetable(pl1e); } pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(virt); @@ -5633,26 +5653,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf) else { /* PSE: shatter the superpage and try again. */ -pl1e = alloc_xen_pagetable(); -if ( !pl1e ) +if ( shatter_l2e(pl2e, 0, locking) ) return -ENOMEM; -for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ ) -l1e_write([i], - l1e_from_pfn(l2e_get_pfn(*pl2e) + i, - l2e_get_flags(*pl2e) & ~_PAGE_PSE)); -if ( locking ) -spin_lock(_pgdir_lock); -if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) && -
[Xen-devel] [ovmf test] 144703: regressions - FAIL
flight 144703 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/144703/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64 6 xen-buildfail REGR. vs. 144637 build-i386-xsm6 xen-buildfail REGR. vs. 144637 build-amd64-xsm 6 xen-buildfail REGR. vs. 144637 build-i3866 xen-buildfail REGR. vs. 144637 Tests which did not succeed, but are not blocking: test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a build-amd64-libvirt 1 build-check(1) blocked n/a version targeted for testing: ovmf d3add11e87dace180387562d6f1951f2bffbd3d9 baseline version: ovmf 804666c86e7b6f04fe5c5cfdb13199c19e0e99b0 Last test of basis 144637 2019-12-09 09:09:49 Z2 days Failing since144646 2019-12-10 01:39:53 Z1 days7 attempts Testing same since 144703 2019-12-11 10:09:05 Z0 days1 attempts People who touched revisions under test: Antoine Coeur Bob Feng Jiewen Yao Michael Kubacki Philippe Mathieu-Daude Steven Shi jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. commit d3add11e87dace180387562d6f1951f2bffbd3d9 Author: Michael Kubacki Date: Wed Nov 20 17:31:24 2019 -0800 MdeModulePkg PeiCore: Improve comment semantics This patch clarifies wording in several PeiCore comments to improve reading comprehension. Cc: Dandan Bi Cc: Liming Gao Cc: Jian J Wang Cc: Hao A Wu Signed-off-by: Michael Kubacki Reviewed-by: Liming Gao Reviewed-by: Jian J Wang commit d39d1260c615b716675f67f5c4e1f4f52df01dad Author: Michael Kubacki Date: Wed Nov 20 17:10:48 2019 -0800 MdeModulePkg PeiCore: Fix typos Cc: Dandan Bi Cc: Liming Gao Cc: Jian J Wang Cc: Hao A Wu Signed-off-by: Michael Kubacki Reviewed-by: Liming Gao Reviewed-by: Philippe Mathieu-Daude Reviewed-by: Jian J Wang commit 97eedf5dfbaffde33210fd88066247cf0b7d3325 Author: Antoine Coeur Date: Wed Dec 4 12:14:53 2019 +0800 IntelFsp2WrapperPkg: Fix various typos Fix various typos in comments and documentation. Cc: Chasel Chiu Cc: Nate DeSimone Cc: Star Zeng Reviewed-by: Philippe Mathieu-Daude Signed-off-by: Philippe Mathieu-Daude Reviewed-by: Nate DeSimone Reviewed-by: Chasel Chiu Reviewed-by: Star Zeng commit 7e55cf6b48dcd43de46d008b2f12caaad2554503 Author: Jiewen Yao Date: Sat Dec 7 21:41:10 2019 +0800 SecurityPkg/Tcg2Smm: Measure the table before patch. REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1940 According to TCG PFP specification: the ACPI table must be measured prior to any modification, and the measurement must be same cross every boot cycle. There is a fix 3a63c17ebc853cbb27d190729d01e27f68e65b94 for the HID data. However that is not enough. The LAML/LASA and PCD configuration change may also cause similar problem. We need measure the table before any update. Cc: Jian J Wang Cc: Chao Zhang Signed-off-by: Jiewen Yao Reviewed-by: Chao Zhang commit a80032dc44a1071a34f4415a7c5cef5170ee6159 Author: Steven Shi Date: Tue Nov 19 16:22:08 2019 +0800 BaseTools: Remove
Re: [Xen-devel] [PATCH v2 4/4] xen-blkback: support dynamic unbind/bind
> -Original Message- > From: Roger Pau Monné > Sent: 11 December 2019 10:46 > To: Durrant, Paul > Cc: xen-devel@lists.xenproject.org; linux-ker...@vger.kernel.org; Konrad > Rzeszutek Wilk ; Jens Axboe ; > Boris Ostrovsky ; Juergen Gross > ; Stefano Stabellini > Subject: Re: [PATCH v2 4/4] xen-blkback: support dynamic unbind/bind > > On Tue, Dec 10, 2019 at 11:33:47AM +, Paul Durrant wrote: > > By simply re-attaching to shared rings during connect_ring() rather than > > assuming they are freshly allocated (i.e assuming the counters are zero) > > it is possible for vbd instances to be unbound and re-bound from and to > > (respectively) a running guest. > > > > This has been tested by running: > > > > while true; > > do fio --name=randwrite --ioengine=libaio --iodepth=16 \ > > --rw=randwrite --bs=4k --direct=1 --size=1G --verify=crc32; > > done > > > > in a PV guest whilst running: > > > > while true; > > do echo vbd-$DOMID-$VBD >unbind; > > echo unbound; > > sleep 5; > > Is there anyway to know when the unbind has finished? AFAICT > xen_blkif_disconnect will return EBUSY if there are in flight > requests, and the disconnect won't be completed until those requests > are finished. Yes, the device sysfs node will disappear when remove() completes. > > > echo vbd-$DOMID-$VBD >bind; > > echo bound; > > sleep 3; > > done > > > > in dom0 from /sys/bus/xen-backend/drivers/vbd to continuously unbind and > > re-bind its system disk image. > > > > This is a highly useful feature for a backend module as it allows it to > be > > unloaded and re-loaded (i.e. updated) without requiring domUs to be > halted. > > This was also tested by running: > > > > while true; > > do echo vbd-$DOMID-$VBD >unbind; > > echo unbound; > > sleep 5; > > rmmod xen-blkback; > > echo unloaded; > > sleep 1; > > modprobe xen-blkback; > > echo bound; > > cd $(pwd); > > sleep 3; > > done > > > > in dom0 whilst running the same loop as above in the (single) PV guest. > > > > Some (less stressful) testing has also been done using a Windows HVM > guest > > with the latest 9.0 PV drivers installed. > > > > Signed-off-by: Paul Durrant > > --- > > Cc: Konrad Rzeszutek Wilk > > Cc: "Roger Pau Monné" > > Cc: Jens Axboe > > Cc: Boris Ostrovsky > > Cc: Juergen Gross > > Cc: Stefano Stabellini > > > > v2: > > - Apply a sanity check to the value of rsp_prod and fail the re-attach > >if it is implausible > > - Set allow_rebind to prevent ring from being closed on unbind > > - Update test workload from dd to fio (with verification) > > --- > > drivers/block/xen-blkback/xenbus.c | 59 +- > > 1 file changed, 41 insertions(+), 18 deletions(-) > > > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen- > blkback/xenbus.c > > index e8c5c54e1d26..13d09630b237 100644 > > --- a/drivers/block/xen-blkback/xenbus.c > > +++ b/drivers/block/xen-blkback/xenbus.c > > @@ -181,6 +181,8 @@ static int xen_blkif_map(struct xen_blkif_ring > *ring, grant_ref_t *gref, > > { > > int err; > > struct xen_blkif *blkif = ring->blkif; > > + struct blkif_common_sring *sring_common; > > + RING_IDX rsp_prod, req_prod; > > > > /* Already connected through? */ > > if (ring->irq) > > @@ -191,46 +193,66 @@ static int xen_blkif_map(struct xen_blkif_ring > *ring, grant_ref_t *gref, > > if (err < 0) > > return err; > > > > + sring_common = (struct blkif_common_sring *)ring->blk_ring; > > + rsp_prod = READ_ONCE(sring_common->rsp_prod); > > + req_prod = READ_ONCE(sring_common->req_prod); > > + > > switch (blkif->blk_protocol) { > > case BLKIF_PROTOCOL_NATIVE: > > { > > - struct blkif_sring *sring; > > - sring = (struct blkif_sring *)ring->blk_ring; > > - BACK_RING_INIT(>blk_rings.native, sring, > > - XEN_PAGE_SIZE * nr_grefs); > > + struct blkif_sring *sring_native = > > + (struct blkif_sring *)ring->blk_ring; > > I think you can constify both sring_native and sring_common (and the > other instances below). Yes, I can do that. I don't think the macros would mind. > > > + unsigned int size = __RING_SIZE(sring_native, > > + XEN_PAGE_SIZE * nr_grefs); > > + > > + BACK_RING_ATTACH(>blk_rings.native, sring_native, > > +rsp_prod, XEN_PAGE_SIZE * nr_grefs); > > + err = (req_prod - rsp_prod > size) ? -EIO : 0; > > break; > > } > > case BLKIF_PROTOCOL_X86_32: > > { > > - struct blkif_x86_32_sring *sring_x86_32; > > - sring_x86_32 = (struct blkif_x86_32_sring *)ring->blk_ring; > > - BACK_RING_INIT(>blk_rings.x86_32, sring_x86_32, > > - XEN_PAGE_SIZE * nr_grefs); > > + struct blkif_x86_32_sring *sring_x86_32 = > > + (struct blkif_x86_32_sring
Re: [Xen-devel] [PATCH v5 1/2] xenbus/backend: Add memory pressure handler callback
On Wed, Dec 11, 2019 at 04:50:58AM +0100, SeongJae Park wrote: > On Tue, 10 Dec 2019 11:16:35 +0100 "Roger Pau Monné" > wrote: > > > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h > > > index 869c816d5f8c..cdb075e4182f 100644 > > > --- a/include/xen/xenbus.h > > > +++ b/include/xen/xenbus.h > > > @@ -104,6 +104,7 @@ struct xenbus_driver { > > > struct device_driver driver; > > > int (*read_otherend_details)(struct xenbus_device *dev); > > > int (*is_ready)(struct xenbus_device *dev); > > > + unsigned (*reclaim)(struct xenbus_device *dev); > > > > ... hence I wonder why it's returning an unsigned when it's just > > ignored. > > > > IMO it should return an int to signal errors, and the return should be > > ignored. > > I first thought similarly and set the callback to return something. However, > as this callback is called to simply notify the memory pressure and ask the > driver to free its memory as many as possible, I couldn't easily imagine what > kind of errors that need to be handled by its caller can occur in the > callback, > especially because current blkback's callback implementation has no such > error. > So, if you and others agree, I would like to simply set the return type to > 'void' for now and defer the error handling to a future change. Yes, I also wondered the same, but seeing you returned an integer I assumed there was interest in returning some kind of value. If there's nothing to return let's just make it void. > > > > Also, I think it would preferable for this function to take an extra > > parameter to describe the resource the driver should attempt to free > > (ie: memory or interrupts for example). I'm however not able to find > > any existing Linux type to describe such resources. > > Yes, such extention would be the right direction. However, because there is > no > existing Linux type to describe the type of resources to reclaim as you also > mentioned, there could be many different opinions about its implementation > detail. In my opinion, it could be also possible to simply add another > callback for another resource type. That said, because currently we have an > use case and an implementation for the memory pressure only, I would like to > let it as is for now and defer the extension as a future work, if you and > others have no objection. Ack, can I please ask the callback to be named reclaim_memory or some such then? Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 4/4] xen-blkback: support dynamic unbind/bind
On Tue, Dec 10, 2019 at 11:33:47AM +, Paul Durrant wrote: > By simply re-attaching to shared rings during connect_ring() rather than > assuming they are freshly allocated (i.e assuming the counters are zero) > it is possible for vbd instances to be unbound and re-bound from and to > (respectively) a running guest. > > This has been tested by running: > > while true; > do fio --name=randwrite --ioengine=libaio --iodepth=16 \ > --rw=randwrite --bs=4k --direct=1 --size=1G --verify=crc32; > done > > in a PV guest whilst running: > > while true; > do echo vbd-$DOMID-$VBD >unbind; > echo unbound; > sleep 5; Is there anyway to know when the unbind has finished? AFAICT xen_blkif_disconnect will return EBUSY if there are in flight requests, and the disconnect won't be completed until those requests are finished. > echo vbd-$DOMID-$VBD >bind; > echo bound; > sleep 3; > done > > in dom0 from /sys/bus/xen-backend/drivers/vbd to continuously unbind and > re-bind its system disk image. > > This is a highly useful feature for a backend module as it allows it to be > unloaded and re-loaded (i.e. updated) without requiring domUs to be halted. > This was also tested by running: > > while true; > do echo vbd-$DOMID-$VBD >unbind; > echo unbound; > sleep 5; > rmmod xen-blkback; > echo unloaded; > sleep 1; > modprobe xen-blkback; > echo bound; > cd $(pwd); > sleep 3; > done > > in dom0 whilst running the same loop as above in the (single) PV guest. > > Some (less stressful) testing has also been done using a Windows HVM guest > with the latest 9.0 PV drivers installed. > > Signed-off-by: Paul Durrant > --- > Cc: Konrad Rzeszutek Wilk > Cc: "Roger Pau Monné" > Cc: Jens Axboe > Cc: Boris Ostrovsky > Cc: Juergen Gross > Cc: Stefano Stabellini > > v2: > - Apply a sanity check to the value of rsp_prod and fail the re-attach >if it is implausible > - Set allow_rebind to prevent ring from being closed on unbind > - Update test workload from dd to fio (with verification) > --- > drivers/block/xen-blkback/xenbus.c | 59 +- > 1 file changed, 41 insertions(+), 18 deletions(-) > > diff --git a/drivers/block/xen-blkback/xenbus.c > b/drivers/block/xen-blkback/xenbus.c > index e8c5c54e1d26..13d09630b237 100644 > --- a/drivers/block/xen-blkback/xenbus.c > +++ b/drivers/block/xen-blkback/xenbus.c > @@ -181,6 +181,8 @@ static int xen_blkif_map(struct xen_blkif_ring *ring, > grant_ref_t *gref, > { > int err; > struct xen_blkif *blkif = ring->blkif; > + struct blkif_common_sring *sring_common; > + RING_IDX rsp_prod, req_prod; > > /* Already connected through? */ > if (ring->irq) > @@ -191,46 +193,66 @@ static int xen_blkif_map(struct xen_blkif_ring *ring, > grant_ref_t *gref, > if (err < 0) > return err; > > + sring_common = (struct blkif_common_sring *)ring->blk_ring; > + rsp_prod = READ_ONCE(sring_common->rsp_prod); > + req_prod = READ_ONCE(sring_common->req_prod); > + > switch (blkif->blk_protocol) { > case BLKIF_PROTOCOL_NATIVE: > { > - struct blkif_sring *sring; > - sring = (struct blkif_sring *)ring->blk_ring; > - BACK_RING_INIT(>blk_rings.native, sring, > -XEN_PAGE_SIZE * nr_grefs); > + struct blkif_sring *sring_native = > + (struct blkif_sring *)ring->blk_ring; I think you can constify both sring_native and sring_common (and the other instances below). > + unsigned int size = __RING_SIZE(sring_native, > + XEN_PAGE_SIZE * nr_grefs); > + > + BACK_RING_ATTACH(>blk_rings.native, sring_native, > + rsp_prod, XEN_PAGE_SIZE * nr_grefs); > + err = (req_prod - rsp_prod > size) ? -EIO : 0; > break; > } > case BLKIF_PROTOCOL_X86_32: > { > - struct blkif_x86_32_sring *sring_x86_32; > - sring_x86_32 = (struct blkif_x86_32_sring *)ring->blk_ring; > - BACK_RING_INIT(>blk_rings.x86_32, sring_x86_32, > -XEN_PAGE_SIZE * nr_grefs); > + struct blkif_x86_32_sring *sring_x86_32 = > + (struct blkif_x86_32_sring *)ring->blk_ring; > + unsigned int size = __RING_SIZE(sring_x86_32, > + XEN_PAGE_SIZE * nr_grefs); > + > + BACK_RING_ATTACH(>blk_rings.x86_32, sring_x86_32, > + rsp_prod, XEN_PAGE_SIZE * nr_grefs); > + err = (req_prod - rsp_prod > size) ? -EIO : 0; > break; > } > case BLKIF_PROTOCOL_X86_64: > { > - struct blkif_x86_64_sring *sring_x86_64; > - sring_x86_64 = (struct blkif_x86_64_sring *)ring->blk_ring; > - BACK_RING_INIT(>blk_rings.x86_64, sring_x86_64, > -
Re: [Xen-devel] [PATCH v2 1/2] x86/mm: factor out the code for shattering an l3 PTE
On Tue, 2019-12-10 at 16:20 +0100, Jan Beulich wrote: > > ol2e = l2e_from_intpte( >l2e_get_intpte(ol2e) + (PAGE_SIZE << > PAGETABLE_ORDER)); > > Of course, as mentioned before, I'm not overly happy to see type > safety lost in case like this one, where it's not needed like e.g. > further up to convert from L3 to L2 entry. > Okay, so I did a comparison between the efficiency of the assembly under a release build. The old "type-safe" way requires 16 instructions to prepare the first l2e, and each iteration of the inner loop of populating l2t requires 7 instructions. The new type-unsafe way requires 6 to prepare the first l2e, and each iteration of populating l2t takes 5 instructions. So the difference of populating l2t is 3600 vs. 2566 instructions, which is not very small. I have not tested the packed bit field way you suggested, but I think it could even be higher than 3600 due to masking, shifting and also overflow handling. Hongyan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is forced to closed
On 11.12.19 11:14, Durrant, Paul wrote: -Original Message- From: Roger Pau Monné Sent: 11 December 2019 10:06 To: Durrant, Paul Cc: xen-devel@lists.xenproject.org; linux-ker...@vger.kernel.org; Juergen Gross ; Stefano Stabellini ; Boris Ostrovsky Subject: Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is forced to closed On Tue, Dec 10, 2019 at 11:33:45AM +, Paul Durrant wrote: If a driver probe() fails then leave the xenstore state alone. There is no reason to modify it as the failure may be due to transient resource allocation issues and hence a subsequent probe() may succeed. If the driver supports re-binding then only force state to closed during remove() only in the case when the toolstack may need to clean up. This can be detected by checking whether the state in xenstore has been set to closing prior to device removal. NOTE: Re-bind support is indicated by new boolean in struct xenbus_driver, which defaults to false. Subsequent patches will add support to some backend drivers. My intention was to specify whether you want to close the backends on unbind in sysfs, so that an user can decide at runtime, rather than having a hardcoded value in the driver. Anyway, I'm less sure whether such runtime tunable is useful at all, so let's leave it out and can always be added afterwards. At the end of day a user wrongly doing a rmmod blkback can always recover gracefully by loading blkback again with your proposed approach to leave connections open on module removal. Sorry for the extra work. Does this mean you don't think the extra driver flag is necessary any more? NB: now that xenbus actually takes module references you can't accidentally rmmod any more :-) I'd like it to be kept, please. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable-coverity test] 144699: all pass - PUSHED
flight 144699 xen-unstable-coverity real [real] http://logs.test-lab.xenproject.org/osstest/logs/144699/ Perfect :-) All tests in this flight passed as required version targeted for testing: xen 272c18435e93cbf749c096a9552ab5ef0d79a4ca baseline version: xen ae25407faaaddf4abe44137ebf0e177a8c8f9858 Last test of basis 144634 2019-12-08 09:18:58 Z3 days Testing same since 144699 2019-12-11 09:18:24 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Daniel De Graaf George Dunlap Igor Druzhinin Jan Beulich Jeremi Piotrowski Julien Grall Krzysztof Kolasa Mark Pryor Rasmus Villemoes Razvan Cojocaru Roger Pau Monné jobs: coverity-amd64 pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git ae25407faa..272c18435e 272c18435e93cbf749c096a9552ab5ef0d79a4ca -> coverity-tested/smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is forced to closed
> -Original Message- > From: Roger Pau Monné > Sent: 11 December 2019 10:06 > To: Durrant, Paul > Cc: xen-devel@lists.xenproject.org; linux-ker...@vger.kernel.org; Juergen > Gross ; Stefano Stabellini ; > Boris Ostrovsky > Subject: Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is forced > to closed > > On Tue, Dec 10, 2019 at 11:33:45AM +, Paul Durrant wrote: > > If a driver probe() fails then leave the xenstore state alone. There is > no > > reason to modify it as the failure may be due to transient resource > > allocation issues and hence a subsequent probe() may succeed. > > > > If the driver supports re-binding then only force state to closed during > > remove() only in the case when the toolstack may need to clean up. This > can > > be detected by checking whether the state in xenstore has been set to > > closing prior to device removal. > > > > NOTE: Re-bind support is indicated by new boolean in struct > xenbus_driver, > > which defaults to false. Subsequent patches will add support to > > some backend drivers. > > My intention was to specify whether you want to close the > backends on unbind in sysfs, so that an user can decide at runtime, > rather than having a hardcoded value in the driver. > > Anyway, I'm less sure whether such runtime tunable is useful at all, > so let's leave it out and can always be added afterwards. At the end > of day a user wrongly doing a rmmod blkback can always recover > gracefully by loading blkback again with your proposed approach to > leave connections open on module removal. > > Sorry for the extra work. > Does this mean you don't think the extra driver flag is necessary any more? NB: now that xenbus actually takes module references you can't accidentally rmmod any more :-) Paul ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is forced to closed
On Tue, Dec 10, 2019 at 11:33:45AM +, Paul Durrant wrote: > If a driver probe() fails then leave the xenstore state alone. There is no > reason to modify it as the failure may be due to transient resource > allocation issues and hence a subsequent probe() may succeed. > > If the driver supports re-binding then only force state to closed during > remove() only in the case when the toolstack may need to clean up. This can > be detected by checking whether the state in xenstore has been set to > closing prior to device removal. > > NOTE: Re-bind support is indicated by new boolean in struct xenbus_driver, > which defaults to false. Subsequent patches will add support to > some backend drivers. My intention was to specify whether you want to close the backends on unbind in sysfs, so that an user can decide at runtime, rather than having a hardcoded value in the driver. Anyway, I'm less sure whether such runtime tunable is useful at all, so let's leave it out and can always be added afterwards. At the end of day a user wrongly doing a rmmod blkback can always recover gracefully by loading blkback again with your proposed approach to leave connections open on module removal. Sorry for the extra work. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/microcode: Support builtin CPU microcode
On 11.12.2019 00:18, Eslam Elnikety wrote: > On 10.12.19 10:37, Jan Beulich wrote: >> On 09.12.2019 09:41, Eslam Elnikety wrote: >>> --- a/docs/misc/xen-command-line.pandoc >>> +++ b/docs/misc/xen-command-line.pandoc >>> @@ -2113,7 +2113,7 @@ logic applies: >>> active by default. >>> >>> ### ucode (x86) >>> -> `= List of [ | scan=, nmi= ]` >>> +> `= List of [ | scan= | builtin=, nmi= ]` >> >> Despite my other question regarding the usefulness of this as a >> whole a few comments. >> >> Do "scan" and "builtin" really need to exclude each other? I.e. >> don't you mean , instead of | ? > The useful case here would be specifying ucode=scan,builtin which would > translate to fallback onto the builtin microcode if a scan fails. In > fact, this is already the case given the implementation in v1. It will > be better to clarify this semantic by allowing scan,builtin. > > On that note, I really see the "" and "scan=" to be > equal. Following the logic earlier we should probably also allow > ucode=,builtin. This translates to fallback to builtin if there > are no modules at index . Almost - if the builtin one is newer than the separate blob, then either of the cmdline options you name should still cause the builtin one to be loaded. IOW you want to honor both options, not prefer the earlier over a later one. >>> + ---help--- >>> + Include the CPU microcode update in the Xen image itself. With this >>> + support, Xen can update the CPU microcode upon boot using the builtin >>> + microcode, with no need for an additional microcode boot modules. >>> + >>> + If unsure, say N. >>> + >>> +config BUILTIN_UCODE_DIR >>> + string >>> + default "/lib/firmware" >>> + depends on BUILTIN_UCODE >> >> ... are two separate options needed at all? Can't this latter one >> being the empty string just imply the feature to be disabled? > > I can go either way here. To me, two options is clearer. It's the other way around here, but I'd accept being outvoted. >>> +ucode_blob.data = xmalloc_bytes(ucode_blob.size); >>> +if ( !ucode_blob.data ) >>> +return; >>> + >>> +if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD ) >>> +memcpy(ucode_blob.data, __builtin_amd_ucode_start, >>> ucode_blob.size); >>> +else >>> +memcpy(ucode_blob.data, __builtin_intel_ucode_start, >>> ucode_blob.size); >> >> ... why the copying? Can't you simply point ucode_blob.data at >> __builtin_{amd,intel}_ucode_start? > > I am all onboard. See my earlier response to Andrew. I used the same > logic that already exists for scan (which assumes that ucode_blob.data > is allocated and should be freed when all CPUs are updated). The scan case may be different in that it may not lend itself to re-using the blob in its original location. If that's not the reason for the present behavior, then I think we would want to do away with the unnecessary copying there as well. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [ovmf test] 144693: regressions - FAIL
flight 144693 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/144693/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64 6 xen-buildfail REGR. vs. 144637 build-i386-xsm6 xen-buildfail REGR. vs. 144637 build-amd64-xsm 6 xen-buildfail REGR. vs. 144637 build-i3866 xen-buildfail REGR. vs. 144637 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a build-amd64-libvirt 1 build-check(1) blocked n/a version targeted for testing: ovmf 97eedf5dfbaffde33210fd88066247cf0b7d3325 baseline version: ovmf 804666c86e7b6f04fe5c5cfdb13199c19e0e99b0 Last test of basis 144637 2019-12-09 09:09:49 Z2 days Failing since144646 2019-12-10 01:39:53 Z1 days6 attempts Testing same since 144693 2019-12-11 06:15:38 Z0 days1 attempts People who touched revisions under test: Antoine Coeur Bob Feng Jiewen Yao Philippe Mathieu-Daude Steven Shi jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. commit 97eedf5dfbaffde33210fd88066247cf0b7d3325 Author: Antoine Coeur Date: Wed Dec 4 12:14:53 2019 +0800 IntelFsp2WrapperPkg: Fix various typos Fix various typos in comments and documentation. Cc: Chasel Chiu Cc: Nate DeSimone Cc: Star Zeng Reviewed-by: Philippe Mathieu-Daude Signed-off-by: Philippe Mathieu-Daude Reviewed-by: Nate DeSimone Reviewed-by: Chasel Chiu Reviewed-by: Star Zeng commit 7e55cf6b48dcd43de46d008b2f12caaad2554503 Author: Jiewen Yao Date: Sat Dec 7 21:41:10 2019 +0800 SecurityPkg/Tcg2Smm: Measure the table before patch. REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1940 According to TCG PFP specification: the ACPI table must be measured prior to any modification, and the measurement must be same cross every boot cycle. There is a fix 3a63c17ebc853cbb27d190729d01e27f68e65b94 for the HID data. However that is not enough. The LAML/LASA and PCD configuration change may also cause similar problem. We need measure the table before any update. Cc: Jian J Wang Cc: Chao Zhang Signed-off-by: Jiewen Yao Reviewed-by: Chao Zhang commit a80032dc44a1071a34f4415a7c5cef5170ee6159 Author: Steven Shi Date: Tue Nov 19 16:22:08 2019 +0800 BaseTools: Remove redundant binary cache file Redesign the binary cache and not need to save the cache intermediate result and state in memory as a ModuleBuildCacheIR class instance. So remove the CacheIR.py which define the ModuleBuildCacheIR class. Signed-off-by: Steven Shi Cc: Liming Gao Cc: Bob Feng Reviewed-by: Bob Feng commit fc8b8deac2d77524ff8cfe44acf95b5e1f59804e Author: Steven Shi Date: Tue Nov 19 16:17:00 2019 +0800 BaseTools: Leverage compiler output to optimize binary cache Redesign the binary cache and bases on the compiler to output the dependency header files info for every module. The binary cache will directly consume the dependency header files info and doesn't parse the C source code by iteself. Also redesign the dependency files list format for
Re: [Xen-devel] [PATCH] x86/microcode: Support builtin CPU microcode
On 10.12.2019 23:40, Eslam Elnikety wrote: > On 10.12.19 10:21, Jan Beulich wrote: >> On 09.12.2019 22:49, Eslam Elnikety wrote: >>> On 09.12.19 16:19, Andrew Cooper wrote: On 09/12/2019 08:41, Eslam Elnikety wrote: > --- /dev/null > +++ b/xen/arch/x86/microcode/Makefile > @@ -0,0 +1,40 @@ > +# Copyright (C) 2019 Amazon.com, Inc. or its affiliates. > +# Author: Eslam Elnikety > +# > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 2 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > + > +obj-y += builtin_ucode.o > + > +# Directory holding the microcode updates. > +UCODE_DIR=$(patsubst "%",%,$(CONFIG_BUILTIN_UCODE_DIR)) > +amd-blobs := $(wildcard $(UCODE_DIR)/amd-ucode/*) > +intel-blobs := $(wildcard $(UCODE_DIR)/intel-ucode/*) This is a little dangerous. I can see why you want to do it like this, and I can't provide any obvious suggestions, but if this glob picks up anything which isn't a microcode file, it will break the logic to search for the right blob. >>> >>> We can limit the amd-blobs and intel-blob to binaries following the >>> naming convention AuthenticAMD.bin and GenuineIntel.bin for amd and >>> intel, respectively. Yet, this would be imposing an unnecessary >>> restriction on administrators who may want to be innovative with naming >>> (or want to use the naming microcode_amd_*.bin or FF-MM-SS instead). >>> >>> Alternatively, we can introduce CONFIG_BUILTIN_UCODE_INTEL and >>> CONFIG_BUILTIN_UCODE_AMD. Both default to empty strings. Then, an >>> administrator can specify exactly the microcodes to include relative to >>> the CONFIG_BUILTIN_UCODE_DIR. For example: >>> CONFIG_BUILTIN_UCODE_INTEL="intel-ucode/06-3a-09" >>> CONFIG_BUILTIN_UCODE_AMD="amd-ucode/microcode_amd_fam15h.bin" >> >> This would make the feature even less generic - I already meant to > > I do not follow the point about being less generic. (I hope my example > did not give the false impression that CONFIG_BUILTIN_UCODE_{AMD,INTEL} > allow for only a single microcode blob for a single signature). Well, the example indeed has given this impression to me. I'm having a hard time seeing how, beyond very narrow special cases, either of the examples could be useful to anyone. Yet I think examples should be generally useful. >> ask whether building ucode into binaries is really a useful thing >> when we already have more flexible ways. I could see this being >> useful if there was no other way to make ucode available at boot >> time. > > It is useful in addition to the existing ways to do early microcode > updates. First, when operating many hosts, using boot modules (either a > distinct microcode module or within an initrd) becomes involved. For > instance, tools to update boot entries (e.g., > https://linux.die.net/man/8/grubby) do not support adding arbitrary > (microcode) modules. I.e. you suggest to work around tools shortcomings by extending Xen? Wouldn't the more appropriate way to deal with this be to make the tools more capable? > Second, there is often need to couple a Xen build with a minimum > microcode patch level. Having the microcode built within the Xen image > itself is a streamlined, natural way of achieving that. Okay, I can accept this as a reason, to some degree at least. Yet as said elsewhere, I don't think you want then to override a possible "external" ucode module with the builtin blobs. Instead the newest of everything that's available should then be loaded. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH-for-4.13] SUPPORT.md: add core scheduling
On 11.12.2019 09:45, Juergen Gross wrote: > Add core scheduling feature to SUPPORT.md. > > Signed-off-by: Juergen Gross Acked-by: Jan Beulich ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] x86emul: correct LFS et al handling for 64-bit mode
AMD and friends explicitly specify that 64-bit operands aren't possible for these insns. Nevertheless REX.W isn't fully ignored: It still cancels a possible operand size override (0x66). Intel otoh explicitly provides for 64-bit operands on the respective insn page of the SDM. Signed-off-by: Jan Beulich --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -2640,6 +2640,15 @@ x86_decode_twobyte( } break; +case 0xb2: /* lss */ +case 0xb4: /* lfs */ +case 0xb5: /* lgs */ +/* REX.W ignored on a vendor-dependent basis. */ +if ( op_bytes == 8 && + (ctxt->cpuid->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) +op_bytes = 4; +break; + case 0xb8: /* jmpe / popcnt */ if ( rep_prefix() ) ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] x86emul: correct segment override decode for 64-bit mode
The legacy / compatibility mode ES, CS, SS, and DS overrides are null prefixes in 64-bit mode, i.e. they in particular don't cancel an earlier FS or GS one. Signed-off-by: Jan Beulich --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -2830,14 +2830,17 @@ x86_decode( case 0x67: /* address-size override */ ad_bytes = def_ad_bytes ^ (mode_64bit() ? 12 : 6); break; -case 0x2e: /* CS override */ -override_seg = x86_seg_cs; +case 0x2e: /* CS override / null prefix in 64-bit mode */ +if ( !mode_64bit() ) +override_seg = x86_seg_cs; break; -case 0x3e: /* DS override */ -override_seg = x86_seg_ds; +case 0x3e: /* DS override / null prefix in 64-bit mode */ +if ( !mode_64bit() ) +override_seg = x86_seg_ds; break; -case 0x26: /* ES override */ -override_seg = x86_seg_es; +case 0x26: /* ES override / null prefix in 64-bit mode */ +if ( !mode_64bit() ) +override_seg = x86_seg_es; break; case 0x64: /* FS override */ override_seg = x86_seg_fs; @@ -2845,8 +2848,9 @@ x86_decode( case 0x65: /* GS override */ override_seg = x86_seg_gs; break; -case 0x36: /* SS override */ -override_seg = x86_seg_ss; +case 0x36: /* SS override / null prefix in 64-bit mode */ +if ( !mode_64bit() ) +override_seg = x86_seg_ss; break; case 0xf0: /* LOCK */ lock_prefix = 1; @@ -2871,10 +2875,6 @@ x86_decode( } done_prefixes: -/* %{e,c,s,d}s overrides are ignored in 64bit mode. */ -if ( mode_64bit() && override_seg < x86_seg_fs ) -override_seg = x86_seg_none; - if ( rex_prefix & REX_W ) op_bytes = 8; ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 5/6] xen/tasklet: Return -ERESTART from continue_hypercall_on_cpu()
On 11/12/2019 07:41, Jan Beulich wrote: > On 10.12.2019 18:55, Andrew Cooper wrote: >> On 10/12/2019 08:55, Jan Beulich wrote: >>> On 09.12.2019 18:49, Andrew Cooper wrote: On 09/12/2019 16:52, Jan Beulich wrote: > On 05.12.2019 23:30, Andrew Cooper wrote: >> Some hypercalls tasklets want to create a continuation, rather than fail >> the >> hypercall with a hard error. By the time the tasklet is executing, it >> is too >> late to create the continuation, and even continue_hypercall_on_cpu() >> doesn't >> have enough state to do it correctly. > I think it would be quite nice if you made clear what piece of state > it is actually missing. To be honest, I don't recall anymore. How to correctly mutate the registers and/or memory (which is specific to the hypercall subop in some cases). >>> Well, in-memory arguments can be accessed as long as the mapping is >>> the right one (which it typically wouldn't be inside a tasklet). Do >>> existing continue_hypercall_on_cpu() users need this? Looking over >>> patch 4 again, I didn't think so. (Which isn't to say that removing >>> the latent issue is not a good thing.) >>> >>> In-register values can be changed as long as the respective exit >>> path will suitably pick up the value, which I thought was always >>> the case. >>> >>> Hence I'm afraid your single reply sentence didn't really clarify >>> matters. I'm sorry if this is just because of me being dense. >> How, physically, would you arrange for continue_hypercall_on_cpu() to >> make the requisite state adjustments? > You can't (at least not without having sufficient further context), > I agree. Yet ... > >> Yes - registers and memory can be accessed, but only the hypercall >> (sub?)op handler knows how to mutate them appropriately. >> >> You'd have to copy the mutation logic into continue_hypercall_on_cpu(), >> and pass in op/subops and a union of all pointers, *and* whatever >> intermediate state the subop handler needs. >> >> Or you can return -ERESTART and let the caller DTRT with the state it >> has in context, as it would in other cases requiring a continuation. > ... it continues to be unclear to me whether you're fixing an actual > bug here, or just a latent one. I'm not fixing any bug. I am making a fundamental change in behaviour, so tasklet context can use -ERESTART. Tasklet context doesn't even know what the primary hypercall index needs to be to correctly create a continuation. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH-for-4.13] SUPPORT.md: add core scheduling
Add core scheduling feature to SUPPORT.md. Signed-off-by: Juergen Gross --- SUPPORT.md | 8 1 file changed, 8 insertions(+) diff --git a/SUPPORT.md b/SUPPORT.md index 1cad7d6164..169b6f8fcf 100644 --- a/SUPPORT.md +++ b/SUPPORT.md @@ -255,6 +255,14 @@ of using different schedulers and scheduling properties. Status: Supported +### Core Scheduling + +Allows to group virtual cpus into virtual cores which are scheduled on the +physical cores. This results in never running different guests at the same +time on the same physical core. + +Status, x86: Experimental + ### Credit Scheduler A weighted proportional fair share virtual CPU scheduler. -- 2.16.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel