Re: [Xen-devel] REGRESSION: Xen 4.13 RC5 fails to bootstrap Dom0 on ARM
Hi Roman,

On 19/12/2019 00:28, Roman Shaposhnik wrote:
On Wed, Dec 18, 2019 at 2:17 PM Julien Grall wrote:
Hi Roman,

On 18/12/2019 17:03, Roman Shaposhnik wrote:
On Wed, Dec 18, 2019 at 3:50 AM Julien Grall wrote:

So -- nothing boots directly by UEFI -- everything goes through GRUB. However, my understanding is that GRUB will detect devicetree information provided by UEFI (even though devicetree command is supposed to completely replace that). Hence it is possible that Linux relies on some residuals left in memory by GRUB that Xen doesn't pay attention to (but this is a pretty wild speculation only).

While it goes through GRUB, it is a bootloader and will just act as a proxy for EFI. So EFI applications such as Xen/Linux can still be loaded and take advantage of runtime services if present/implemented.

Aha! So then it depends on Xen actually using those EFI services. Which leads to my first question:

1. would it be possible to stay completely with just devicetrees information by passing efi=no-rs to Xen?

This will only disable the runtime services (note that they are not supported on Xen on Arm today). What I described above is part of the boot services and can't be disabled. Also, I am not entirely sure GRUB/EFI will update your device-tree to point out the memory that was carved out for things like ATF. Looking at the DTS memory node you provided in another e-mail, it seems the memory map is slightly different.

In fact most people on Arm are using GRUB rather than EFI directly as this is more friendly to use. Regarding the devicetree, Xen and Linux will completely ignore the memory nodes in Xen if using EFI. This is because the EFI memory map will give you an overview of the platform with the EFI regions included.

Aha! So in that sense it is a bug in Xen after all, right? (that's what you're referring to when you say you now understand what needs to get fixed).

Yes. The EFI memory map is a list of existing memory with a type associated to it (Conventional, BootServicesCode, MemoryMappedIO...). The OS/Hypervisor will have to go through them and check which regions are usable. Compared to Linux, Xen has limited itself to only a few types. However, I think we can be on a par with Linux here.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
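To make the "go through the map and check which regions are usable" step concrete, here is a minimal sketch in C. It is only an illustration under stated assumptions: the type numbers follow the UEFI specification, but `struct mem_desc` is a simplified hypothetical descriptor (a real EFI memory map must be walked using the descriptor size reported by the firmware), and the policy in `is_usable_ram()` is an example choice, not what Xen or Linux actually implement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Memory type values as numbered in the UEFI specification. */
enum efi_mem_type {
    EFI_RESERVED_MEMORY_TYPE  = 0,
    EFI_LOADER_CODE           = 1,
    EFI_LOADER_DATA           = 2,
    EFI_BOOT_SERVICES_CODE    = 3,
    EFI_BOOT_SERVICES_DATA    = 4,
    EFI_RUNTIME_SERVICES_CODE = 5,
    EFI_RUNTIME_SERVICES_DATA = 6,
    EFI_CONVENTIONAL_MEMORY   = 7,
    EFI_UNUSABLE_MEMORY       = 8,
    EFI_ACPI_RECLAIM_MEMORY   = 9,
    EFI_ACPI_MEMORY_NVS       = 10,
    EFI_MEMORY_MAPPED_IO      = 11,
};

/* Hypothetical, simplified descriptor (not the real firmware layout). */
struct mem_desc {
    uint32_t type;
    uint64_t phys_start;
    uint64_t num_pages;   /* counted in 4 KiB pages */
};

/*
 * Example policy only: treat conventional memory plus loader and boot
 * services regions as RAM that can be reused once boot services have
 * been exited. Everything else (runtime services, MMIO, ...) is skipped.
 */
static bool is_usable_ram(uint32_t type)
{
    switch (type) {
    case EFI_LOADER_CODE:
    case EFI_LOADER_DATA:
    case EFI_BOOT_SERVICES_CODE:
    case EFI_BOOT_SERVICES_DATA:
    case EFI_CONVENTIONAL_MEMORY:
        return true;
    default:
        return false;
    }
}

/* Sum the size in bytes of all regions the policy above accepts. */
static uint64_t usable_bytes(const struct mem_desc *map, size_t n)
{
    uint64_t total = 0;

    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (is_usable_ram(map[i].type))
            total += map[i].num_pages * 4096;
    return total;
}
```

A consumer that accepts fewer types than another will simply report less usable memory for the same map, which is the kind of difference between Xen and Linux described above.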
Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
Hi Tamas,

On 19/12/2019 00:15, Tamas K Lengyel wrote:
On Wed, Dec 18, 2019 at 4:02 PM Julien Grall wrote:
Hi,

On 18/12/2019 22:33, Tamas K Lengyel wrote:
On Wed, Dec 18, 2019 at 3:00 PM Julien Grall wrote:
Hi Tamas,

On 18/12/2019 19:40, Tamas K Lengyel wrote:

Implement hypercall that allows a fork to shed all memory that got allocated for it during its execution and re-load its vCPU context from the parent VM. This allows the forked VM to reset into the same state the parent VM is in a faster way than creating a new fork would be. Measurements show about a 2x speedup during normal fuzzing operations. Performance may vary depending on how much memory got allocated for the forked VM. If it has been completely deduplicated from the parent VM then creating a new fork would likely be more performant.

Signed-off-by: Tamas K Lengyel
---
 xen/arch/x86/mm/mem_sharing.c | 105 ++
 xen/include/public/memory.h   |   1 +
 2 files changed, 106 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e93ad2ec5a..4735a334b9 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
     return 0;
 }

+struct gfn_free;
+struct gfn_free {
+    struct gfn_free *next;
+    struct page_info *page;
+    gfn_t gfn;
+};
+
+static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
+{
+    int rc;
+
+    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
+    struct gfn_free *list = NULL;
+    struct page_info *page;
+
+    page_list_for_each(page, &cd->page_list)

AFAICT, your domain is not paused, so it would be possible to have pages added/removed in that list behind your back.

Well, it's not that it's not paused, it's just that I haven't added a sanity check to make sure it is. The toolstack can (and should) pause it, so that sanity check would be warranted.

I have only read the hypervisor part, so I didn't know what the toolstack has done.
I've added the same enforced VM paused operation that is present for the fork hypercall handler.

You also have multiple loops on the page_list in this function. Given the number of pages in page_list can be quite big, this is a call for hogging the pCPU and an RCU lock on the domain vCPU running this call.

There is just one loop over page_list itself, the second loop is on the internal list that is being built here which will be a subset. The list itself in fact should be small (in our tests usually <100).

First, nothing in this function tells me that there will be only 100 pages. But then, I don't think it is right to implement your hypercall based only on the "normal" scenario. You should also think about the "worst" case scenario. In this case the worst case scenario is to have hundreds of pages in page_list.

Well, this is only an experimental system that's completely disabled by default. Making the assumption that people who make use of it will know what they are doing I think is fair.

I assume that if you submit this new hypercall upstream then there is a longer-term plan to have more people use it and potentially to make it "stable". If not, then it raises the question why this is pushed upstream... In any case, all the known assumptions should be documented so they can be fixed rather than forgotten until they are rediscovered via an XSA.

Granted the list can grow larger, but in those cases it's likely better to just discard the fork and create a new one. So in my opinion adding a hypercall continuation to this is not needed.

How would the caller know it? What would happen if the caller ends up calling this with a growing list?

The caller knows by virtue of knowing how long the VM was executed for. In the usecase this is targeted at, the VM was executing only for a couple of seconds at most. Usually much less than that (we get about ~80 resets/s with AFL). During that time it's extremely unlikely you get more than ~100 pages deduplicated (that is, written to). But even if there are more pages, it just means the hypercall might take a bit longer to run for that iteration.

I assume if you upstream the code then you want more people to use it (otherwise what's the point?). In this case, you will likely have people who heard about the feature and want to test it but don't know the internals. Such users need to know how this can be called safely without reading the implementation. In other words, some documentation for your hypercall is needed.

I don't see any issue with not breaking up this hypercall with continuation even under the worst case situation though.

Xen only supports voluntary preemption, which means that a hypercall can only be preempted if there is code for it. Otherwise the preemption will mostly only happen when returning to the guest. In other words, the vCPU executing the hypercall may go past its timeslice and prevent other vCPUs from running.
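The voluntary-preemption point can be illustrated with a small, self-contained C sketch. This is not Xen code and not the patch under review: `reset_pages()`, `struct reset_state`, `DEMO_BATCH` and the two return codes are hypothetical names chosen for the example. The idea shown is only the general pattern: do a bounded amount of work, ask whether to yield, and if so return a "call me again" status while keeping progress in caller-visible state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BATCH      64  /* hypothetical: pages handled between checks */
#define DEMO_RC_DONE    0
#define DEMO_RC_RESTART 1   /* hypothetical "re-issue the call" status */

/* Progress that survives across calls, so work resumes where it stopped. */
struct reset_state {
    size_t next;     /* index of the first page not yet handled */
    size_t handled;  /* number of pages handled so far */
};

/* Stand-in for asking the scheduler whether this CPU should be given up. */
typedef bool (*preempt_check_fn)(void);

static int reset_pages(struct reset_state *st, size_t total,
                       preempt_check_fn should_yield)
{
    assert(st != NULL && should_yield != NULL);

    while (st->next < total) {
        /* ... the per-page work would go here ... */
        st->next++;
        st->handled++;

        /* Only offer to yield at batch boundaries, and only if work remains. */
        if (st->next < total && (st->next % DEMO_BATCH) == 0 && should_yield())
            return DEMO_RC_RESTART;
    }
    return DEMO_RC_DONE;
}

/* Two trivial policies, useful for exercising both paths. */
static bool always_yield(void) { return true; }
static bool never_yield(void)  { return false; }
```

With `never_yield` the whole list is processed in one call, which corresponds to a hypercall with no preemption points; with `always_yield` a 200-page list takes four calls, each bounded by the batch size.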
[Xen-devel] [PATCH v3 1/2] xen: put more code under CONFIG_CRASH_DEBUG
Some code is needed only with CONFIG_CRASH_DEBUG, so only include it if CONFIG_CRASH_DEBUG is defined.

While at it remove CONFIG_HAS_GDBSX as it can easily be replaced by CONFIG_CRASH_DEBUG.

Signed-off-by: Juergen Gross
---
V3:
- move domain_pause_for_debugger() into arch/x86/domain.c (Andrew Cooper)
---
 xen/arch/x86/Kconfig            |  1 -
 xen/arch/x86/domain.c           | 13 +
 xen/arch/x86/hvm/vmx/realmode.c |  1 +
 xen/common/Kconfig              |  3 ---
 xen/common/domain.c             | 14 --
 xen/include/asm-x86/debugger.h  | 32
 xen/include/xen/sched.h         |  1 -
 7 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 02bb05f42e..f853c04564 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -13,7 +13,6 @@ config X86
     select HAS_EHCI
     select HAS_EX_TABLE
     select HAS_FAST_MULTIPLY
-    select HAS_GDBSX
     select HAS_IOPORTS
     select HAS_KEXEC
     select MEM_ACCESS_ALWAYS_ON
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 7cb7fd31dd..3a3fbde642 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2318,6 +2318,19 @@ static int __init init_vcpu_kick_softirq(void)
 }
 __initcall(init_vcpu_kick_softirq);

+void domain_pause_for_debugger(void)
+{
+#ifdef CONFIG_CRASH_DEBUG
+    struct vcpu *curr = current;
+    struct domain *d = curr->domain;
+
+    domain_pause_by_systemcontroller_nosync(d);
+
+    /* if gdbsx active, we just need to pause the domain */
+    if ( curr->arch.gdbsx_vcpu_event == 0 )
+        send_global_virq(VIRQ_DEBUGGER);
+#endif
+}
 /*
  * Local variables:
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index bb0b4439df..bdbd9cb921 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 2f516da101..b3d161d057 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -57,9 +57,6 @@ config HAS_UBSAN
 config HAS_KEXEC
     bool

-config HAS_GDBSX
-    bool
-
 config HAS_IOPORTS
     bool

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 66c7fc..3a77d717db 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -915,20 +915,6 @@ void vcpu_end_shutdown_deferral(struct vcpu *v)
     vcpu_check_shutdown(v);
 }

-#ifdef CONFIG_HAS_GDBSX
-void domain_pause_for_debugger(void)
-{
-    struct vcpu *curr = current;
-    struct domain *d = curr->domain;
-
-    domain_pause_by_systemcontroller_nosync(d);
-
-    /* if gdbsx active, we just need to pause the domain */
-    if ( curr->arch.gdbsx_vcpu_event == 0 )
-        send_global_virq(VIRQ_DEBUGGER);
-}
-#endif
-
 /* Complete domain destroy after RCU readers are not holding old references. */
 static void complete_domain_destroy(struct rcu_head *head)
 {
diff --git a/xen/include/asm-x86/debugger.h b/xen/include/asm-x86/debugger.h
index b1b627f1fa..f58726daec 100644
--- a/xen/include/asm-x86/debugger.h
+++ b/xen/include/asm-x86/debugger.h
@@ -33,6 +33,8 @@
 #include
 #include

+void domain_pause_for_debugger(void);
+
 #ifdef CONFIG_CRASH_DEBUG

 #include
@@ -47,18 +49,6 @@ static inline bool debugger_trap_fatal(
 /* Int3 is a trivial way to gather cpu_user_regs context. */
 #define debugger_trap_immediate() __asm__ __volatile__ ( "int3" );

-#else
-
-static inline bool debugger_trap_fatal(
-    unsigned int vector, struct cpu_user_regs *regs)
-{
-    return false;
-}
-
-#define debugger_trap_immediate() ((void)0)
-
-#endif
-
 static inline bool debugger_trap_entry(
     unsigned int vector, struct cpu_user_regs *regs)
 {
@@ -84,6 +74,24 @@ static inline bool debugger_trap_entry(
     return false;
 }

+#else
+
+static inline bool debugger_trap_fatal(
+    unsigned int vector, struct cpu_user_regs *regs)
+{
+    return false;
+}
+
+#define debugger_trap_immediate() ((void)0)
+
+static inline bool debugger_trap_entry(
+    unsigned int vector, struct cpu_user_regs *regs)
+{
+    return false;
+}
+
+#endif
+
 unsigned int dbg_rw_mem(void * __user addr, void * __user buf,
                         unsigned int len, domid_t domid, bool toaddr,
                         uint64_t pgd3);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 9f7bc69293..0b41e936d5 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -652,7 +652,6 @@ void domain_destroy(struct domain *d);
 int domain_kill(struct domain *d);
 int domain_shutdown(struct domain *d, u8 reason);
 void domain_resume(struct domain *d);
-void domain_pause_for_debugger(void);
 int domain_soft_reset(struct domain *d);

--
2.16.4
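The header change above relies on a common C pattern: provide the real helper when a config option is set and an inline stub with the same signature otherwise, so callers need no #ifdef of their own. A minimal standalone sketch of that pattern follows; `CONFIG_DEMO_DEBUG`, `demo_trap_fatal()` and `handle_trap()` are hypothetical names used only for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Build with -DCONFIG_DEMO_DEBUG to select the "real" variant. */
#ifdef CONFIG_DEMO_DEBUG

/* Real variant: pretend a debugger consumes breakpoint traps (vector 3). */
static inline bool demo_trap_fatal(unsigned int vector)
{
    return vector == 3;
}

#else

/* Stub variant: same signature, always reports "not handled". */
static inline bool demo_trap_fatal(unsigned int vector)
{
    (void)vector;
    return false;
}

#endif

/* The caller is identical in both configurations. */
static int handle_trap(unsigned int vector)
{
    assert(vector < 256);   /* x86 has 256 interrupt vectors */

    if (demo_trap_fatal(vector))
        return 1;           /* consumed by the debugger */
    return 0;               /* fall through to normal handling */
}
```

Because the stub is a constant-returning inline function, a compiler can usually drop the dead branch in `handle_trap()` entirely when the option is off.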
[Xen-devel] [PATCH v3 0/2] xen: make more debugger support code conditional
Support for debugging the hypervisor or guests via gdb/gdbsx should be configurable.

Changes in V3:
- remove possibility to access hypervisor memory via gdbsx domctl
- default gdbsx support to on
- some code moving

Changes in V2:
- split support for gdbstub and gdbsx (Andrew Cooper)

Juergen Gross (2):
  xen: put more code under CONFIG_CRASH_DEBUG
  xen: make gdbsx support configurable

 xen/Kconfig.debug               |  8 +
 xen/arch/x86/Kconfig            |  1 -
 xen/arch/x86/Makefile           |  2 +-
 xen/arch/x86/debug.c            | 78 +
 xen/arch/x86/domain.c           | 13 +++
 xen/arch/x86/domctl.c           |  4 +++
 xen/arch/x86/hvm/vmx/realmode.c |  1 +
 xen/common/Kconfig              |  3 --
 xen/common/domain.c             | 14
 xen/include/asm-x86/debugger.h  | 34 +++---
 xen/include/xen/sched.h         |  1 -
 11 files changed, 58 insertions(+), 101 deletions(-)

--
2.16.4
[Xen-devel] [qemu-mainline test] 144940: regressions - FAIL
flight 144940 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144940/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-freebsd10-i386 14 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-freebsd10-amd64 14 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-win7-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ovmf-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ovmf-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-win7-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ws16-amd64 13 guest-saverestore fail REGR. vs. 144861

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail REGR. vs. 144861

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10 fail like 144861
 test-armhf-armhf-libvirt 14 saverestore-support-check fail like 144861
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail like 144861
 test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 14 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 12 guest-start fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl 13 migrate-support-check fail never pass
 test-arm64-arm64-xl 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl 13 migrate-support-check fail never pass
[Xen-devel] [ovmf test] 144957: all pass - PUSHED
flight 144957 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144957/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf c7a0aca0ed0e9b51efe0c437ff77b30cf1457f8a
baseline version:
 ovmf 01b6090b75922bc72604c334bd3dc331490af3bb

Last test of basis 144927 2019-12-18 09:10:04 Z 0 days
Testing same since 144957 2019-12-19 04:17:39 Z 0 days 1 attempts

People who touched revisions under test:
 Jiewen Yao

jobs:
 build-amd64-xsm pass
 build-i386-xsm pass
 build-amd64 pass
 build-i386 pass
 build-amd64-libvirt pass
 build-i386-libvirt pass
 build-amd64-pvops pass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64 pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
 01b6090b75..c7a0aca0ed c7a0aca0ed0e9b51efe0c437ff77b30cf1457f8a -> xen-tested-master
[Xen-devel] [xen-unstable test] 144936: tolerable FAIL - PUSHED
flight 144936 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144936/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 16 guest-localmigrate fail REGR. vs. 144905
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail REGR. vs. 144905

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 144905
 test-armhf-armhf-libvirt 14 saverestore-support-check fail like 144905
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 144905
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 144905
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144905
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail like 144905
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail like 144905
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail like 144905
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144905
 test-amd64-i386-xl-pvshim 12 guest-start fail never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl 13 migrate-support-check fail never pass
 test-arm64-arm64-xl 14 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl 13 migrate-support-check fail never pass
 test-armhf-armhf-xl 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 14 saverestore-support-check fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass

version targeted for testing:
 xen 0e7c69bd3c0b35a677d73843b39522787ccf5a3f
baseline version:
 xen f50a4f6e244cfc8e773300c03aaf4db391f3028a

Last test of basis 144905 2019-12-17 18:36:21 Z 1 days
Failing since 144924 2019-12-18 06:43:35 Z 0 days 2 attempts
Testing same since 144936 2019-12-18 16:07:31 Z 0 days 1 attempts

People who touched revisions under
Re: [Xen-devel] [PATCH v1] xen-pciback: optionally allow interrupt enable flag writes
On Tue, Dec 03, 2019 at 04:17:33PM +0100, Roger Pau Monné wrote:
> On Tue, Dec 03, 2019 at 06:41:56AM +0100, Marek Marczykowski-Górecki wrote:
> > QEMU running in a stubdom needs to be able to set INTX_DISABLE, and the
> > MSI(-X) enable flags in the PCI config space. This adds an attribute
> > 'allow_interrupt_control' which when set for a PCI device allows writes
> > to this flag(s). The toolstack will need to set this for stubdoms.
> > When enabled, guest (stubdomain) will be allowed to set relevant enable
> > flags, but only one at a time - i.e. it refuses to enable more than one
> > of INTx, MSI, MSI-X at a time.
> >
> > This functionality is needed only for config space access done by device
> > model (stubdomain) serving a HVM with the actual PCI device. It is not
> > necessary and unsafe to enable direct access to those bits for PV domain
> > with the device attached. For PV domains, there are separate protocol
> > messages (XEN_PCI_OP_{enable,disable}_{msi,msix}) for this purpose.
> > Those ops in addition to setting enable bits, also configure MSI(-X) in
> > dom0 kernel - which is undesirable for PCI passthrough to HVM guests.
> >
> > This should not introduce any new security issues since a malicious
> > guest (or stubdom) can already generate MSIs through other ways, see
> > [1] page 8. Additionally, when qemu runs in dom0, it already have direct
> > access to those bits.
> >
> > This is the second iteration of this feature. First was proposed as a
> > direct Xen interface through a new hypercall, but ultimately it was
> > rejected by the maintainer, because of mixing pciback and hypercalls for
> > PCI config space access isn't a good design. Full discussion at [2].
> >
> > [1]:
> > https://invisiblethingslab.com/resources/2011/Software%20Attacks%20on%20Intel%20VT-d.pdf
> > [2]: https://xen.markmail.org/thread/smpgpws4umdzizze
> >
> > [part of the commit message and sysfs handling]
> > Signed-off-by: Simon Gaiser
> > [the rest]
> > Signed-off-by: Marek Marczykowski-Górecki
> > ---
> > I'm not very happy about code duplication regarding MSI/MSI-X/INTx
> > exclusivity test, but I don't have better ideas how to structure it. Any
> > suggestions?
>
> Can't you create a helper that returns the currently enabled interrupt
> mode?
>
> I expect returning an enum (ie: NONE, INTX, MSI, MSIX) should be fine
> since no two of those should be enabled at the same time.

Done in v2 (plus ERR member).

> > ---
> >  .../xen/xen-pciback/conf_space_capability.c | 113 ++
> >  drivers/xen/xen-pciback/conf_space_header.c |  30 +
> >  drivers/xen/xen-pciback/pci_stub.c          |  66 ++
> >  drivers/xen/xen-pciback/pciback.h           |   1 +
> >  4 files changed, 210 insertions(+)
> >
> > diff --git a/drivers/xen/xen-pciback/conf_space_capability.c b/drivers/xen/xen-pciback/conf_space_capability.c
> > index e5694133ebe5..c5a7c58ff3e3 100644
> > --- a/drivers/xen/xen-pciback/conf_space_capability.c
> > +++ b/drivers/xen/xen-pciback/conf_space_capability.c
> > @@ -189,6 +189,109 @@ static const struct config_field caplist_pm[] = {
> >          {}
> >  };
> >
> > +static struct msi_msix_field_config {
> > +        u16 enable_bit; /* bit for enabling MSI/MSI-X */
> > +        int other_cap; /* the other capability for exclusiveness check */
>
> Nit: just one space between the declaration and the comment IMO.
>
> Also capability ID is not a signed value, hence unsigned int would
> feel more natural.

Replaced with enum in v2.

> > +} msi_field_config = {
> > +        .enable_bit = PCI_MSI_FLAGS_ENABLE,
> > +        .other_cap = PCI_CAP_ID_MSIX,
> > +}, msix_field_config = {
> > +        .enable_bit = PCI_MSIX_FLAGS_ENABLE,
> > +        .other_cap = PCI_CAP_ID_MSI,
> > +};
>
> I think it would be more helpful to store the current capability ID
> rather the one you need to check against. Then if you had a helper
> that returns the currently enabled interrupt mode you would have to
> check that either it's NONE or matches the capability requested to be
> enabled.

Done in v2.

> > +
> > +static void *msi_field_init(struct pci_dev *dev, int offset)
> > +{
> > +        return &msi_field_config;
> > +}
> > +
> > +static void *msix_field_init(struct pci_dev *dev, int offset)
> > +{
> > +        return &msix_field_config;
> > +}
> > +
> > +static int msi_msix_flags_write(struct pci_dev *dev, int offset, u16 new_value,
> > +                                void *data)
> > +{
> > +        int err;
> > +        u16 old_value;
> > +        struct msi_msix_field_config *field_config = data;
> > +        struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(dev);
>
> const for both the above.

Done in v2.

> > +        int other_cap_offset;
>
> unsigned int

Done in v2.

> > +        u16 other_cap_enable_bit;
> > +        u16 other_cap_value;
> > +
> > +        if (xen_pcibk_permissive || dev_data->permissive)
> > +                goto write;
> > +
> > +        err = pci_read_config_word(dev, offset, &old_value);
> > +        if (err)
> > +                return err;
> > +
> > +        if (new_value
[Xen-devel] [PATCH v2] xen-pciback: optionally allow interrupt enable flag writes
QEMU running in a stubdom needs to be able to set INTX_DISABLE, and the MSI(-X) enable flags in the PCI config space. This adds an attribute 'allow_interrupt_control' which when set for a PCI device allows writes to this flag(s). The toolstack will need to set this for stubdoms. When enabled, guest (stubdomain) will be allowed to set relevant enable flags, but only one at a time - i.e. it refuses to enable more than one of INTx, MSI, MSI-X at a time.

This functionality is needed only for config space access done by device model (stubdomain) serving a HVM with the actual PCI device. It is not necessary and unsafe to enable direct access to those bits for PV domain with the device attached. For PV domains, there are separate protocol messages (XEN_PCI_OP_{enable,disable}_{msi,msix}) for this purpose. Those ops in addition to setting enable bits, also configure MSI(-X) in dom0 kernel - which is undesirable for PCI passthrough to HVM guests.

This should not introduce any new security issues since a malicious guest (or stubdom) can already generate MSIs through other ways, see [1] page 8. Additionally, when qemu runs in dom0, it already has direct access to those bits.

This is the second iteration of this feature. First was proposed as a direct Xen interface through a new hypercall, but ultimately it was rejected by the maintainer, because mixing pciback and hypercalls for PCI config space access isn't a good design. Full discussion at [2].

[1]: https://invisiblethingslab.com/resources/2011/Software%20Attacks%20on%20Intel%20VT-d.pdf
[2]: https://xen.markmail.org/thread/smpgpws4umdzizze

[part of the commit message and sysfs handling]
Signed-off-by: Simon Gaiser
[the rest]
Signed-off-by: Marek Marczykowski-Górecki
---
Changes in v2:
- introduce xen_pcibk_get_interrupt_type() to deduplicate current INTx/MSI/MSI-X state check
- fix checking MSI/MSI-X state on devices not supporting it
---
 drivers/xen/xen-pciback/conf_space.c          | 35
 drivers/xen/xen-pciback/conf_space.h          | 10 +++
 .../xen/xen-pciback/conf_space_capability.c   | 88 +++
 drivers/xen/xen-pciback/conf_space_header.c   | 19
 drivers/xen/xen-pciback/pci_stub.c            | 66 ++
 drivers/xen/xen-pciback/pciback.h             |  1 +
 6 files changed, 219 insertions(+)

diff --git a/drivers/xen/xen-pciback/conf_space.c b/drivers/xen/xen-pciback/conf_space.c
index 60111719b01f..10200a7a2da5 100644
--- a/drivers/xen/xen-pciback/conf_space.c
+++ b/drivers/xen/xen-pciback/conf_space.c
@@ -286,6 +286,41 @@ int xen_pcibk_config_write(struct pci_dev *dev, int offset, int size, u32 value)
         return xen_pcibios_err_to_errno(err);
 }

+enum interrupt_type xen_pcibk_get_interrupt_type(struct pci_dev *dev)
+{
+        int err;
+        u16 val;
+
+        err = pci_read_config_word(dev, PCI_COMMAND, &val);
+        if (err)
+                return INTERRUPT_TYPE_ERR;
+        if (!(val & PCI_COMMAND_INTX_DISABLE))
+                return INTERRUPT_TYPE_INTX;
+
+        /* Do not trust dev->msi(x)_enabled here, as enabling could be done
+         * bypassing the pci_*msi* functions, by the qemu.
+         */
+        if (dev->msi_cap) {
+                err = pci_read_config_word(dev,
+                                dev->msi_cap + PCI_MSI_FLAGS,
+                                &val);
+                if (err)
+                        return INTERRUPT_TYPE_ERR;
+                if (val & PCI_MSI_FLAGS_ENABLE)
+                        return INTERRUPT_TYPE_MSI;
+        }
+        if (dev->msix_cap) {
+                err = pci_read_config_word(dev,
+                                dev->msix_cap + PCI_MSIX_FLAGS,
+                                &val);
+                if (err)
+                        return INTERRUPT_TYPE_ERR;
+                if (val & PCI_MSIX_FLAGS_ENABLE)
+                        return INTERRUPT_TYPE_MSIX;
+        }
+        return INTERRUPT_TYPE_NONE;
+}
+
 void xen_pcibk_config_free_dyn_fields(struct pci_dev *dev)
 {
         struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(dev);
diff --git a/drivers/xen/xen-pciback/conf_space.h b/drivers/xen/xen-pciback/conf_space.h
index 22db630717ea..b6fff5161331 100644
--- a/drivers/xen/xen-pciback/conf_space.h
+++ b/drivers/xen/xen-pciback/conf_space.h
@@ -65,6 +65,14 @@ struct config_field_entry {
         void *data;
 };

+enum interrupt_type {
+        INTERRUPT_TYPE_ERR = -1,
+        INTERRUPT_TYPE_NONE,
+        INTERRUPT_TYPE_INTX,
+        INTERRUPT_TYPE_MSI,
+        INTERRUPT_TYPE_MSIX,
+};
+
 extern bool xen_pcibk_permissive;

 #define OFFSET(cfg_entry) ((cfg_entry)->base_offset+(cfg_entry)->field->offset)
@@ -126,4 +134,6 @@ int xen_pcibk_config_capability_init(void);
 int xen_pcibk_config_header_add_fields(struct pci_dev *dev);
 int xen_pcibk_config_capability_add_fields(struct pci_dev *dev);

+enum interrupt_type xen_pcibk_get_interrupt_type(struct pci_dev *dev);
+
 #endif /* __XEN_PCIBACK_CONF_SPACE_H__ */
diff --git
Re: [Xen-devel] [PATCH] [tools/hotplug] Use ip on systems where brctl is not available
On 2019-12-19 02:42, Ian Jackson wrote:
> Steven Haigh writes ("[PATCH] [tools/hotplug] Use ip on systems where brctl is not available"):
> > Newer distros like CentOS 8 do not have brctl available. As such, we
> > can't use it to configure networking anymore.
> >
> > This patch will fall back to 'ip' or 'bridge' commands if brctl is not
> > available in the working PATH.
>
> This looks good to me at least in the brctl case. I have two minor
> comments. For the avoidance of doubt, I guess you have tested this in
> the `ip'/`bridge' case ? How thoroughly ? :-)

I have tested it to the point that it's almost a port of the Fedora patch. The Fedora patch, however, removes brctl completely in favour of the ip / bridge commands.

While I haven't specifically debugged the result on Fedora, the networking works successfully when running a Domain-0 in Fedora 31, which was the source of the 'ip' commands to run.

> > -if [ -z "$bridge" ]
> > -then
> > -  bridge=$(brctl show | awk 'NR==2{print$1}')
> > -
> > +if [ -z "$bridge" ]; then
>
> The presumably-unintentional style change makes the review slightly
> harder...

I'm intending to submit a new patch series after this (to make backporting this easier) that cleans up formatting / whitespace / syntax across the majority of scripts in the Linux directory.

It'll look like a hot mess when submitting the next lot of patches, but it's better than nothing.

> > -bridge=$(brctl show | cut -d "
> > +if which brctl >&/dev/null; then
>
> Maybe introduce
>    have_brctl () { ... }
> so we can say
>    if have_brctl; then
> ?

I don't really have a preference. brctl is used throughout quite a few scripts, none of which really have a standard method of operation or common presentation. Some scripts call xen-network-common.sh, some do not.

Would I be correct in thinking that your proposal is to ensure all network scripts source xen-network-common.sh? That would be a more invasive change for backporting, hence I've tried to keep it as simple as possible for now.

Would it be better to do such a restructure as yet another patch set (after the formatting/style cleanups), to make things a little more consistent?

-- 
Steven Haigh
net...@crc.id.au
https://www.crc.id.au
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
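The fallback discussed in this thread boils down to "use brctl when it is on PATH, otherwise use iproute2". A minimal sketch of that selection logic follows, written in Python rather than the hotplug scripts' shell. The function names and the `have` parameter are hypothetical, only the attach-to-bridge case is shown, and the `ip link set ... master` form is assumed as the iproute2 equivalent of `brctl addif`, since the patch's exact commands are not quoted in this message.

```python
import shutil
from typing import Callable, List


def on_path(tool: str) -> bool:
    """True if `tool` resolves to an executable in the current PATH."""
    return shutil.which(tool) is not None


def bridge_add_cmd(bridge: str, dev: str,
                   have: Callable[[str], bool] = on_path) -> List[str]:
    """Return the argv that attaches `dev` to `bridge`.

    Prefers brctl so that existing setups behave as before, and only
    falls back to iproute2 when brctl is missing.
    """
    if have("brctl"):
        return ["brctl", "addif", bridge, dev]
    if have("ip"):
        return ["ip", "link", "set", "dev", dev, "master", bridge]
    raise RuntimeError("neither brctl nor ip found in PATH")
```

Keeping the probe in one helper is the same idea as the suggested have_brctl () shell function: callers ask one question instead of repeating the `which` test.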
Re: [Xen-devel] REGRESSION: Xen 4.13 RC5 fails to bootstrap Dom0 on ARM
Hi Julien!

First of all -- thank you so much for the detailed explanations -- this is very much appreciated.

A few questions still (if you don't mind):

On Wed, Dec 18, 2019 at 2:17 PM Julien Grall wrote:
>
> Hi Roman,
>
> On 18/12/2019 17:03, Roman Shaposhnik wrote:
> > On Wed, Dec 18, 2019 at 3:50 AM Julien Grall wrote:
> > So -- nothing boots directly by UEFI -- everything goes through GRUB.
> >
> > However, my understanding is that GRUB will detect devicetree
> > information provided by UEFI (even though devicetree command is
> > supposed to completely replace that). Hence it is possible that Linux
> > relies on some residuals left in memory by GRUB that Xen doesn't pay
> > attention to (but this is a pretty wild speculation only).
>
> While it goes through GRUB, it is a bootloader and will just act as a
> proxy for EFI. So EFI applications such as Xen/Linux can still be loaded
> and take advantage of runtime services if present/implemented.

Aha! So then it depends on Xen actually using those EFI services. Which leads to my first question:
   1. would it be possible to stay completely with just devicetree information by passing efi=no-rs to Xen?

> In fact most people on Arm are using GRUB rather than EFI directly as
> this is more friendly to use.
>
> Regarding the devicetree, Xen and Linux will completely ignore the
> memory nodes in Xen if using EFI. This is because the EFI memory map will
> give you an overview of the platform with the EFI regions included.

Aha! So in that sense it is a bug in Xen after all, right? (that's what you're referring to when you say you now understand what needs to get fixed).

Thanks,
Roman.
Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
On Wed, Dec 18, 2019 at 4:02 PM Julien Grall wrote: > > Hi, > > On 18/12/2019 22:33, Tamas K Lengyel wrote: > > On Wed, Dec 18, 2019 at 3:00 PM Julien Grall wrote: > >> > >> Hi Tamas, > >> > >> On 18/12/2019 19:40, Tamas K Lengyel wrote: > >>> Implement hypercall that allows a fork to shed all memory that got > >>> allocated > >>> for it during its execution and re-load its vCPU context from the parent > >>> VM. > >>> This allows the forked VM to reset into the same state the parent VM is > >>> in a > >>> faster way then creating a new fork would be. Measurements show about a 2x > >>> speedup during normal fuzzing operations. Performance may vary depending > >>> how > >>> much memory got allocated for the forked VM. If it has been completely > >>> deduplicated from the parent VM then creating a new fork would likely be > >>> more > >>> performant. > >>> > >>> Signed-off-by: Tamas K Lengyel > >>> --- > >>>xen/arch/x86/mm/mem_sharing.c | 105 ++ > >>>xen/include/public/memory.h | 1 + > >>>2 files changed, 106 insertions(+) > >>> > >>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c > >>> index e93ad2ec5a..4735a334b9 100644 > >>> --- a/xen/arch/x86/mm/mem_sharing.c > >>> +++ b/xen/arch/x86/mm/mem_sharing.c > >>> @@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, > >>> struct domain *cd) > >>>return 0; > >>>} > >>> > >>> +struct gfn_free; > >>> +struct gfn_free { > >>> +struct gfn_free *next; > >>> +struct page_info *page; > >>> +gfn_t gfn; > >>> +}; > >>> + > >>> +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd) > >>> +{ > >>> +int rc; > >>> + > >>> +struct p2m_domain* p2m = p2m_get_hostp2m(cd); > >>> +struct gfn_free *list = NULL; > >>> +struct page_info *page; > >>> + > >>> +page_list_for_each(page, >page_list) > >> > >> AFAICT, your domain is not paused, so it would be possible to have page > >> added/remove in that list behind your back. 
> > > > Well, it's not that it's not paused, it's just that I haven't added a > > sanity check to make sure it is. The toolstack can (and should) pause > > it, so that sanity check would be warranted. > I have only read the hypervisor part, so I didn't know what the > toolstack has done. I've added the same enforced VM paused operation that is present for the fork hypercall handler. > > > > >> > >> You also have multiple loop on the page_list in this function. Given the > >> number of page_list can be quite big, this is a call for hogging the > >> pCPU and an RCU lock on the domain vCPU running this call. > > > > There is just one loop over page_list itself, the second loop is on > > the internal list that is being built here which will be a subset. The > > list itself in fact should be small (in our tests usually <100). > > For a first, nothing in this function tells me that there will be only > 100 pages. But then, I don't think this is right to implement your > hypercall based only the "normal" scenario. You should also think about > the "worst" case scenario. > > In this case the worst case scenario is have hundreds of page in page_list. Well, this is only an experimental system that's completely disabled by default. Making the assumption that people who make use of it will know what they are doing I think is fair. > > > Granted the list can grow larger, but in those cases its likely better > > to just discard the fork and create a new one. So in my opinion adding > > a hypercall continuation to this not needed > > How would the caller know it? What would happen if the caller ends up to > call this with a growing list. The caller knows by virtue of knowing how long the VM was executed for. In the usecase this is targeted at the VM was executing only for a couple seconds at most. Usually much less then that (we get about ~80 resets/s with AFL). During that time its extremely unlikely you get more then a ~100 pages deduplicated (that is, written to). 
But even if there are more pages, it just means the hypercall might take a bit longer to run for that iteration. I don't see any issue with not breaking up this hypercall with continuation even under the worst case situation though. But if others feel that strongly as well about having to have continuation for this I don't really mind adding it. > > > > >> > >>> +{ > >>> +mfn_t mfn = page_to_mfn(page); > >>> +if ( mfn_valid(mfn) ) > >>> +{ > >>> +p2m_type_t p2mt; > >>> +p2m_access_t p2ma; > >>> +gfn_t gfn = mfn_to_gfn(cd, mfn); > >>> +mfn = __get_gfn_type_access(p2m, gfn_x(gfn), , , > >>> +0, NULL, false); > >>> +if ( p2m_is_ram(p2mt) ) > >>> +{ > >>> +struct gfn_free *gfn_free; > >>> +if ( !get_page(page, cd) ) > >>> +goto err_reset; > >>> + > >>> +
Re: [Xen-devel] [PATCH] arm64: xen: Use modern annotations for assembly functions
On Wed, 18 Dec 2019, Mark Brown wrote:
> In an effort to clarify and simplify the annotation of assembly functions
> in the kernel new macros have been introduced. These replace ENTRY and
> ENDPROC. Update the annotations in the xen code to the new macros.
>
> Signed-off-by: Mark Brown
> ---
>
> This is part of a wider effort to convert all the arch/arm64 code.
>
>  arch/arm64/xen/hypercall.S | 8
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/xen/hypercall.S b/arch/arm64/xen/hypercall.S
> index c5f05c4a4d00..305c2274b8eb 100644
> --- a/arch/arm64/xen/hypercall.S
> +++ b/arch/arm64/xen/hypercall.S
> @@ -56,11 +56,11 @@
>  #define XEN_IMM 0xEA1
>
>  #define HYPERCALL_SIMPLE(hypercall) \
> -ENTRY(HYPERVISOR_##hypercall)\
> +SYM_FUNC_START(HYPERVISOR_##hypercall) \

Could you please adjust the tabs so that the '\' is aligned with the others?

With that change:

Reviewed-by: Stefano Stabellini

>  	mov x16, #__HYPERVISOR_##hypercall; \
>  	hvc XEN_IMM;\
>  	ret;\
> -ENDPROC(HYPERVISOR_##hypercall)
> +SYM_FUNC_END(HYPERVISOR_##hypercall)
>
>  #define HYPERCALL0 HYPERCALL_SIMPLE
>  #define HYPERCALL1 HYPERCALL_SIMPLE
> @@ -86,7 +86,7 @@ HYPERCALL2(multicall);
>  HYPERCALL2(vm_assist);
>  HYPERCALL3(dm_op);
>
> -ENTRY(privcmd_call)
> +SYM_FUNC_START(privcmd_call)
>  	mov x16, x0
>  	mov x0, x1
>  	mov x1, x2
> @@ -109,4 +109,4 @@ ENTRY(privcmd_call)
>  	 */
>  	uaccess_ttbr0_disable x6, x7
>  	ret
> -ENDPROC(privcmd_call);
> +SYM_FUNC_END(privcmd_call);
> --
> 2.20.1
>
Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
Hi, On 18/12/2019 22:33, Tamas K Lengyel wrote: On Wed, Dec 18, 2019 at 3:00 PM Julien Grall wrote: Hi Tamas, On 18/12/2019 19:40, Tamas K Lengyel wrote: Implement hypercall that allows a fork to shed all memory that got allocated for it during its execution and re-load its vCPU context from the parent VM. This allows the forked VM to reset into the same state the parent VM is in a faster way then creating a new fork would be. Measurements show about a 2x speedup during normal fuzzing operations. Performance may vary depending how much memory got allocated for the forked VM. If it has been completely deduplicated from the parent VM then creating a new fork would likely be more performant. Signed-off-by: Tamas K Lengyel --- xen/arch/x86/mm/mem_sharing.c | 105 ++ xen/include/public/memory.h | 1 + 2 files changed, 106 insertions(+) diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index e93ad2ec5a..4735a334b9 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd) return 0; } +struct gfn_free; +struct gfn_free { +struct gfn_free *next; +struct page_info *page; +gfn_t gfn; +}; + +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd) +{ +int rc; + +struct p2m_domain* p2m = p2m_get_hostp2m(cd); +struct gfn_free *list = NULL; +struct page_info *page; + +page_list_for_each(page, >page_list) AFAICT, your domain is not paused, so it would be possible to have page added/remove in that list behind your back. Well, it's not that it's not paused, it's just that I haven't added a sanity check to make sure it is. The toolstack can (and should) pause it, so that sanity check would be warranted. I have only read the hypervisor part, so I didn't know what the toolstack has done. You also have multiple loop on the page_list in this function. 
Given the number of page_list can be quite big, this is a call for hogging the pCPU and an RCU lock on the domain vCPU running this call. There is just one loop over page_list itself, the second loop is on the internal list that is being built here which will be a subset. The list itself in fact should be small (in our tests usually <100). For a first, nothing in this function tells me that there will be only 100 pages. But then, I don't think this is right to implement your hypercall based only the "normal" scenario. You should also think about the "worst" case scenario. In this case the worst case scenario is have hundreds of page in page_list. Granted the list can grow larger, but in those cases its likely better to just discard the fork and create a new one. So in my opinion adding a hypercall continuation to this not needed How would the caller know it? What would happen if the caller ends up to call this with a growing list. +{ +mfn_t mfn = page_to_mfn(page); +if ( mfn_valid(mfn) ) +{ +p2m_type_t p2mt; +p2m_access_t p2ma; +gfn_t gfn = mfn_to_gfn(cd, mfn); +mfn = __get_gfn_type_access(p2m, gfn_x(gfn), , , +0, NULL, false); +if ( p2m_is_ram(p2mt) ) +{ +struct gfn_free *gfn_free; +if ( !get_page(page, cd) ) +goto err_reset; + +/* + * We can't free the page while iterating over the page_list + * so we build a separate list to loop over. + * + * We want to iterate over the page_list instead of checking + * gfn from 0 to max_gfn because this is ~10x faster. + */ +gfn_free = xmalloc(struct gfn_free); If I did the math right, for a 4G guest this will require at ~24MB of memory. Actually, is it really necessary to do the allocation for a short period of time? If you have a fully deduplicated fork then you should not be using this function to begin with. You get better performance my throwing that one away and creating a new one. How a user knows when/how this can be called? But then, as said above, this may be called by mistake... 
So I still think you need to be prepare for the worst case. As for using xmalloc here, I'm not sure what other way I have to build a list of pages that need to be freed. I can't free the page itself while I'm iterating on page_list (that I'm aware of). The only other option available is calling __get_gfn_type_access with gfn=0..max_gfn which will be extremely slow because you have to loop over a lot of holes. You can use page_list_for_each_safe(). This is already used by function such as relinquish_memory(). What are you trying to achieve by iterating twice on the GFN? Wouldn't it be easier to pause the domain? I'm not sure what you mean, where do
Re: [Xen-devel] [PATCH] tools/python: Python 3 compatibility
On Wed, Dec 18, 2019 at 10:32:47PM +, Andrew Cooper wrote:
> On 18/12/2019 22:26, Marek Marczykowski-Górecki wrote:
> >> @@ -70,7 +73,7 @@ class VM(object):
> >>
> >>          # libxl
> >>          self.libxl = fmt == "libxl"
> >> -        self.emu_xenstore = "" # NUL terminated key pairs from "toolstack" records
> >> +        self.emu_xenstore = b"" # NUL terminated key pairs from "toolstack" records
> >>
> >>  def write_libxc_ihdr():
> >>      stream_write(pack(libxc.IHDR_FORMAT,
> > You also need to update write_record (string constants).
> > And few calls to it with string constants (write_libxl_end,
> > write_libxl_libxc_context, read_pv_tail, read_hvm_tail).
> > And blkid == ... in read_pv_extended_info().
>
> Urgh - well spotted.
>
> Was this manual inspection, or something else?

Manual search for " and '.

> (I probably should
> complete and upstream write-legacy-stream for the purpose of dev-testing
> the convert-legacy-stream script now that 4.6 is waaay in the past.)
>
> ~Andrew

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: [Xen-devel] REGRESSION: Xen 4.13 RC5 fails to bootstrap Dom0 on ARM
Hi,

On 18/12/2019 17:09, Roman Shaposhnik wrote:

Hi,

On Wed, Dec 18, 2019 at 4:56 AM Julien Grall wrote:

So that is, in fact, my first question -- why is Xen not showing available memory in xl info?

I am not entirely sure what exact information you want. The output you dumped above contains the available memory (see "free_memory"). Are you looking for something different?

Just to be clear: I was giving 2G via devicetrees (the same device trees that would make Linux detect 2G of RAM), hence I was expecting xl info to show that. Instead I only got 1120M shown by xl info.

On 18/12/2019 00:04, Roman Shaposhnik wrote:

memory {
        device_type = "memory";
        reg = <0x0 0x0 0x0 0x5e0
               0x0 0x5f0 0x0 0x1000
               0x0 0x5f02000 0x0 0xefd000
               0x0 0x6e0 0x0 0x60f000
               0x0 0x741 0x0 0x1aaf
               0x0 0x21f0 0x0 0x10
               0x0 0x2200 0x0 0x1c00>;
};

reserved-memory {
        ranges;
        #size-cells = <0x2>;
        #address-cells = <0x2>;

        ramoops@21f0 {
                ftrace-size = <0x2>;
                console-size = <0x2>;
                reg = <0x0 0x21f0 0x0 0x10>;
                record-size = <0x2>;
                compatible = "ramoops";
        };

        linux,cma {
                linux,cma-default;
                reusable;
                size = <0x0 0x800>;
                compatible = "shared-dma-pool";
        };
};

If you look at the REG -- it does now add up to 2Gb, but booting Xen with it has exactly the same effect as booting it with:

reg = <0x0 0x0 0x0 0x8000>;

If you boot Xen using EFI, the memory information will come from EFI and the DT node will be ignored. So unless UEFI is able to pick up the modification of the DT memory node, modifying the DT is not going to affect anything.

That's a good point, but given that I always go through GRUB, I was expecting the devicetree command to completely overshadow whatever information UEFI may have. Am I wrong?

GRUB will load Xen/Linux as an EFI application. Both of them will ignore the memory nodes when booting using EFI. For more details, see the answer I wrote separately.

I am attaching a full log, and I see the following in the logs:

(XEN) Allocating 1:1 mappings totalling 720MB for dom0:
(XEN) BANK[0] 0x000800-0x001c00 (320MB)
(XEN) BANK[1] 0x004000-0x005800 (384MB)
(XEN) BANK[2] 0x007b00-0x007c00 (16MB)

Which sort of makes sense, I guess -- but I still don't understand where all these ranges are coming from and how come Xen doesn't see the full 2Gb even with various devicetrees I tried.

The ranges above describe the memory given to Dom0. For all the memory given to Xen, you want to look at the top of your log:

(XEN) Checking for initrd in /chosen
(XEN) RAM: - 05df
(XEN) RAM: 05f0 - 06dfefff
(XEN) RAM: 06e0 - 0740efff
(XEN) RAM: 0741 - 1db8dfff
(XEN) RAM: 350f - 3dbd2fff
(XEN) RAM: 3dbd3000 - 3dff
(XEN) RAM: 4000 - 5a653fff
(XEN) RAM: 7ada - 7ada3fff
(XEN) RAM: 7aea8000 - 7afa9fff
(XEN) RAM: 7afaa000 - 7ec73fff
(XEN) RAM: 7ec74000 - 7fdddfff
(XEN) RAM: 7fdde000 - 7fea5fff
(XEN) RAM: 7fea6000 - 7ff6dfff
(XEN) RAM: 7000 - 7fff

Looking at the differences with the Linux logs, there is indeed some memory not detected by Xen.

On Xen, we only consider as usable memory the EFI descriptors of type EfiConventionalMemory, EfiBootServicesCode and EfiBootServicesData. Linux includes more types here, so this may explain why we see a difference.

While looking at it, I have also noticed that we don't seem to care about the memory attribute. I suspect this could be another latent issue in Xen if the attribute does not match.

Anything I can do to help debug this? I can run any kind of debug builds, etc. if needed.

Thank you for the offer, I think I have a good understanding of the problem now. So debugging should not be necessary. However, I would appreciate it if anyone could help to write a patch for it.

I mean -- at this point it would be really great to get HiKey back to the status of Xen-on-ARM developer board. Any ideas here would be greatly appreciated!

Thanks,
Roman.

P.S. Any guess at what these mean?
(XEN) traps.c:1973:d0v0 HSR=0x93880006 pc=0x008738 gva=0x872f2000 gpa=0x0f (XEN) traps.c:1973:d0v0 HSR=0x93880006 pc=0x00b734e558 gva=0xb72eb000 gpa=0x0f (XEN) traps.c:1973:d0v0 HSR=0x93880006 pc=0x008f9d2558 gva=0x8f96f000 gpa=0x0f It means
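The usable-memory rule described in this message (Xen accepts only three EFI descriptor types, Linux accepts more) can be illustrated with a small filter. This is a sketch, not Xen code: the numeric type values follow the EFI_MEMORY_TYPE enumeration in the UEFI specification, but the tuple layout, the function name and the sample map are made up for illustration, and the wider "Linux-like" set is assumed here to add just the loader types.

```python
from enum import IntEnum
from typing import Iterable, List, Set, Tuple

EFI_PAGE_SIZE = 4096  # EFI descriptors count memory in 4 KiB pages


class EfiMemoryType(IntEnum):
    """Subset of EFI_MEMORY_TYPE from the UEFI specification."""
    LOADER_CODE = 1
    LOADER_DATA = 2
    BOOT_SERVICES_CODE = 3
    BOOT_SERVICES_DATA = 4
    CONVENTIONAL_MEMORY = 7
    MEMORY_MAPPED_IO = 11


# The three types the message says Xen treats as usable RAM.
XEN_USABLE = {
    EfiMemoryType.CONVENTIONAL_MEMORY,
    EfiMemoryType.BOOT_SERVICES_CODE,
    EfiMemoryType.BOOT_SERVICES_DATA,
}

# Assumption for illustration: a wider set that also reclaims loader memory.
WIDER_USABLE = XEN_USABLE | {EfiMemoryType.LOADER_CODE,
                             EfiMemoryType.LOADER_DATA}

# (type, physical start, number of 4 KiB pages): a hypothetical layout.
Descriptor = Tuple[EfiMemoryType, int, int]


def usable_ranges(memmap: Iterable[Descriptor],
                  usable: Set[EfiMemoryType]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) ranges whose type is in `usable`."""
    return [(start, start + pages * EFI_PAGE_SIZE - 1)
            for mem_type, start, pages in memmap
            if mem_type in usable]


SAMPLE_MAP: List[Descriptor] = [
    (EfiMemoryType.CONVENTIONAL_MEMORY, 0x00000, 16),
    (EfiMemoryType.LOADER_DATA, 0x10000, 16),
    (EfiMemoryType.BOOT_SERVICES_DATA, 0x20000, 16),
    (EfiMemoryType.MEMORY_MAPPED_IO, 0x30000, 16),
]
```

With the sample map, the narrow set yields two ranges and the wider set three, which is the kind of gap that shows up as "missing" RAM when comparing the Xen and Linux boot logs.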
Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
On Wed, Dec 18, 2019 at 3:00 PM Julien Grall wrote: > > Hi Tamas, > > On 18/12/2019 19:40, Tamas K Lengyel wrote: > > Implement hypercall that allows a fork to shed all memory that got allocated > > for it during its execution and re-load its vCPU context from the parent VM. > > This allows the forked VM to reset into the same state the parent VM is in a > > faster way then creating a new fork would be. Measurements show about a 2x > > speedup during normal fuzzing operations. Performance may vary depending how > > much memory got allocated for the forked VM. If it has been completely > > deduplicated from the parent VM then creating a new fork would likely be > > more > > performant. > > > > Signed-off-by: Tamas K Lengyel > > --- > > xen/arch/x86/mm/mem_sharing.c | 105 ++ > > xen/include/public/memory.h | 1 + > > 2 files changed, 106 insertions(+) > > > > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c > > index e93ad2ec5a..4735a334b9 100644 > > --- a/xen/arch/x86/mm/mem_sharing.c > > +++ b/xen/arch/x86/mm/mem_sharing.c > > @@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct > > domain *cd) > > return 0; > > } > > > > +struct gfn_free; > > +struct gfn_free { > > +struct gfn_free *next; > > +struct page_info *page; > > +gfn_t gfn; > > +}; > > + > > +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd) > > +{ > > +int rc; > > + > > +struct p2m_domain* p2m = p2m_get_hostp2m(cd); > > +struct gfn_free *list = NULL; > > +struct page_info *page; > > + > > +page_list_for_each(page, >page_list) > > AFAICT, your domain is not paused, so it would be possible to have page > added/remove in that list behind your back. Well, it's not that it's not paused, it's just that I haven't added a sanity check to make sure it is. The toolstack can (and should) pause it, so that sanity check would be warranted. > > You also have multiple loop on the page_list in this function. 
Given the > number of page_list can be quite big, this is a call for hogging the > pCPU and an RCU lock on the domain vCPU running this call. There is just one loop over page_list itself, the second loop is on the internal list that is being built here which will be a subset. The list itself in fact should be small (in our tests usually <100). Granted the list can grow larger, but in those cases its likely better to just discard the fork and create a new one. So in my opinion adding a hypercall continuation to this not needed. > > > +{ > > +mfn_t mfn = page_to_mfn(page); > > +if ( mfn_valid(mfn) ) > > +{ > > +p2m_type_t p2mt; > > +p2m_access_t p2ma; > > +gfn_t gfn = mfn_to_gfn(cd, mfn); > > +mfn = __get_gfn_type_access(p2m, gfn_x(gfn), , , > > +0, NULL, false); > > +if ( p2m_is_ram(p2mt) ) > > +{ > > +struct gfn_free *gfn_free; > > +if ( !get_page(page, cd) ) > > +goto err_reset; > > + > > +/* > > + * We can't free the page while iterating over the > > page_list > > + * so we build a separate list to loop over. > > + * > > + * We want to iterate over the page_list instead of > > checking > > + * gfn from 0 to max_gfn because this is ~10x faster. > > + */ > > +gfn_free = xmalloc(struct gfn_free); > > If I did the math right, for a 4G guest this will require at ~24MB of > memory. Actually, is it really necessary to do the allocation for a > short period of time? If you have a fully deduplicated fork then you should not be using this function to begin with. You get better performance my throwing that one away and creating a new one. As for using xmalloc here, I'm not sure what other way I have to build a list of pages that need to be freed. I can't free the page itself while I'm iterating on page_list (that I'm aware of). The only other option available is calling __get_gfn_type_access with gfn=0..max_gfn which will be extremely slow because you have to loop over a lot of holes. > > What are you trying to achieve by iterating twice on the GFN? 
Wouldn't > it be easier to pause the domain? I'm not sure what you mean, where do you see me iterating twice on the gfn? And what does pausing have to do with it? Than ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
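The ~24MB worst-case estimate quoted above is easy to reproduce. A back-of-the-envelope sketch, assuming 4 KiB pages and a 64-bit build in which each of the three fields of struct gfn_free (two pointers and a gfn_t) takes 8 bytes, and ignoring any allocator overhead:

```python
GUEST_BYTES = 4 * 1024**3   # 4G guest
PAGE_SIZE = 4096            # assumed 4 KiB pages
NODE_SIZE = 3 * 8           # next + page + gfn, 8 bytes each (assumption)

# Worst case: every page of the fork has been populated, so the
# temporary list gets one node per guest page.
pages = GUEST_BYTES // PAGE_SIZE
worst_case_bytes = pages * NODE_SIZE

print(pages, "pages,", worst_case_bytes // 1024**2, "MiB of list nodes")
```

The common case described in the thread (around 100 written pages) would need only a few kilobytes; the disagreement is about whether the hypercall must also behave well at the other extreme.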
Re: [Xen-devel] [PATCH] tools/python: Python 3 compatibility
On 18/12/2019 22:26, Marek Marczykowski-Górecki wrote:
>> @@ -70,7 +73,7 @@ class VM(object):
>>
>>          # libxl
>>          self.libxl = fmt == "libxl"
>> -        self.emu_xenstore = "" # NUL terminated key pairs from "toolstack" records
>> +        self.emu_xenstore = b"" # NUL terminated key pairs from "toolstack" records
>>
>>  def write_libxc_ihdr():
>>      stream_write(pack(libxc.IHDR_FORMAT,
> You also need to update write_record (string constants).
> And few calls to it with string constants (write_libxl_end,
> write_libxl_libxc_context, read_pv_tail, read_hvm_tail).
> And blkid == ... in read_pv_extended_info().

Urgh - well spotted.

Was this manual inspection, or something else?

(I probably should complete and upstream write-legacy-stream for the purpose of dev-testing the convert-legacy-stream script now that 4.6 is waaay in the past.)

~Andrew
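Why the leftover string constants matter: on Python 3, data read from a binary stream is bytes, and bytes never compare equal to str, so a forgotten "..." constant fails silently rather than loudly. A small sketch of the pitfalls follows; the helper names and sample values are made up and are not code from the patch.

```python
from typing import List


def is_nul_terminated(name: bytes) -> bool:
    """True if the byte string ends in a NUL byte."""
    # On Python 3, name[-1] is an int (0), not b"\x00". Slicing keeps the
    # result as bytes and also copes with an empty string.
    return name[-1:] == b"\x00"


def pack_pairs(kv: List[bytes]) -> bytes:
    """Join key/value byte strings into one NUL-terminated blob."""
    # The separator must be bytes too: "\x00".join(...) raises TypeError
    # when the items are bytes.
    return b"\x00".join(kv) + b"\x00"


# A str constant never matches data read from a binary stream.
sig = b"DeviceModelRecord0002"
assert sig != "DeviceModelRecord0002"
assert sig == b"DeviceModelRecord0002"
```

The slice form in is_nul_terminated() behaves the same on Python 2 and Python 3, which is convenient for scripts that must run under both.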
Re: [Xen-devel] [PATCH] tools/python: Python 3 compatibility
On Wed, Dec 18, 2019 at 03:05:22PM +, Andrew Cooper wrote: > convert-legacy-stream is only used for incomming migration from pre Xen 4.7, > and verify-stream-v2 appears to only be used by me during migration > development - it is little surprise that they missed the main converstion > effort in Xen 4.13. > > Fix it all up. > > Move open_file_or_fd() into a new util.py to avoid duplication, making it a > more generic wrapper around open() or fdopen(). > > Signed-off-by: Andrew Cooper > --- > CC: Ian Jackson > CC: Wei Liu > > This needs backporting to 4.13 ASAP > --- > tools/python/scripts/convert-legacy-stream | 49 > +++--- > tools/python/scripts/verify-stream-v2 | 43 +- > tools/python/xen/migration/libxc.py| 2 +- > tools/python/xen/migration/libxl.py| 2 +- > tools/python/xen/migration/verify.py | 4 +-- > tools/python/xen/util.py | 23 ++ > 6 files changed, 46 insertions(+), 77 deletions(-) > create mode 100644 tools/python/xen/util.py > > diff --git a/tools/python/scripts/convert-legacy-stream > b/tools/python/scripts/convert-legacy-stream > index 5f80f13654..b0d81aa92e 100755 > --- a/tools/python/scripts/convert-legacy-stream > +++ b/tools/python/scripts/convert-legacy-stream > @@ -5,6 +5,8 @@ > Convert a legacy migration stream to a v2 stream. 
> """ > > +from __future__ import print_function > + > import sys > import os, os.path > import syslog > @@ -12,6 +14,7 @@ import traceback > > from struct import calcsize, unpack, pack > > +from xen.util import open_file_or_fd as open_file_or_fd > from xen.migration import legacy, public, libxc, libxl, xl > > __version__ = 1 > @@ -39,16 +42,16 @@ def info(msg): > for line in msg.split("\n"): > syslog.syslog(syslog.LOG_INFO, line) > else: > -print msg > +print(msg) > > def err(msg): > """Error message, routed to appropriate destination""" > if log_to_syslog: > for line in msg.split("\n"): > syslog.syslog(syslog.LOG_ERR, line) > -print >> sys.stderr, msg > +print(msg, file = sys.stderr) > > -class StreamError(StandardError): > +class StreamError(Exception): > """Error with the incoming migration stream""" > pass > > @@ -70,7 +73,7 @@ class VM(object): > > # libxl > self.libxl = fmt == "libxl" > -self.emu_xenstore = "" # NUL terminated key pairs from > "toolstack" records > +self.emu_xenstore = b"" # NUL terminated key pairs from > "toolstack" records > > def write_libxc_ihdr(): > stream_write(pack(libxc.IHDR_FORMAT, You also need to update write_record (string constants). And few calls to it with string constants (write_libxl_end, write_libxl_libxc_context, read_pv_tail, read_hvm_tail). And blkid == ... in read_pv_extended_info(). 
> @@ -336,7 +339,7 @@ def read_libxl_toolstack(vm, data): > if twidth == 64: > name = name[:-4] > > -if name[-1] != '\x00': > +if name[-1] != b'\x00': > raise StreamError("physmap name not NUL terminated") > > root = "physmap/%x" % (phys,) > @@ -347,7 +350,7 @@ def read_libxl_toolstack(vm, data): > for key, val in zip(kv[0::2], kv[1::2]): > info("'%s' = '%s'" % (key, val)) > > -vm.emu_xenstore += '\x00'.join(kv) + '\x00' > +vm.emu_xenstore += b'\x00'.join(kv) + b'\x00' > > > def read_chunks(vm): > @@ -534,7 +537,7 @@ def read_qemu(vm): > sig, = unpack("21s", rawsig) > info("Qemu signature: %s" % (sig, )) > > -if sig == "DeviceModelRecord0002": > +if sig == b"DeviceModelRecord0002": > rawsz = rdexact(4) > sz, = unpack("I", rawsz) > qdata = rdexact(sz) > @@ -617,36 +620,6 @@ def read_legacy_stream(vm): > return 2 > return 0 > > -def open_file_or_fd(val, mode): > -""" > -If 'val' looks like a decimal integer, open it as an fd. If not, try to > -open it as a regular file. > -""" > - > -fd = -1 > -try: > -# Does it look like an integer? > -try: > -fd = int(val, 10) > -except ValueError: > -pass > - > -# Try to open it... > -if fd != -1: > -return os.fdopen(fd, mode, 0) > -else: > -return open(val, mode, 0) > - > -except StandardError, e: > -if fd != -1: > -err("Unable to open fd %d: %s: %s" % > -(fd, e.__class__.__name__, e)) > -else: > -err("Unable to open file '%s': %s: %s" % > -(val, e.__class__.__name__, e)) > - > -raise SystemExit(1) > - > > def main(): > from optparse import OptionParser > @@ -723,7 +696,7 @@ def main(): > if __name__ == "__main__": > try: > sys.exit(main()) > -except SystemExit, e: > +except SystemExit as e: >
Re: [Xen-devel] [PATCH v2 6/6] x86: implement Hyper-V clock source
On Wed, 18 Dec 2019 at 20:24, Michael Kelley wrote:
>
> From: Durrant, Paul Sent: Wednesday, December 18, 2019 7:24 AM
> >
> > > From: Wei Liu On Behalf Of Wei Liu
> > > Sent: 18 December 2019 14:43
> > [snip]
> > > +
> > > +static inline uint64_t read_hyperv_timer(void)
> > > +{
> > > +    uint64_t scale, offset, ret, tsc;
> > > +    uint32_t seq;
> > > +    const struct ms_hyperv_tsc_page *tsc_page = hyperv_tsc;
> > > +
> > > +    do {
> > > +        seq = tsc_page->tsc_sequence;
> > > +
> > > +        /* Seq 0 is special. It means the TSC enlightenment is not
> > > +         * available at the moment. The reference time can only be
> > > +         * obtained from the Reference Counter MSR.
> > > +         */
> > > +        if ( seq == 0 )
> >
> > Older versions of the spec used to use 0x I think, although when I
> > look again they seem to have been retro-actively fixed. In any case
> > I think you should treat both 0x and 0 as invalid.
>
> FWIW, the 0x was just a bug in the spec. Hyper-V implementations only
> set the value to 0 to indicate invalid. The equivalent Linux code checks
> only for 0.
>

Thanks for chiming in, Michael.

In that case I will submit a fix to change Xen's viridian code to remove the wrong value there.

Wei.

> Michael
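The sequence protocol in the quoted read_hyperv_timer() can be sketched as follows. This is a model under stated assumptions, not the Xen code: the TSC page is a plain Python object, `read_tsc` and `read_msr` are caller-supplied stand-ins for RDTSC and the Reference Counter MSR, and the reference-time formula ((tsc * scale) >> 64) + offset is the one given in the Hyper-V TLFS.

```python
from dataclasses import dataclass
from typing import Callable

MASK64 = (1 << 64) - 1


@dataclass
class TscPage:
    """Hypothetical stand-in for the shared reference TSC page."""
    tsc_sequence: int
    tsc_scale: int
    tsc_offset: int


def read_reference_time(page: TscPage,
                        read_tsc: Callable[[], int],
                        read_msr: Callable[[], int]) -> int:
    """Return the reference time in 100ns units."""
    while True:
        seq = page.tsc_sequence
        # Sequence 0 means the TSC page is not valid right now; per the
        # thread this is the only invalid value, so fall back to the MSR.
        if seq == 0:
            return read_msr()
        scale = page.tsc_scale
        offset = page.tsc_offset
        tsc = read_tsc()
        # If the hypervisor updated the page while we were reading it,
        # the sequence number changed and the values must be re-read.
        if page.tsc_sequence == seq:
            break
    # 64x64 -> 128-bit multiply, keep the high 64 bits, then add offset.
    return (((tsc * scale) >> 64) + offset) & MASK64
```

For example, with tsc_scale = 2**63 (a factor of one half), tsc_offset = 5 and a TSC reading of 1000, the function returns 505.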
Re: [Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
On Wed, Dec 18, 2019 at 2:29 PM Julien Grall wrote:
>
> Hi Tamas,
>
> On 18/12/2019 19:40, Tamas K Lengyel wrote:
> > MEM_SHARING_DESTROY_GFN is used on the 'flags' bitfield during unsharing.
> > However, the bitfield is not used for anything else, so just convert it to a
> > bool instead.
> >
> > Signed-off-by: Tamas K Lengyel
> > ---
> >  xen/arch/x86/mm/mem_sharing.c     | 7 +++
> >  xen/arch/x86/mm/p2m.c             | 1 +
> >  xen/common/memory.c               | 2 +-
> >  xen/include/asm-x86/mem_sharing.h | 5 ++---
> >  4 files changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index fc1d8be1eb..6e81e1a895 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1175,7 +1175,7 @@ err_out:
> >   */
> >  int __mem_sharing_unshare_page(struct domain *d,
> >                                 unsigned long gfn,
> > -                               uint16_t flags)
> > +                               bool destroy)
> >  {
> >      p2m_type_t p2mt;
> >      mfn_t mfn;
> > @@ -1231,7 +1231,7 @@ int __mem_sharing_unshare_page(struct domain *d,
> >       * If the GFN is getting destroyed drop the references to MFN
> >       * (possibly freeing the page), and exit early.
> >       */
> > -    if ( flags & MEM_SHARING_DESTROY_GFN )
> > +    if ( destroy )
> >      {
> >          if ( !last_gfn )
> >              mem_sharing_gfn_destroy(page, d, gfn_info);
> > @@ -1321,8 +1321,7 @@ int relinquish_shared_pages(struct domain *d)
> >          if ( mfn_valid(mfn) && p2m_is_shared(t) )
> >          {
> >              /* Does not fail with ENOMEM given the DESTROY flag */
> > -            BUG_ON(__mem_sharing_unshare_page(d, gfn,
> > -                                              MEM_SHARING_DESTROY_GFN));
> > +            BUG_ON(__mem_sharing_unshare_page(d, gfn, true));
> >              /*
> >               * Clear out the p2m entry so no one else may try to
> >               * unshare. Must succeed: we just read the old entry and
> > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> > index baea632acc..53ea44fe3c 100644
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -517,6 +517,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
> >           */
> >          if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 )
> >              mem_sharing_notify_enomem(p2m->domain, gfn_l, false);
> > +
>
> This line looks spurious.

Yeap.

>
> >          mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
> >      }
> >
> > diff --git a/xen/common/memory.c b/xen/common/memory.c
> > index 309e872edf..c7d2bac452 100644
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -352,7 +352,7 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
> >           * might be the only one using this shared page, and we need to
> >           * trigger proper cleanup. Once done, this is like any other page.
> >           */
> > -        rc = mem_sharing_unshare_page(d, gmfn, 0);
> > +        rc = mem_sharing_unshare_page(d, gmfn);
>
> AFAICT, this patch does not reduce the number of parameters for
> mem_sharing_unshare_page(). Did you intend to make this change in
> another patch?

Ah yea, it should have been dropped in patch 6 of the series.

>
> >          if ( rc )
> >          {
> >              mem_sharing_notify_enomem(d, gmfn, false);
> > diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
> > index 89cdaccea0..4b982a4803 100644
> > --- a/xen/include/asm-x86/mem_sharing.h
> > +++ b/xen/include/asm-x86/mem_sharing.h
> > @@ -76,17 +76,16 @@ struct page_sharing_info
> >  unsigned int mem_sharing_get_nr_saved_mfns(void);
> >  unsigned int mem_sharing_get_nr_shared_mfns(void);
> >
> > -#define MEM_SHARING_DESTROY_GFN (1<<1)
> >  /* Only fails with -ENOMEM. Enforce it with a BUG_ON wrapper. */
> >  int __mem_sharing_unshare_page(struct domain *d,
> >                                 unsigned long gfn,
> > -                               uint16_t flags);
> > +                               bool destroy);
> >
> >  static inline
> >  int mem_sharing_unshare_page(struct domain *d,
> >                               unsigned long gfn)
> >  {
> > -    int rc = __mem_sharing_unshare_page(d, gfn, 0);
> > +    int rc = __mem_sharing_unshare_page(d, gfn, false);
> >      BUG_ON(rc && (rc != -ENOMEM));
> >      return rc;
> >  }
> >
>
> Cheers,

Thanks,
Tamas
Re: [Xen-devel] Xen ARM Dom0less passthrough without IOMMU
On Wed, 18 Dec 2019, Julien Grall wrote:
> Hi Stefano,
>
> On 17/12/2019 18:28, Stefano Stabellini wrote:
> > > Then I tried to passthrough the eMMC, but I got the following
> > > error:
> > > (XEN) DOM1: [0.879151] sdhci-esdhc-imx 4005d000.usdhc: can't request
> > > region for resource [mem 0x4005d000-0x4005dfff]
> > > (XEN) DOM1: [0.891137] sdhci-esdhc-imx 4005d000.usdhc:
> > > sdhci_pltfm_init failed -16
> > > (XEN) DOM1: [0.900249] sdhci-esdhc-imx: probe of 4005d000.usdhc failed
> > > with error -16
> > >
> > > Where 0x4005d000 is the physical address of the uSDHC(eMMC) node in the
> > > DT.
> > > It seems that the DomU1 kernel does not have access to that memory zone.
> >
> > It looks like drivers/mmc/host/sdhci-pltfm.c:sdhci_pltfm_init failed,
> > but I cannot see a simple reason why it would. As Julien mentioned the
> > device tree snippet would be useful. Also the domU config and the full
> > device tree would be useful. i.e. did you add "xen,passthrough;" under
> > the related uSDHC node on the host device tree?
>
> The only purpose of "xen,passthrough" is to mark the device as disabled in
> the Dom0 DT. It will not affect how the device is passed through to a guest.
>
> In this case, I don't believe the problem is DT related because Linux is able
> to find the regions. If the region were not mapped to the guest, then it would
> likely result in a data abort later on.
>
> Looking at Andrei's e-mail again, he doesn't mention anything about the 1:1
> mapping. So I assume he is still using the guest memory layout. The physical
> address 0x4005d000 is roughly 372KB into the first RAM bank for the
> guest.
>
> > > I'm trying to passthrough the eMMC in order to mount DomU1's root
> > > on a SDCard partition, because I couldn't get to DomU1's Linux prompt
> > > when I tried to boot with a ramdisk module. I always get this error:
> > > (XEN) DOM1: [1.544199] RAMDISK: Couldn't find valid RAM disk image
> > > starting at 0.
> > >
> > > Could this be because the ramdisk is too big? The smallest I've tried with
> > > is approximately 60MB in size. What size are the ramdisks that you
> > > are using in your dom0less booting demos?
> >
> > I don't think so, I could boot with ramdisk 120MB in size or even
> > larger. It is probably an address calculation error: it is easy to make
> > a small mistake in the addresses so that they end up overlapping.
> > Sometimes it is even U-Boot that causes the overlaps.
> >
> > I would suggest using ImageBuilder to create the U-Boot boot script to
> > load all the binaries and boot the system. Have a look at
> > uboot-script-gen in particular:
> >
> > https://gitlab.com/ViryaOS/imagebuilder/blob/master/scripts/uboot-script-gen
>
> Nice script, but it seems to contain hardcoded values (see offset and memaddr
> override), does not take into account reserved regions and assumes where
> U-boot/ATF may be loaded. So it may require some work before it can be used on
> NXP board...

Yes, you are right about that. The script doesn't understand
reserved-memory today and it will just start loading binaries at 2MB
after "MEMORY_START" as specified in the config file, assuming that it
is safe to do so.

Andrei, if you end up using it and it doesn't work, please let me know.
I am interested in understanding any failures and might be able to
improve the script or take patches for it.
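The overlap problem Stefano describes can be checked mechanically before booting. The following is a minimal sketch and not part of ImageBuilder or U-Boot: the `load_region` type, the helper names and any addresses fed to them are hypothetical. The `offset_kib` helper also reproduces Julien's "roughly 372KB" figure, under the assumption that the guest's first RAM bank starts at 0x40000000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one binary placed in memory by the bootloader. */
struct load_region {
    const char *name;
    uint64_t start;   /* load address */
    uint64_t size;    /* size in bytes, must be non-zero */
};

/*
 * Two half-open ranges [start, start + size) overlap if and only if each one
 * begins before the other one ends.
 */
static bool regions_overlap(const struct load_region *a,
                            const struct load_region *b)
{
    assert(a->size > 0 && b->size > 0);
    return a->start < b->start + b->size && b->start < a->start + a->size;
}

/* Return the index of the first region overlapping an earlier one, or -1. */
static int first_overlap(const struct load_region *r, size_t n)
{
    for (size_t i = 1; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (regions_overlap(&r[i], &r[j]))
                return (int)i;
    return -1;
}

/* Offset of an address into a RAM bank, in KiB (rounded down). */
static uint64_t offset_kib(uint64_t addr, uint64_t bank_base)
{
    assert(addr >= bank_base);
    return (addr - bank_base) / 1024;
}
```

Running `first_overlap()` over the kernel, ramdisk, device tree and Xen load addresses before generating a boot script would flag the kind of small address mistake mentioned above. It does not know about reserved-memory regions or where U-Boot/ATF live unless those are added to the list as well.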
Re: [Xen-devel] REGRESSION: Xen 4.13 RC5 fails to bootstrap Dom0 on ARM
Hi Roman,

On 18/12/2019 17:03, Roman Shaposhnik wrote:
> On Wed, Dec 18, 2019 at 3:50 AM Julien Grall wrote:
> So -- nothing boots directly by UEFI -- everything goes through GRUB.
> However, my understanding is that GRUB will detect devicetree information
> provided by UEFI (even though devicetree command is supposed to completely
> replace that). Hence it is possible that Linux relies on some residuals
> left in memory by GRUB that Xen doesn't pay attention to (but this is a
> pretty wild speculation only).

While it goes through GRUB, it is a bootloader and will just act as a
proxy for EFI. So EFI applications such as Xen/Linux can still be loaded
and take advantage of runtime services if present/implemented.

In fact, most people on Arm are using GRUB rather than EFI directly, as
it is friendlier to use.

Regarding the devicetree, Xen and Linux will completely ignore the
devicetree memory nodes if using EFI. This is because the EFI memory map
will give you an overview of the platform with the EFI regions included.

Cheers,

--
Julien Grall
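To make the last point concrete, here is a minimal sketch of how an OS or hypervisor might walk an EFI-style memory map and decide which regions count as usable RAM. The type numbering follows the UEFI specification, but `struct mem_desc` is a simplified, hypothetical descriptor, and the policy in `type_is_usable_ram()` is an assumption for illustration only. It is not a statement of which types Xen or Linux actually accept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Memory type values as numbered in the UEFI specification. */
enum efi_memory_type {
    EFI_RESERVED_TYPE         = 0,
    EFI_LOADER_CODE           = 1,
    EFI_LOADER_DATA           = 2,
    EFI_BOOT_SERVICES_CODE    = 3,
    EFI_BOOT_SERVICES_DATA    = 4,
    EFI_RUNTIME_SERVICES_CODE = 5,
    EFI_RUNTIME_SERVICES_DATA = 6,
    EFI_CONVENTIONAL_MEMORY   = 7,
    EFI_UNUSABLE_MEMORY       = 8,
    EFI_ACPI_RECLAIM_MEMORY   = 9,
    EFI_ACPI_MEMORY_NVS       = 10,
    EFI_MEMORY_MAPPED_IO      = 11,
};

/* Simplified descriptor (hypothetical): the real one carries more fields. */
struct mem_desc {
    uint32_t type;
    uint64_t phys_start;
    uint64_t num_pages;    /* counted in 4 KiB EFI pages */
};

#define EFI_PAGE_SIZE 4096ULL

/* Illustrative policy: memory that is free for general use after boot services exit. */
static bool type_is_usable_ram(uint32_t type)
{
    switch (type) {
    case EFI_LOADER_CODE:
    case EFI_LOADER_DATA:
    case EFI_BOOT_SERVICES_CODE:
    case EFI_BOOT_SERVICES_DATA:
    case EFI_CONVENTIONAL_MEMORY:
        return true;
    default:
        return false;
    }
}

/* Sum the bytes of all descriptors whose type the policy accepts. */
static uint64_t usable_bytes(const struct mem_desc *map, size_t n)
{
    uint64_t total = 0;

    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (type_is_usable_ram(map[i].type))
            total += map[i].num_pages * EFI_PAGE_SIZE;
    return total;
}
```

A consumer that accepts fewer types than another will see less RAM on the same platform, which is the kind of difference between Xen and Linux discussed in this thread.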
Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
Hi Tamas,

On 18/12/2019 19:40, Tamas K Lengyel wrote:
> Implement hypercall that allows a fork to shed all memory that got allocated
> for it during its execution and re-load its vCPU context from the parent VM.
> This allows the forked VM to reset into the same state the parent VM is in a
> faster way than creating a new fork would be. Measurements show about a 2x
> speedup during normal fuzzing operations. Performance may vary depending how
> much memory got allocated for the forked VM. If it has been completely
> deduplicated from the parent VM then creating a new fork would likely be more
> performant.
>
> Signed-off-by: Tamas K Lengyel
> ---
>  xen/arch/x86/mm/mem_sharing.c | 105 ++
>  xen/include/public/memory.h   | 1 +
>  2 files changed, 106 insertions(+)
>
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index e93ad2ec5a..4735a334b9 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
>      return 0;
>  }
>
> +struct gfn_free;
> +struct gfn_free {
> +    struct gfn_free *next;
> +    struct page_info *page;
> +    gfn_t gfn;
> +};
> +
> +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
> +{
> +    int rc;
> +
> +    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
> +    struct gfn_free *list = NULL;
> +    struct page_info *page;
> +
> +    page_list_for_each(page, &cd->page_list)

AFAICT, your domain is not paused, so it would be possible to have pages
added/removed in that list behind your back.

You also have multiple loops on the page_list in this function. Given the
number of pages in page_list can be quite big, this is a call for hogging
the pCPU and an RCU lock on the domain vCPU running this call.

> +    {
> +        mfn_t mfn = page_to_mfn(page);
> +        if ( mfn_valid(mfn) )
> +        {
> +            p2m_type_t p2mt;
> +            p2m_access_t p2ma;
> +            gfn_t gfn = mfn_to_gfn(cd, mfn);
> +            mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> +                                        0, NULL, false);
> +            if ( p2m_is_ram(p2mt) )
> +            {
> +                struct gfn_free *gfn_free;
> +                if ( !get_page(page, cd) )
> +                    goto err_reset;
> +
> +                /*
> +                 * We can't free the page while iterating over the page_list
> +                 * so we build a separate list to loop over.
> +                 *
> +                 * We want to iterate over the page_list instead of checking
> +                 * gfn from 0 to max_gfn because this is ~10x faster.
> +                 */
> +                gfn_free = xmalloc(struct gfn_free);

If I did the math right, for a 4G guest this will require ~24MB of
memory. Actually, is it really necessary to do the allocation for a short
period of time?

What are you trying to achieve by iterating twice on the GFN? Wouldn't it
be easier to pause the domain?

> +                if ( !gfn_free )
> +                    goto err_reset;
> +
> +                gfn_free->gfn = gfn;
> +                gfn_free->page = page;
> +                gfn_free->next = list;
> +                list = gfn_free;
> +            }
> +        }
> +    }
> +
> +    while ( list )
> +    {
> +        struct gfn_free *next = list->next;
> +
> +        rc = p2m->set_entry(p2m, list->gfn, INVALID_MFN, PAGE_ORDER_4K,
> +                            p2m_invalid, p2m_access_rwx, -1);
> +        put_page_alloc_ref(list->page);
> +        put_page(list->page);
> +
> +        xfree(list);
> +        list = next;
> +
> +        ASSERT(!rc);
> +    }
> +
> +    if ( (rc = fork_hvm(d, cd)) )
> +        return rc;
> +
> + err_reset:
> +    while ( list )
> +    {
> +        struct gfn_free *next = list->next;
> +
> +        put_page(list->page);
> +        xfree(list);
> +        list = next;
> +    }
> +
> +    return 0;
> +}
> +

Cheers,

--
Julien Grall
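Julien's "~24MB for a 4G guest" estimate can be reproduced with a few lines of arithmetic. The sketch below is a stand-alone model rather than Xen code: `struct gfn_free_model` is a hypothetical stand-in with the same three pointer-sized members as the patch's `struct gfn_free`, and the calculation assumes 4 KiB pages, a 64-bit build, and ignores any per-allocation overhead of the allocator.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in with the same three members as the patch's struct gfn_free. */
struct gfn_free_model {
    struct gfn_free_model *next;  /* 8 bytes on a 64-bit build */
    void *page;                   /* 8 bytes */
    uint64_t gfn;                 /* 8 bytes */
};

#define PAGE_SIZE_4K 4096ULL

/*
 * Bytes needed to track every 4 KiB page of a guest with one list node each.
 * node_bytes is the size of one tracking node.
 */
static uint64_t tracking_bytes(uint64_t guest_bytes, uint64_t node_bytes)
{
    assert(node_bytes > 0);
    return (guest_bytes / PAGE_SIZE_4K) * node_bytes;
}
```

A 4 GiB guest has 1,048,576 pages of 4 KiB. At 24 bytes per node that is 25,165,824 bytes, i.e. 24 MiB, in the worst case where every page is RAM owned by the fork. On a 64-bit build `sizeof(struct gfn_free_model)` can be passed as `node_bytes`.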
Re: [Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
Hi Tamas,

On 18/12/2019 19:40, Tamas K Lengyel wrote:
> MEM_SHARING_DESTROY_GFN is used on the 'flags' bitfield during unsharing.
> However, the bitfield is not used for anything else, so just convert it to a
> bool instead.
>
> Signed-off-by: Tamas K Lengyel
> ---
>  xen/arch/x86/mm/mem_sharing.c     | 7 +++
>  xen/arch/x86/mm/p2m.c             | 1 +
>  xen/common/memory.c               | 2 +-
>  xen/include/asm-x86/mem_sharing.h | 5 ++---
>  4 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index fc1d8be1eb..6e81e1a895 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1175,7 +1175,7 @@ err_out:
>   */
>  int __mem_sharing_unshare_page(struct domain *d,
>                                 unsigned long gfn,
> -                               uint16_t flags)
> +                               bool destroy)
>  {
>      p2m_type_t p2mt;
>      mfn_t mfn;
> @@ -1231,7 +1231,7 @@ int __mem_sharing_unshare_page(struct domain *d,
>       * If the GFN is getting destroyed drop the references to MFN
>       * (possibly freeing the page), and exit early.
>       */
> -    if ( flags & MEM_SHARING_DESTROY_GFN )
> +    if ( destroy )
>      {
>          if ( !last_gfn )
>              mem_sharing_gfn_destroy(page, d, gfn_info);
> @@ -1321,8 +1321,7 @@ int relinquish_shared_pages(struct domain *d)
>          if ( mfn_valid(mfn) && p2m_is_shared(t) )
>          {
>              /* Does not fail with ENOMEM given the DESTROY flag */
> -            BUG_ON(__mem_sharing_unshare_page(d, gfn,
> -                                              MEM_SHARING_DESTROY_GFN));
> +            BUG_ON(__mem_sharing_unshare_page(d, gfn, true));
>              /*
>               * Clear out the p2m entry so no one else may try to
>               * unshare. Must succeed: we just read the old entry and
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index baea632acc..53ea44fe3c 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -517,6 +517,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
>           */
>          if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 )
>              mem_sharing_notify_enomem(p2m->domain, gfn_l, false);
> +

This line looks spurious.

>          mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
>      }
>
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 309e872edf..c7d2bac452 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -352,7 +352,7 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
>           * might be the only one using this shared page, and we need to
>           * trigger proper cleanup. Once done, this is like any other page.
>           */
> -        rc = mem_sharing_unshare_page(d, gmfn, 0);
> +        rc = mem_sharing_unshare_page(d, gmfn);

AFAICT, this patch does not reduce the number of parameters for
mem_sharing_unshare_page(). Did you intend to make this change in
another patch?

>          if ( rc )
>          {
>              mem_sharing_notify_enomem(d, gmfn, false);
> diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
> index 89cdaccea0..4b982a4803 100644
> --- a/xen/include/asm-x86/mem_sharing.h
> +++ b/xen/include/asm-x86/mem_sharing.h
> @@ -76,17 +76,16 @@ struct page_sharing_info
>  unsigned int mem_sharing_get_nr_saved_mfns(void);
>  unsigned int mem_sharing_get_nr_shared_mfns(void);
>
> -#define MEM_SHARING_DESTROY_GFN (1<<1)
>  /* Only fails with -ENOMEM. Enforce it with a BUG_ON wrapper. */
>  int __mem_sharing_unshare_page(struct domain *d,
>                                 unsigned long gfn,
> -                               uint16_t flags);
> +                               bool destroy);
>
>  static inline
>  int mem_sharing_unshare_page(struct domain *d,
>                               unsigned long gfn)
>  {
> -    int rc = __mem_sharing_unshare_page(d, gfn, 0);
> +    int rc = __mem_sharing_unshare_page(d, gfn, false);
>      BUG_ON(rc && (rc != -ENOMEM));
>      return rc;
>  }
>

Cheers,

--
Julien Grall
Re: [Xen-devel] [PATCH v4 1/6] arm/arm64/xen: hypercall.h add includes guards
> >  /*
> > - * Whenever we re-enter userspace, the domains should always be
> > + * Whenever we re-enter kernel, the domains should always be
>
> This feels unrelated to the rest of the patch and probably wants an
> explanation. So I think this wants to be in a separate patch.

I will simply remove this comment fix, since I do not change anything
else in this file anymore.

> The rest of the patch looks good to me.

Thank you Julien.

>
> Cheers,
>
> --
> Julien Grall
Re: [Xen-devel] [PATCH v4 2/6] arm/arm64/xen: use C inlines for privcmd_call
On Mon, Dec 16, 2019 at 3:41 PM Julien Grall wrote:
>
> Hello,
>
> On 04/12/2019 23:20, Pavel Tatashin wrote:
> > privcmd_call requires to enable access to userspace for the
> > duration of the hypercall.
> >
> > Currently, this is done via assembly macros. Change it to C
> > inlines instead.
> >
> > Signed-off-by: Pavel Tatashin
> > Acked-by: Stefano Stabellini
>
> Reviewed-by: Julien Grall

Great, thank you!

Pasha
[Xen-devel] [xen-4.13-testing test] 144932: tolerable FAIL - PUSHED
flight 144932 xen-4.13-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144932/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds  12 guest-start  fail  REGR. vs. 144774

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-pvshim  12 guest-start  fail  never pass
 test-amd64-amd64-libvirt  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt  13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  11 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  11 migrate-support-check  fail  never pass
 test-amd64-amd64-qemuu-nested-amd  17 debian-hvm-install/l1/l2  fail  never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl  14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx  14 saverestore-support-check  fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64  17 guest-stop  fail  never pass
 test-amd64-amd64-libvirt-vhd  12 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-check  fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64  17 guest-stop  fail  never pass
 test-amd64-i386-xl-qemut-win7-amd64  17 guest-stop  fail  never pass
 test-armhf-armhf-xl  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt  13 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt  14 saverestore-support-check  fail  never pass
 test-amd64-i386-xl-qemut-ws16-amd64  17 guest-stop  fail  never pass
 test-amd64-amd64-xl-qemut-win7-amd64  17 guest-stop  fail  never pass
 test-armhf-armhf-libvirt-raw  12 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt-raw  13 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check  fail  never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  17 guest-stop  fail  never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  17 guest-stop  fail  never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check  fail  never pass
 test-amd64-amd64-xl-qemut-ws16-amd64  17 guest-stop  fail  never pass

version targeted for testing:
 xen  a2e84d8e42c9e878fff17b738d8e5c5d83888f31
baseline version:
 xen  ddccd9f87ef8accdff518dc2ebb64c05f55cd278

Last test of basis  144774  2019-12-12 22:39:31 Z  5 days
Testing same since  144932  2019-12-18 12:06:15 Z  0 days  1 attempts

People who touched revisions under test:
 Ian Jackson

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm  pass
 build-amd64-xtf  pass
Re: [Xen-devel] [PATCH v2 6/6] x86: implement Hyper-V clock source
From: Durrant, Paul Sent: Wednesday, December 18, 2019 7:24 AM
>
> > From: Wei Liu On Behalf Of Wei Liu
> > Sent: 18 December 2019 14:43
> [snip]
> > +
> > +static inline uint64_t read_hyperv_timer(void)
> > +{
> > +    uint64_t scale, offset, ret, tsc;
> > +    uint32_t seq;
> > +    const struct ms_hyperv_tsc_page *tsc_page = hyperv_tsc;
> > +
> > +    do {
> > +        seq = tsc_page->tsc_sequence;
> > +
> > +        /* Seq 0 is special. It means the TSC enlightenment is not
> > +         * available at the moment. The reference time can only be
> > +         * obtained from the Reference Counter MSR.
> > +         */
> > +        if ( seq == 0 )
>
> Older versions of the spec used to use 0x I think, although when I look
> again they seem to have been retro-actively fixed. In any case I think you
> should treat both 0x and 0 as invalid.

FWIW, the 0x was just a bug in the spec. Hyper-V implementations only
set the value to 0 to indicate invalid. The equivalent Linux code checks
only for 0.

Michael
Re: [Xen-devel] [PATCH] x86/save: reserve HVM save record numbers that have been consumed...
On 18/12/2019 16:09, Paul Durrant wrote:
> ...for patches not (yet) upstream.
>
> This patch is simply reserving save record number space to avoid the
> risk of clashes between existent downstream changes made by Amazon and
> future upstream changes which may be incompatible.
>
> Signed-off-by: Paul Durrant

Is this "you've already used some of these", or you plan to?

~Andrew
[Xen-devel] [PATCH v2 14/20] x86/mem_sharing: Enable mem_sharing on first memop
It is wasteful to require separate hypercalls to enable sharing on both the
parent and the client domain during VM forking. To speed things up we enable
sharing on the first memop in case it wasn't already enabled.

Signed-off-by: Tamas K Lengyel
---
 xen/arch/x86/mm/mem_sharing.c | 39 +--
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e5c1424f9b..48809a5349 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1402,6 +1402,24 @@ static int range_share(struct domain *d, struct domain *cd,
     return rc;
 }

+static inline int mem_sharing_control(struct domain *d, bool enable)
+{
+    if ( enable )
+    {
+        if ( unlikely(!is_hvm_domain(d)) )
+            return -ENOSYS;
+
+        if ( unlikely(!hap_enabled(d)) )
+            return -ENODEV;
+
+        if ( unlikely(is_iommu_enabled(d)) )
+            return -EXDEV;
+    }
+
+    d->arch.hvm.mem_sharing.enabled = enable;
+    return 0;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1423,10 +1441,8 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
     if ( rc )
         goto out;

-    /* Only HAP is supported */
-    rc = -ENODEV;
-    if ( !mem_sharing_enabled(d) )
-        goto out;
+    if ( !mem_sharing_enabled(d) && (rc = mem_sharing_control(d, true)) )
+        return rc;

     switch ( mso.op )
     {
@@ -1675,24 +1691,15 @@ int mem_sharing_domctl(struct domain *d, struct xen_domctl_mem_sharing_op *mec)
 {
     int rc;

-    /* Only HAP is supported */
-    if ( !hap_enabled(d) )
-         return -ENODEV;
-
     switch(mec->op)
     {
     case XEN_DOMCTL_MEM_SHARING_CONTROL:
-    {
-        rc = 0;
-        if ( unlikely(is_iommu_enabled(d) && mec->u.enable) )
-            rc = -EXDEV;
-        else
-            d->arch.hvm.mem_sharing.enabled = mec->u.enable;
-    }
-    break;
+        rc = mem_sharing_control(d, mec->u.enable);
+        break;

     default:
         rc = -ENOSYS;
+        break;
     }

     return rc;
--
2.20.1
[Xen-devel] [PATCH v2 15/20] x86/mem_sharing: Skip xen heap pages in memshr nominate
Trying to share these would fail anyway, better to skip them early.

Signed-off-by: Tamas K Lengyel
---
 xen/arch/x86/mm/mem_sharing.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 48809a5349..b3607b1bce 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -852,6 +852,11 @@ static int nominate_page(struct domain *d, gfn_t gfn,
     if ( !p2m_is_sharable(p2mt) )
         goto out;

+    /* Skip xen heap pages */
+    page = mfn_to_page(mfn);
+    if ( !page || is_xen_heap_page(page) )
+        goto out;
+
     /* Check if there are mem_access/remapped altp2m entries for this page */
     if ( altp2m_active(d) )
     {
@@ -882,7 +887,6 @@ static int nominate_page(struct domain *d, gfn_t gfn,
     }

     /* Try to convert the mfn to the sharable type */
-    page = mfn_to_page(mfn);
     ret = page_make_sharable(d, page, expected_refcnt);
     if ( ret )
         goto out;
--
2.20.1
[Xen-devel] [PATCH v2 20/20] xen/tools: VM forking toolstack side
Add necessary bits to implement "xl fork-vm", "xl fork-launch-dm" and
"xl fork-reset" commands. The process is split in two to allow tools needing
access to the new VM as fast as possible after it was forked. It is expected
that under certain use-cases the second command that launches QEMU will be
skipped entirely.

Signed-off-by: Tamas K Lengyel
---
 tools/libxc/include/xenctrl.h | 6 +
 tools/libxc/xc_memshr.c       | 22
 tools/libxl/libxl.h           | 7 +
 tools/libxl/libxl_create.c    | 237 +++---
 tools/libxl/libxl_dm.c        | 2 +-
 tools/libxl/libxl_dom.c       | 83
 tools/libxl/libxl_internal.h  | 1 +
 tools/libxl/libxl_types.idl   | 1 +
 tools/xl/xl.h                 | 5 +
 tools/xl/xl_cmdtable.c        | 22
 tools/xl/xl_saverestore.c     | 96 ++
 tools/xl/xl_vmcontrol.c       | 8 ++
 12 files changed, 386 insertions(+), 104 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index b5ffa53d55..39afdb9b33 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2221,6 +2221,12 @@ int xc_memshr_range_share(xc_interface *xch,
                           uint64_t first_gfn,
                           uint64_t last_gfn);

+int xc_memshr_fork(xc_interface *xch,
+                   uint32_t source_domain,
+                   uint32_t client_domain);
+
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t forked_domain);
+
 /* Debug calls: return the number of pages referencing the shared frame backing
  * the input argument. Should be one or greater.
  *
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index 5ef56a6933..ef5a5ee6a4 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -237,6 +237,28 @@ int xc_memshr_debug_gref(xc_interface *xch,
     return xc_memshr_memop(xch, domid, &mso);
 }

+int xc_memshr_fork(xc_interface *xch, uint32_t pdomid, uint32_t domid)
+{
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+
+    mso.op = XENMEM_sharing_op_fork;
+    mso.u.fork.parent_domain = pdomid;
+
+    return xc_memshr_memop(xch, domid, &mso);
+}
+
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t domid)
+{
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+    mso.op = XENMEM_sharing_op_fork_reset;
+
+    return xc_memshr_memop(xch, domid, &mso);
+}
+
 int xc_memshr_audit(xc_interface *xch)
 {
     xen_mem_sharing_op_t mso;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 54abb9db1f..75cb070587 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1536,6 +1536,13 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
                             LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
+                         LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config *d_config,
+                                uint32_t domid,
+                                const libxl_asyncprogress_how *aop_console_how)
+                                LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid);
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
                                 int send_back_fd,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 32d45dcef0..e0d219596c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -536,12 +536,12 @@ out:
     return ret;
 }

-int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
-                       libxl__domain_build_state *state,
-                       uint32_t *domid)
+static int libxl__domain_make_xs_entries(libxl__gc *gc, libxl_domain_config *d_config,
+                                         libxl__domain_build_state *state,
+                                         uint32_t domid)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
-    int ret, rc, nb_vm;
+    int rc, nb_vm;
     const char *dom_type;
     char *uuid_string;
     char *dom_path, *vm_path, *libxl_path;
@@ -553,7 +553,6 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,

     /* convenience aliases */
     libxl_domain_create_info *info = &d_config->c_info;
-    libxl_domain_build_info *b_info = &d_config->b_info;

     uuid_string = libxl__uuid2string(gc, info->uuid);
     if (!uuid_string) {
@@ -561,64 +560,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
         goto out;
     }

-    /* Valid domid here means we're soft resetting. */
-    if (!libxl_domid_valid_guest(*domid)) {
-        struct xen_domctl_createdomain create = {
-            .ssidref = info->ssidref,
-
[Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
Implement hypercall that allows a fork to shed all memory that got allocated
for it during its execution and re-load its vCPU context from the parent VM.
This allows the forked VM to reset into the same state the parent VM is in a
faster way than creating a new fork would be. Measurements show about a 2x
speedup during normal fuzzing operations. Performance may vary depending how
much memory got allocated for the forked VM. If it has been completely
deduplicated from the parent VM then creating a new fork would likely be more
performant.

Signed-off-by: Tamas K Lengyel
---
 xen/arch/x86/mm/mem_sharing.c | 105 ++
 xen/include/public/memory.h   | 1 +
 2 files changed, 106 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e93ad2ec5a..4735a334b9 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
     return 0;
 }

+struct gfn_free;
+struct gfn_free {
+    struct gfn_free *next;
+    struct page_info *page;
+    gfn_t gfn;
+};
+
+static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
+{
+    int rc;
+
+    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
+    struct gfn_free *list = NULL;
+    struct page_info *page;
+
+    page_list_for_each(page, &cd->page_list)
+    {
+        mfn_t mfn = page_to_mfn(page);
+        if ( mfn_valid(mfn) )
+        {
+            p2m_type_t p2mt;
+            p2m_access_t p2ma;
+            gfn_t gfn = mfn_to_gfn(cd, mfn);
+            mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
+                                        0, NULL, false);
+            if ( p2m_is_ram(p2mt) )
+            {
+                struct gfn_free *gfn_free;
+                if ( !get_page(page, cd) )
+                    goto err_reset;
+
+                /*
+                 * We can't free the page while iterating over the page_list
+                 * so we build a separate list to loop over.
+                 *
+                 * We want to iterate over the page_list instead of checking
+                 * gfn from 0 to max_gfn because this is ~10x faster.
+                 */
+                gfn_free = xmalloc(struct gfn_free);
+                if ( !gfn_free )
+                    goto err_reset;
+
+                gfn_free->gfn = gfn;
+                gfn_free->page = page;
+                gfn_free->next = list;
+                list = gfn_free;
+            }
+        }
+    }
+
+    while ( list )
+    {
+        struct gfn_free *next = list->next;
+
+        rc = p2m->set_entry(p2m, list->gfn, INVALID_MFN, PAGE_ORDER_4K,
+                            p2m_invalid, p2m_access_rwx, -1);
+        put_page_alloc_ref(list->page);
+        put_page(list->page);
+
+        xfree(list);
+        list = next;
+
+        ASSERT(!rc);
+    }
+
+    if ( (rc = fork_hvm(d, cd)) )
+        return rc;
+
+ err_reset:
+    while ( list )
+    {
+        struct gfn_free *next = list->next;
+
+        put_page(list->page);
+        xfree(list);
+        list = next;
+    }
+
+    return 0;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1905,6 +1986,30 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         rcu_unlock_domain(pd);
         break;
     }
+
+    case XENMEM_sharing_op_fork_reset:
+    {
+        struct domain *pd;
+
+        rc = -EINVAL;
+        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
+             mso.u.fork._pad[2] )
+            goto out;
+
+        rc = -ENOSYS;
+        if ( !d->parent )
+            goto out;
+
+        rc = rcu_lock_live_remote_domain_by_id(d->parent->domain_id, &pd);
+        if ( rc )
+            goto out;
+
+        rc = mem_sharing_fork_reset(pd, d);
+
+        rcu_unlock_domain(pd);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 90a3f4498e..e3d063e22e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -483,6 +483,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_audit 7
 #define XENMEM_sharing_op_range_share 8
 #define XENMEM_sharing_op_fork 9
+#define XENMEM_sharing_op_fork_reset 10

 #define XENMEM_SHARING_OP_S_HANDLE_INVALID (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID (-9)
--
2.20.1
[Xen-devel] [PATCH v2 18/20] xen/mem_access: Use __get_gfn_type_access in set_mem_access
Use __get_gfn_type_access instead of p2m->get_entry to trigger page-forking
when the mem_access permission is being set on a page that has not yet been
copied over from the parent.

Signed-off-by: Tamas K Lengyel
---
 xen/arch/x86/mm/mem_access.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
index 320b9fe621..9caf08a5b2 100644
--- a/xen/arch/x86/mm/mem_access.c
+++ b/xen/arch/x86/mm/mem_access.c
@@ -303,11 +303,10 @@ static int set_mem_access(struct domain *d, struct p2m_domain *p2m,
         ASSERT(!ap2m);
 #endif
     {
-        mfn_t mfn;
         p2m_access_t _a;
         p2m_type_t t;
-
-        mfn = p2m->get_entry(p2m, gfn, &t, &_a, 0, NULL, NULL);
+        mfn_t mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &t, &_a,
+                                          P2M_ALLOC, NULL, false);
         rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, t, a, -1);
     }
 
-- 
2.20.1
[Xen-devel] [PATCH v2 13/20] x86/mem_sharing: ASSERT that p2m_set_entry succeeds
Signed-off-by: Tamas K Lengyel
---
 xen/arch/x86/mm/mem_sharing.c | 46 +--
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 90b6371e2f..e5c1424f9b 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1113,39 +1113,37 @@ int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
         goto err_unlock;
     }
 
+    /*
+     * Must succeed, we just read the entry and hold the p2m lock
+     * via get_two_gfns.
+     */
     ret = p2m_set_entry(p2m, _gfn(cgfn), smfn, PAGE_ORDER_4K,
                         p2m_ram_shared, a);
+    ASSERT(!ret);
 
-    /* Tempted to turn this into an assert */
-    if ( ret )
+    /*
+     * There is a chance we're plugging a hole where a paged out
+     * page was.
+     */
+    if ( p2m_is_paging(cmfn_type) && (cmfn_type != p2m_ram_paging_out) )
     {
-        mem_sharing_gfn_destroy(spage, cd, gfn_info);
-        put_page_and_type(spage);
-    } else {
+        atomic_dec(&cd->paged_pages);
         /*
-         * There is a chance we're plugging a hole where a paged out
-         * page was.
+         * Further, there is a chance this was a valid page.
+         * Don't leak it.
          */
-        if ( p2m_is_paging(cmfn_type) && (cmfn_type != p2m_ram_paging_out) )
+        if ( mfn_valid(cmfn) )
         {
-            atomic_dec(&cd->paged_pages);
-            /*
-             * Further, there is a chance this was a valid page.
-             * Don't leak it.
-             */
-            if ( mfn_valid(cmfn) )
-            {
-                struct page_info *cpage = mfn_to_page(cmfn);
+            struct page_info *cpage = mfn_to_page(cmfn);
 
-                if ( !get_page(cpage, cd) )
-                {
-                    domain_crash(cd);
-                    ret = -EOVERFLOW;
-                    goto err_unlock;
-                }
-                put_page_alloc_ref(cpage);
-                put_page(cpage);
+            if ( !get_page(cpage, cd) )
+            {
+                domain_crash(cd);
+                ret = -EOVERFLOW;
+                goto err_unlock;
             }
+            put_page_alloc_ref(cpage);
+            put_page(cpage);
         }
     }
 
-- 
2.20.1
[Xen-devel] [PATCH v2 09/20] x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages
While using _mfn(0) is of no consequence during teardown, INVALID_MFN is the
correct value that should be used.

Signed-off-by: Tamas K Lengyel
---
 xen/arch/x86/mm/mem_sharing.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 5d81730315..1b7b520ccf 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1317,7 +1317,7 @@ int relinquish_shared_pages(struct domain *d)
             break;
 
         mfn = p2m->get_entry(p2m, _gfn(gfn), &t, &a, 0, NULL, NULL);
-        if ( mfn_valid(mfn) && t == p2m_ram_shared )
+        if ( mfn_valid(mfn) && p2m_is_shared(t) )
         {
             /* Does not fail with ENOMEM given the DESTROY flag */
             BUG_ON(__mem_sharing_unshare_page(d, gfn,
@@ -1327,7 +1327,7 @@ int relinquish_shared_pages(struct domain *d)
              * unshare. Must succeed: we just read the old entry and
              * we hold the p2m lock.
              */
-            set_rc = p2m->set_entry(p2m, _gfn(gfn), _mfn(0), PAGE_ORDER_4K,
+            set_rc = p2m->set_entry(p2m, _gfn(gfn), INVALID_MFN, PAGE_ORDER_4K,
                                     p2m_invalid, p2m_access_rwx, -1);
             ASSERT(!set_rc);
             count += 0x10;
-- 
2.20.1
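Why a dedicated sentinel is preferable to frame 0 when clearing an entry can be shown with a small sketch. It assumes an all-ones value as the "no mapping" marker; the demo_* names and the wrapper type are hypothetical and only loosely modelled on mfn_t, and demo_entry_is_mapped is not Xen's mfn_valid().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical typesafe frame number, loosely modelled on mfn_t. */
typedef struct { uint64_t v; } demo_mfn_t;

/* Assumption of this sketch: all-ones never names a real frame. */
#define DEMO_INVALID_MFN ((demo_mfn_t){ UINT64_MAX })

/* Wrap a raw frame number, in the spirit of _mfn(). */
static inline demo_mfn_t demo_mfn(uint64_t v)
{
    return (demo_mfn_t){ v };
}

/*
 * An entry counts as mapped unless it holds the sentinel.
 * Frame 0 is an ordinary frame, so an entry "cleared" to 0
 * still looks like a live mapping to this check.
 */
static inline bool demo_entry_is_mapped(demo_mfn_t m)
{
    return m.v != UINT64_MAX;
}
```

With this model, demo_mfn(0) is reported as mapped while DEMO_INVALID_MFN is not, which is the distinction the commit message draws even though it has no practical effect during teardown.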
[Xen-devel] [PATCH v2 07/20] x86/mem_sharing: don't try to unshare twice during page fault
The page was already tried to be unshared in get_gfn_type_access. If that didn't work, then trying again is pointless. Don't try to send vm_event again either, simply check if there is a ring or not. Signed-off-by: Tamas K Lengyel --- xen/arch/x86/hvm/hvm.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index e055114922..8f90841813 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -1706,11 +1707,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla, struct domain *currd = curr->domain; struct p2m_domain *p2m, *hostp2m; int rc, fall_through = 0, paged = 0; -int sharing_enomem = 0; vm_event_request_t *req_ptr = NULL; bool sync = false; unsigned int page_order; +#ifdef CONFIG_MEM_SHARING +bool sharing_enomem = false; +#endif + /* On Nested Virtualization, walk the guest page table. * If this succeeds, all is fine. * If this fails, inject a nested page fault into the guest. @@ -1898,14 +1902,16 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla, if ( p2m_is_paged(p2mt) || (p2mt == p2m_ram_paging_out) ) paged = 1; -/* Mem sharing: unshare the page and try again */ -if ( npfec.write_access && (p2mt == p2m_ram_shared) ) +#ifdef CONFIG_MEM_SHARING +/* Mem sharing: if still shared on write access then its enomem */ +if ( npfec.write_access && p2m_is_shared(p2mt) ) { ASSERT(p2m_is_hostp2m(p2m)); -sharing_enomem = mem_sharing_unshare_page(currd, gfn); +sharing_enomem = true; rc = 1; goto out_put_gfn; } +#endif /* Spurious fault? PoD and log-dirty also take this path. 
*/ if ( p2m_is_ram(p2mt) ) @@ -1959,19 +1965,21 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla, */ if ( paged ) p2m_mem_paging_populate(currd, gfn); + +#ifdef CONFIG_MEM_SHARING if ( sharing_enomem ) { -int rv; - -if ( (rv = mem_sharing_notify_enomem(currd, gfn, true)) < 0 ) +if ( !vm_event_check_ring(currd->vm_event_share) ) { gdprintk(XENLOG_ERR, "Domain %hu attempt to unshare " - "gfn %lx, ENOMEM and no helper (rc %d)\n", - currd->domain_id, gfn, rv); + "gfn %lx, ENOMEM and no helper\n", + currd->domain_id, gfn); /* Crash the domain */ rc = 0; } } +#endif + if ( req_ptr ) { if ( monitor_traps(curr, sync, req_ptr) < 0 ) -- 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
MEM_SHARING_DESTROY_GFN is used on the 'flags' bitfield during unsharing. However, the bitfield is not used for anything else, so just convert it to a bool instead. Signed-off-by: Tamas K Lengyel --- xen/arch/x86/mm/mem_sharing.c | 7 +++ xen/arch/x86/mm/p2m.c | 1 + xen/common/memory.c | 2 +- xen/include/asm-x86/mem_sharing.h | 5 ++--- 4 files changed, 7 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index fc1d8be1eb..6e81e1a895 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -1175,7 +1175,7 @@ err_out: */ int __mem_sharing_unshare_page(struct domain *d, unsigned long gfn, - uint16_t flags) + bool destroy) { p2m_type_t p2mt; mfn_t mfn; @@ -1231,7 +1231,7 @@ int __mem_sharing_unshare_page(struct domain *d, * If the GFN is getting destroyed drop the references to MFN * (possibly freeing the page), and exit early. */ -if ( flags & MEM_SHARING_DESTROY_GFN ) +if ( destroy ) { if ( !last_gfn ) mem_sharing_gfn_destroy(page, d, gfn_info); @@ -1321,8 +1321,7 @@ int relinquish_shared_pages(struct domain *d) if ( mfn_valid(mfn) && p2m_is_shared(t) ) { /* Does not fail with ENOMEM given the DESTROY flag */ -BUG_ON(__mem_sharing_unshare_page(d, gfn, - MEM_SHARING_DESTROY_GFN)); +BUG_ON(__mem_sharing_unshare_page(d, gfn, true)); /* * Clear out the p2m entry so no one else may try to * unshare. 
Must succeed: we just read the old entry and diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index baea632acc..53ea44fe3c 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -517,6 +517,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l, */ if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 ) mem_sharing_notify_enomem(p2m->domain, gfn_l, false); + mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL); } diff --git a/xen/common/memory.c b/xen/common/memory.c index 309e872edf..c7d2bac452 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -352,7 +352,7 @@ int guest_remove_page(struct domain *d, unsigned long gmfn) * might be the only one using this shared page, and we need to * trigger proper cleanup. Once done, this is like any other page. */ -rc = mem_sharing_unshare_page(d, gmfn, 0); +rc = mem_sharing_unshare_page(d, gmfn); if ( rc ) { mem_sharing_notify_enomem(d, gmfn, false); diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h index 89cdaccea0..4b982a4803 100644 --- a/xen/include/asm-x86/mem_sharing.h +++ b/xen/include/asm-x86/mem_sharing.h @@ -76,17 +76,16 @@ struct page_sharing_info unsigned int mem_sharing_get_nr_saved_mfns(void); unsigned int mem_sharing_get_nr_shared_mfns(void); -#define MEM_SHARING_DESTROY_GFN (1<<1) /* Only fails with -ENOMEM. Enforce it with a BUG_ON wrapper. */ int __mem_sharing_unshare_page(struct domain *d, unsigned long gfn, - uint16_t flags); + bool destroy); static inline int mem_sharing_unshare_page(struct domain *d, unsigned long gfn) { -int rc = __mem_sharing_unshare_page(d, gfn, 0); +int rc = __mem_sharing_unshare_page(d, gfn, false); BUG_ON(rc && (rc != -ENOMEM)); return rc; } -- 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
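The shape of this change, a single-purpose flags argument replaced by a bool behind a checking wrapper, can be sketched in isolation. Everything below is hypothetical: the demo_* names, the have_memory parameter that simulates allocation failure, and assert() standing in for BUG_ON().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared page with a reference count. */
struct demo_page {
    int refs;
    bool mapped;
};

/*
 * destroy == true:  the entry is going away, so drop the reference and
 *                   exit early; this path needs no new memory.
 * destroy == false: a private copy would be needed, which may fail with
 *                   -ENOMEM (simulated here via have_memory).
 */
static int demo_unshare_impl(struct demo_page *pg, bool destroy,
                             bool have_memory)
{
    if ( destroy )
    {
        pg->refs--;
        pg->mapped = false;
        return 0;
    }

    if ( !have_memory )
        return -ENOMEM;

    pg->refs--;
    return 0;
}

/* Wrapper for ordinary callers: never destroys, only -ENOMEM may fail. */
static inline int demo_unshare(struct demo_page *pg, bool have_memory)
{
    int rc = demo_unshare_impl(pg, false, have_memory);

    assert(!rc || rc == -ENOMEM);
    return rc;
}
```

A bool makes the call sites self-describing (true or false instead of a named bit or a bare 0) and removes the possibility of passing an undefined flag combination.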
[Xen-devel] [PATCH v2 02/20] xen/x86: Make hap_get_allocation accessible
During VM forking we'll copy the parent domain's parameters to the client,
including the HAP shadow memory setting that is used for storing the domain's
EPT. We'll copy this in the hypervisor instead of doing it during toolstack
launch to allow the domain to start executing and unsharing memory before (or
even completely without) the toolstack.

Signed-off-by: Tamas K Lengyel
---
 xen/arch/x86/mm/hap/hap.c | 3 +--
 xen/include/asm-x86/hap.h | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 3d93f3451c..c7c7ff6e99 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -321,8 +321,7 @@ static void hap_free_p2m_page(struct domain *d, struct page_info *pg)
 }
 
 /* Return the size of the pool, rounded up to the nearest MB */
-static unsigned int
-hap_get_allocation(struct domain *d)
+unsigned int hap_get_allocation(struct domain *d)
 {
     unsigned int pg = d->arch.paging.hap.total_pages
         + d->arch.paging.hap.p2m_pages;
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index b94bfb4ed0..1bf07e49fe 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -45,6 +45,7 @@ int hap_track_dirty_vram(struct domain *d,
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
 int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempted);
+unsigned int hap_get_allocation(struct domain *d);
 
 #endif /* XEN_HAP_H */
 
-- 
2.20.1
[Xen-devel] [PATCH v2 05/20] x86/mem_sharing: make get_two_gfns take locks conditionally
During VM forking the client lock will already be taken. Signed-off-by: Tamas K Lengyel --- xen/arch/x86/mm/mem_sharing.c | 11 ++- xen/include/asm-x86/p2m.h | 10 +- 2 files changed, 11 insertions(+), 10 deletions(-) diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index 319aaf3074..c0e305ad71 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -954,7 +954,7 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, unsigned long put_count = 0; get_two_gfns(sd, sgfn, _type, NULL, , - cd, cgfn, _type, NULL, , 0, ); + cd, cgfn, _type, NULL, , 0, , true); /* * This tricky business is to avoid two callers deadlocking if @@ -1068,7 +1068,7 @@ err_out: } int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh, - struct domain *cd, unsigned long cgfn) + struct domain *cd, unsigned long cgfn, bool lock) { struct page_info *spage; int ret = -EINVAL; @@ -1080,7 +1080,7 @@ int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle struct two_gfns tg; get_two_gfns(sd, _gfn(sgfn), _type, NULL, , - cd, _gfn(cgfn), _type, , , 0, ); + cd, _gfn(cgfn), _type, , , 0, , lock); /* Get the source shared page, check and lock */ ret = XENMEM_SHARING_OP_S_HANDLE_INVALID; @@ -1155,7 +1155,8 @@ int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle err_unlock: mem_sharing_page_unlock(spage); err_out: -put_two_gfns(); +if ( lock ) +put_two_gfns(); return ret; } @@ -1574,7 +1575,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg) sh = mso.u.share.source_handle; cgfn= mso.u.share.client_gfn; -rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn); +rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn, true); rcu_unlock_domain(cd); } diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index 94285db1b4..7399c4a897 100644 --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -539,7 +539,7 @@ struct 
two_gfns { static inline void get_two_gfns(struct domain *rd, gfn_t rgfn, p2m_type_t *rt, p2m_access_t *ra, mfn_t *rmfn, struct domain *ld, gfn_t lgfn, p2m_type_t *lt, p2m_access_t *la, mfn_t *lmfn, -p2m_query_t q, struct two_gfns *rval) +p2m_query_t q, struct two_gfns *rval, bool lock) { mfn_t *first_mfn, *second_mfn, scratch_mfn; p2m_access_t*first_a, *second_a, scratch_a; @@ -569,10 +569,10 @@ do {\ #undef assign_pointers /* Now do the gets */ -*first_mfn = get_gfn_type_access(p2m_get_hostp2m(rval->first_domain), - gfn_x(rval->first_gfn), first_t, first_a, q, NULL); -*second_mfn = get_gfn_type_access(p2m_get_hostp2m(rval->second_domain), - gfn_x(rval->second_gfn), second_t, second_a, q, NULL); +*first_mfn = __get_gfn_type_access(p2m_get_hostp2m(rval->first_domain), +gfn_x(rval->first_gfn), first_t, first_a, q, NULL, lock); +*second_mfn = __get_gfn_type_access(p2m_get_hostp2m(rval->second_domain), +gfn_x(rval->second_gfn), second_t, second_a, q, NULL, lock); } static inline void put_two_gfns(struct two_gfns *arg) -- 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
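The idea of letting the caller state whether the lock still has to be taken can be illustrated with a small get/put pair. This is only a sketch under simplifying assumptions: the demo_* names are hypothetical, the lock is a plain non-recursive flag, and the code is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a p2m with a simple, non-recursive lock. */
struct demo_p2m {
    bool locked;
    int entry;
};

static void demo_lock(struct demo_p2m *p)
{
    assert(!p->locked);   /* taking it twice would deadlock for real */
    p->locked = true;
}

static void demo_unlock(struct demo_p2m *p)
{
    assert(p->locked);
    p->locked = false;
}

/*
 * Look up the entry. The lock is taken only when the caller asks for it;
 * a caller passing false promises that it already holds the lock.
 */
static int demo_get(struct demo_p2m *p, bool lock)
{
    if ( lock )
        demo_lock(p);

    assert(p->locked);
    return p->entry;
}

/* Release what demo_get() took; a no-op if the caller owns the lock. */
static void demo_put(struct demo_p2m *p, bool lock)
{
    if ( lock )
        demo_unlock(p);
}
```

The get and put calls must be passed the same lock value, which mirrors the "if ( lock ) put_two_gfns" guard added at err_out in the patch.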
[Xen-devel] [PATCH v2 12/20] x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk
Using XENLOG_ERR level since this is only used in debug paths (ie. it's expected the user already has loglvl=all set). Signed-off-by: Tamas K Lengyel --- xen/arch/x86/mm/mem_sharing.c | 81 ++- 1 file changed, 41 insertions(+), 40 deletions(-) diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index 6e81e1a895..90b6371e2f 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -49,9 +49,6 @@ typedef struct pg_lock_data { static DEFINE_PER_CPU(pg_lock_data_t, __pld); -#define MEM_SHARING_DEBUG(_f, _a...) \ -debugtrace_printk("mem_sharing_debug: %s(): " _f, __func__, ##_a) - /* Reverse map defines */ #define RMAP_HASHTAB_ORDER 0 #define RMAP_HASHTAB_SIZE \ @@ -491,8 +488,9 @@ static int audit(void) /* If we can't lock it, it's definitely not a shared page */ if ( !mem_sharing_page_lock(pg) ) { - MEM_SHARING_DEBUG("mfn %lx in audit list, but cannot be locked (%lx)!\n", - mfn_x(mfn), pg->u.inuse.type_info); + gdprintk(XENLOG_ERR, +"mfn %lx in audit list, but cannot be locked (%lx)!\n", +mfn_x(mfn), pg->u.inuse.type_info); errors++; continue; } @@ -500,8 +498,9 @@ static int audit(void) /* Check if the MFN has correct type, owner and handle. */ if ( (pg->u.inuse.type_info & PGT_type_mask) != PGT_shared_page ) { - MEM_SHARING_DEBUG("mfn %lx in audit list, but not PGT_shared_page (%lx)!\n", - mfn_x(mfn), pg->u.inuse.type_info & PGT_type_mask); + gdprintk(XENLOG_ERR, +"mfn %lx in audit list, but not PGT_shared_page (%lx)!\n", +mfn_x(mfn), pg->u.inuse.type_info & PGT_type_mask); errors++; continue; } @@ -509,24 +508,24 @@ static int audit(void) /* Check the page owner. 
*/ if ( page_get_owner(pg) != dom_cow ) { - MEM_SHARING_DEBUG("mfn %lx shared, but wrong owner (%hu)!\n", - mfn_x(mfn), page_get_owner(pg)->domain_id); + gdprintk(XENLOG_ERR, "mfn %lx shared, but wrong owner (%hu)!\n", +mfn_x(mfn), page_get_owner(pg)->domain_id); errors++; } /* Check the m2p entry */ if ( !SHARED_M2P(get_gpfn_from_mfn(mfn_x(mfn))) ) { - MEM_SHARING_DEBUG("mfn %lx shared, but wrong m2p entry (%lx)!\n", - mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn))); + gdprintk(XENLOG_ERR, "mfn %lx shared, but wrong m2p entry (%lx)!\n", +mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn))); errors++; } /* Check we have a list */ if ( (!pg->sharing) || !rmap_has_entries(pg) ) { - MEM_SHARING_DEBUG("mfn %lx shared, but empty gfn list!\n", - mfn_x(mfn)); + gdprintk(XENLOG_ERR, "mfn %lx shared, but empty gfn list!\n", +mfn_x(mfn)); errors++; continue; } @@ -545,24 +544,26 @@ static int audit(void) d = get_domain_by_id(g->domain); if ( d == NULL ) { -MEM_SHARING_DEBUG("Unknown dom: %hu, for PFN=%lx, MFN=%lx\n", - g->domain, g->gfn, mfn_x(mfn)); +gdprintk(XENLOG_ERR, + "Unknown dom: %hu, for PFN=%lx, MFN=%lx\n", + g->domain, g->gfn, mfn_x(mfn)); errors++; continue; } o_mfn = get_gfn_query_unlocked(d, g->gfn, ); if ( !mfn_eq(o_mfn, mfn) ) { -MEM_SHARING_DEBUG("Incorrect P2M for d=%hu, PFN=%lx." - "Expecting MFN=%lx, got %lx\n", - g->domain, g->gfn, mfn_x(mfn), mfn_x(o_mfn)); +gdprintk(XENLOG_ERR, "Incorrect P2M for d=%hu, PFN=%lx." + "Expecting MFN=%lx, got %lx\n", + g->domain, g->gfn, mfn_x(mfn), mfn_x(o_mfn)); errors++; } if ( t != p2m_ram_shared ) { -MEM_SHARING_DEBUG("Incorrect P2M type for d=%hu, PFN=%lx MFN=%lx." - "Expecting t=%d, got %d\n", - g->domain, g->gfn, mfn_x(mfn), p2m_ram_shared, t); +gdprintk(XENLOG_ERR, + "Incorrect P2M type for d=%hu, PFN=%lx MFN=%lx." + "Expecting t=%d, got %d\n", + g->domain, g->gfn, mfn_x(mfn), p2m_ram_shared, t); errors++; } put_domain(d); @@ -571,10 +572,10 @@ static int audit(void) /* The type
[Xen-devel] [PATCH v2 16/20] x86/mem_sharing: check page type count earlier
Signed-off-by: Tamas K Lengyel
---
 xen/arch/x86/mm/mem_sharing.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index b3607b1bce..c44e7f2299 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -649,19 +649,18 @@ static int page_make_sharable(struct domain *d,
         return -EBUSY;
     }
 
-    /* Change page type and count atomically */
-    if ( !get_page_and_type(page, d, PGT_shared_page) )
+    /* Check if page is already typed and bail early if it is */
+    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
     {
         spin_unlock(&d->page_alloc_lock);
-        return -EINVAL;
+        return -EEXIST;
     }
 
-    /* Check it wasn't already sharable and undo if it was */
-    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
+    /* Change page type and count atomically */
+    if ( !get_page_and_type(page, d, PGT_shared_page) )
     {
         spin_unlock(&d->page_alloc_lock);
-        put_page_and_type(page);
-        return -EEXIST;
+        return -EINVAL;
     }
 
     /*
-- 
2.20.1
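The reordering in this patch follows a general pattern: validate before acquiring, so that the failure path has nothing to undo. A minimal sketch of that pattern is shown below. The demo_* names and the "typed" flag are hypothetical simplifications of the type count check, and the sketch is single-threaded, whereas the hypervisor performs the check under page_alloc_lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page with a reference count and a type. */
struct demo_page {
    int refs;
    bool typed;   /* true once the page has been given the shared type */
};

/*
 * Check first, acquire second. When the precondition fails, no reference
 * has been taken yet, so the error path is a plain return with no
 * matching "put" call.
 */
static int demo_make_sharable(struct demo_page *pg)
{
    if ( pg->typed )
        return -EEXIST;   /* bail early: nothing acquired, nothing to undo */

    pg->refs++;
    pg->typed = true;
    return 0;
}
```

A second call on the same page returns -EEXIST and leaves the reference count untouched, which is what makes the removed put_page_and_type() call in the old error path unnecessary.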
[Xen-devel] [PATCH v2 08/20] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables
Create struct mem_sharing_domain under hvm_domain and move mem sharing variables into it from p2m_domain and hvm_domain. Expose the mem_sharing_enabled macro to be used consistently across Xen. Remove some duplicate calls to mem_sharing_enabled in mem_sharing.c Signed-off-by: Tamas K Lengyel --- xen/arch/x86/mm/mem_sharing.c | 30 +- xen/drivers/passthrough/pci.c | 3 +-- xen/include/asm-x86/hvm/domain.h | 6 +- xen/include/asm-x86/mem_sharing.h | 16 xen/include/asm-x86/p2m.h | 4 5 files changed, 27 insertions(+), 32 deletions(-) diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index c0e305ad71..5d81730315 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -197,9 +197,6 @@ static inline shr_handle_t get_next_handle(void) return x + 1; } -#define mem_sharing_enabled(d) \ -(is_hvm_domain(d) && (d)->arch.hvm.mem_sharing_enabled) - static atomic_t nr_saved_mfns = ATOMIC_INIT(0); static atomic_t nr_shared_mfns = ATOMIC_INIT(0); @@ -1300,6 +1297,7 @@ int __mem_sharing_unshare_page(struct domain *d, int relinquish_shared_pages(struct domain *d) { int rc = 0; +struct mem_sharing_domain *msd = >arch.hvm.mem_sharing; struct p2m_domain *p2m = p2m_get_hostp2m(d); unsigned long gfn, count = 0; @@ -1307,7 +1305,7 @@ int relinquish_shared_pages(struct domain *d) return 0; p2m_lock(p2m); -for ( gfn = p2m->next_shared_gfn_to_relinquish; +for ( gfn = msd->next_shared_gfn_to_relinquish; gfn <= p2m->max_mapped_pfn; gfn++ ) { p2m_access_t a; @@ -1342,7 +1340,7 @@ int relinquish_shared_pages(struct domain *d) { if ( hypercall_preempt_check() ) { -p2m->next_shared_gfn_to_relinquish = gfn + 1; +msd->next_shared_gfn_to_relinquish = gfn + 1; rc = -ERESTART; break; } @@ -1428,7 +1426,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg) /* Only HAP is supported */ rc = -ENODEV; -if ( !hap_enabled(d) || !d->arch.hvm.mem_sharing_enabled ) +if ( !mem_sharing_enabled(d) ) goto out; switch ( mso.op ) @@ -1437,10 +1435,6 
@@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg) { shr_handle_t handle; -rc = -EINVAL; -if ( !mem_sharing_enabled(d) ) -goto out; - rc = nominate_page(d, _gfn(mso.u.nominate.u.gfn), 0, ); mso.u.nominate.handle = handle; } @@ -1452,9 +1446,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg) gfn_t gfn; shr_handle_t handle; -rc = -EINVAL; -if ( !mem_sharing_enabled(d) ) -goto out; rc = mem_sharing_gref_to_gfn(d->grant_table, gref, , NULL); if ( rc < 0 ) goto out; @@ -1470,10 +1461,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg) struct domain *cd; shr_handle_t sh, ch; -rc = -EINVAL; -if ( !mem_sharing_enabled(d) ) -goto out; - rc = rcu_lock_live_remote_domain_by_id(mso.u.share.client_domain, ); if ( rc ) @@ -1540,10 +1527,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg) struct domain *cd; shr_handle_t sh; -rc = -EINVAL; -if ( !mem_sharing_enabled(d) ) -goto out; - rc = rcu_lock_live_remote_domain_by_id(mso.u.share.client_domain, ); if ( rc ) @@ -1602,9 +1585,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg) mso.u.range.opaque > mso.u.range.last_gfn) ) goto out; -if ( !mem_sharing_enabled(d) ) -goto out; - rc = rcu_lock_live_remote_domain_by_id(mso.u.range.client_domain, ); if ( rc ) @@ -1708,7 +1688,7 @@ int mem_sharing_domctl(struct domain *d, struct xen_domctl_mem_sharing_op *mec) if ( unlikely(is_iommu_enabled(d) && mec->u.enable) ) rc = -EXDEV; else -d->arch.hvm.mem_sharing_enabled = mec->u.enable; +d->arch.hvm.mem_sharing.enabled = mec->u.enable; } break; diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index c07a63981a..65d1d457ff 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1498,8 +1498,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) /* Prevent device
[Xen-devel] [PATCH v2 17/20] xen/mem_sharing: VM forking
VM forking is the process of creating a domain with an empty memory space and a parent domain specified from which to populate the memory when necessary. For the new domain to be functional the VM state is copied over as part of the fork operation (HVM params, hap allocation, etc). Signed-off-by: Tamas K Lengyel --- xen/arch/x86/hvm/hvm.c| 2 +- xen/arch/x86/mm/mem_sharing.c | 228 ++ xen/arch/x86/mm/p2m.c | 11 +- xen/include/asm-x86/mem_sharing.h | 20 ++- xen/include/public/memory.h | 5 + xen/include/xen/sched.h | 1 + 6 files changed, 263 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 8f90841813..cafd07c67d 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1913,7 +1913,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla, } #endif -/* Spurious fault? PoD and log-dirty also take this path. */ +/* Spurious fault? PoD, log-dirty and VM forking also take this path. */ if ( p2m_is_ram(p2mt) ) { rc = 1; diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index c44e7f2299..e93ad2ec5a 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -22,11 +22,13 @@ #include #include +#include #include #include #include #include #include +#include #include #include #include @@ -36,6 +38,9 @@ #include #include #include +#include +#include +#include #include #include "mm-locks.h" @@ -1423,6 +1428,200 @@ static inline int mem_sharing_control(struct domain *d, bool enable) return 0; } +/* + * Forking a page only gets called when the VM faults due to no entry being + * in the EPT for the access. Depending on the type of access we either + * populate the physmap with a shared entry for read-only access or + * fork the page if its a write access. + * + * The client p2m is already locked so we only need to lock + * the parent's here. 
+ */ +int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing) +{ +int rc = -ENOENT; +shr_handle_t handle; +struct domain *parent; +struct p2m_domain *p2m; +unsigned long gfn_l = gfn_x(gfn); +mfn_t mfn, new_mfn; +p2m_type_t p2mt; +struct page_info *page; + +if ( !mem_sharing_is_fork(d) ) +return -ENOENT; + +parent = d->parent; + +if ( !unsharing ) +{ +/* For read-only accesses we just add a shared entry to the physmap */ +while ( parent ) +{ +if ( !(rc = nominate_page(parent, gfn, 0, )) ) +break; + +parent = parent->parent; +} + +if ( !rc ) +{ +/* The client's p2m is already locked */ +struct p2m_domain *pp2m = p2m_get_hostp2m(parent); + +p2m_lock(pp2m); +rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false); +p2m_unlock(pp2m); + +if ( !rc ) +return 0; +} +} + +/* + * If it's a write access (ie. unsharing) or if adding a shared entry to + * the physmap failed we'll fork the page directly. + */ +p2m = p2m_get_hostp2m(d); +parent = d->parent; + +while ( parent ) +{ +mfn = get_gfn_query(parent, gfn_l, ); + +if ( mfn_valid(mfn) && p2m_is_any_ram(p2mt) ) +break; + +put_gfn(parent, gfn_l); +parent = parent->parent; +} + +if ( !parent ) +return -ENOENT; + +if ( !(page = alloc_domheap_page(d, 0)) ) +{ +put_gfn(parent, gfn_l); +return -ENOMEM; +} + +new_mfn = page_to_mfn(page); +copy_domain_page(new_mfn, mfn); +set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l); + +put_gfn(parent, gfn_l); + +return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw, + p2m->default_access, -1); +} + +static int bring_up_vcpus(struct domain *cd, struct cpupool *cpupool) +{ +int ret; +unsigned int i; + +if ( (ret = cpupool_move_domain(cd, cpupool)) ) +return ret; + +for ( i = 0; i < cd->max_vcpus; i++ ) +{ +if ( cd->vcpu[i] ) +continue; + +if ( !vcpu_create(cd, i) ) +return -EINVAL; +} + +domain_update_node_affinity(cd); +return 0; +} + +static int fork_hap_allocation(struct domain *d, struct domain *cd) +{ +int rc; +bool preempted; +unsigned long mb = hap_get_allocation(d); 
+ +if ( mb == hap_get_allocation(cd) ) +return 0; + +paging_lock(cd); +rc = hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), ); +paging_unlock(cd); + +if ( rc ) +return rc; + +if ( preempted ) +return -ERESTART; + +return 0; +} + +static int fork_hvm(struct domain *d, struct domain *cd) +{ +int rc, i; +struct hvm_domain_context c = { 0 }; +uint32_t tsc_mode; +uint32_t gtsc_khz; +
[Xen-devel] [PATCH v2 06/20] x86/mem_sharing: drop flags from mem_sharing_unshare_page
All callers pass 0 in. Signed-off-by: Tamas K Lengyel Reviewed-by: Wei Liu --- xen/arch/x86/hvm/hvm.c| 2 +- xen/arch/x86/mm/p2m.c | 5 ++--- xen/include/asm-x86/mem_sharing.h | 8 +++- 3 files changed, 6 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 1e888b403b..e055114922 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1902,7 +1902,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla, if ( npfec.write_access && (p2mt == p2m_ram_shared) ) { ASSERT(p2m_is_hostp2m(p2m)); -sharing_enomem = mem_sharing_unshare_page(currd, gfn, 0); +sharing_enomem = mem_sharing_unshare_page(currd, gfn); rc = 1; goto out_put_gfn; } diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 3119269073..baea632acc 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -515,7 +515,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l, * Try to unshare. If we fail, communicate ENOMEM without * sleeping. */ -if ( mem_sharing_unshare_page(p2m->domain, gfn_l, 0) < 0 ) +if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 ) mem_sharing_notify_enomem(p2m->domain, gfn_l, false); mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL); } @@ -896,8 +896,7 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn, mfn_t mfn, { /* Do an unshare to cleanly take care of all corner cases. 
*/ int rc; -rc = mem_sharing_unshare_page(p2m->domain, - gfn_x(gfn_add(gfn, i)), 0); +rc = mem_sharing_unshare_page(p2m->domain, gfn_x(gfn_add(gfn, i))); if ( rc ) { p2m_unlock(p2m); diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h index 7d40e38563..0a9192d0e2 100644 --- a/xen/include/asm-x86/mem_sharing.h +++ b/xen/include/asm-x86/mem_sharing.h @@ -70,10 +70,9 @@ int __mem_sharing_unshare_page(struct domain *d, static inline int mem_sharing_unshare_page(struct domain *d, - unsigned long gfn, - uint16_t flags) + unsigned long gfn) { -int rc = __mem_sharing_unshare_page(d, gfn, flags); +int rc = __mem_sharing_unshare_page(d, gfn, 0); BUG_ON(rc && (rc != -ENOMEM)); return rc; } @@ -117,8 +116,7 @@ static inline unsigned int mem_sharing_get_nr_shared_mfns(void) } static inline -int mem_sharing_unshare_page(struct domain *d, unsigned long gfn, - uint16_t flags) +int mem_sharing_unshare_page(struct domain *d, unsigned long gfn) { ASSERT_UNREACHABLE(); return -EOPNOTSUPP; -- 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 04/20] x86/mem_sharing: cleanup code and comments in various locations
No functional changes. Signed-off-by: Tamas K Lengyel --- xen/arch/x86/hvm/hvm.c| 11 +- xen/arch/x86/mm/mem_sharing.c | 342 +- xen/arch/x86/mm/p2m.c | 17 +- xen/include/asm-x86/mem_sharing.h | 51 +++-- 4 files changed, 236 insertions(+), 185 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 5a3a962fbb..1e888b403b 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1902,12 +1902,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla, if ( npfec.write_access && (p2mt == p2m_ram_shared) ) { ASSERT(p2m_is_hostp2m(p2m)); -sharing_enomem = -(mem_sharing_unshare_page(currd, gfn, 0) < 0); +sharing_enomem = mem_sharing_unshare_page(currd, gfn, 0); rc = 1; goto out_put_gfn; } - + /* Spurious fault? PoD and log-dirty also take this path. */ if ( p2m_is_ram(p2mt) ) { @@ -1953,9 +1952,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla, __put_gfn(p2m, gfn); __put_gfn(hostp2m, gfn); out: -/* All of these are delayed until we exit, since we might +/* + * All of these are delayed until we exit, since we might * sleep on event ring wait queues, and we must not hold - * locks in such circumstance */ + * locks in such circumstance. + */ if ( paged ) p2m_mem_paging_populate(currd, gfn); if ( sharing_enomem ) diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index efb8821768..319aaf3074 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -59,8 +59,10 @@ static DEFINE_PER_CPU(pg_lock_data_t, __pld); #define RMAP_USES_HASHTAB(page) \ ((page)->sharing->hash_table.flag == NULL) #define RMAP_HEAVY_SHARED_PAGE RMAP_HASHTAB_SIZE -/* A bit of hysteresis. We don't want to be mutating between list and hash - * table constantly. */ +/* + * A bit of hysteresis. We don't want to be mutating between list and hash + * table constantly. 
+ */ #define RMAP_LIGHT_SHARED_PAGE (RMAP_HEAVY_SHARED_PAGE >> 2) #if MEM_SHARING_AUDIT @@ -88,7 +90,7 @@ static inline void page_sharing_dispose(struct page_info *page) { /* Unlikely given our thresholds, but we should be careful. */ if ( unlikely(RMAP_USES_HASHTAB(page)) ) -free_xenheap_pages(page->sharing->hash_table.bucket, +free_xenheap_pages(page->sharing->hash_table.bucket, RMAP_HASHTAB_ORDER); spin_lock(_audit_lock); @@ -105,7 +107,7 @@ static inline void page_sharing_dispose(struct page_info *page) { /* Unlikely given our thresholds, but we should be careful. */ if ( unlikely(RMAP_USES_HASHTAB(page)) ) -free_xenheap_pages(page->sharing->hash_table.bucket, +free_xenheap_pages(page->sharing->hash_table.bucket, RMAP_HASHTAB_ORDER); xfree(page->sharing); } @@ -122,8 +124,8 @@ static inline void page_sharing_dispose(struct page_info *page) * Nesting may happen when sharing (and locking) two pages. * Deadlock is avoided by locking pages in increasing order. * All memory sharing code paths take the p2m lock of the affected gfn before - * taking the lock for the underlying page. We enforce ordering between page_lock - * and p2m_lock using an mm-locks.h construct. + * taking the lock for the underlying page. We enforce ordering between + * page_lock and p2m_lock using an mm-locks.h construct. * * TODO: Investigate if PGT_validated is necessary. 
*/ @@ -168,7 +170,7 @@ static inline bool mem_sharing_page_lock(struct page_info *pg) if ( rc ) { preempt_disable(); -page_sharing_mm_post_lock(>mm_unlock_level, +page_sharing_mm_post_lock(>mm_unlock_level, >recurse_count); } return rc; @@ -178,7 +180,7 @@ static inline void mem_sharing_page_unlock(struct page_info *pg) { pg_lock_data_t *pld = &(this_cpu(__pld)); -page_sharing_mm_unlock(pld->mm_unlock_level, +page_sharing_mm_unlock(pld->mm_unlock_level, >recurse_count); preempt_enable(); _page_unlock(pg); @@ -186,7 +188,7 @@ static inline void mem_sharing_page_unlock(struct page_info *pg) static inline shr_handle_t get_next_handle(void) { -/* Get the next handle get_page style */ +/* Get the next handle get_page style */ uint64_t x, y = next_handle; do { x = y; @@ -198,24 +200,26 @@ static inline shr_handle_t get_next_handle(void) #define mem_sharing_enabled(d) \ (is_hvm_domain(d) && (d)->arch.hvm.mem_sharing_enabled) -static atomic_t nr_saved_mfns = ATOMIC_INIT(0); +static atomic_t nr_saved_mfns = ATOMIC_INIT(0); static atomic_t nr_shared_mfns = ATOMIC_INIT(0); -/** Reverse map **/ -/* Every shared frame keeps a reverse map (rmap) of tuples that +/* + * Reverse map + * + * Every
[Xen-devel] [PATCH v2 00/20] VM forking
The following series implements VM forking for Intel HVM guests to allow
for the fast creation of identical VMs without the associated high
startup costs of booting or restoring the VM from a savefile.

JIRA issue: https://xenproject.atlassian.net/browse/XEN-89

The main design goal with this series has been to reduce the time of
creating the VM fork as much as possible. To achieve this the VM forking
process is split into two steps:

  1) forking the VM on the hypervisor side;
  2) starting QEMU to handle the backend for emulated devices.

Step 1) involves creating a VM using the new "xl fork-vm" command. The
parent VM is expected to remain paused after forks are created from it
(which is different than what process forking normally entails). During
this forking operation the HVM context and VM settings are copied over to
the new forked VM. This operation is fast and it allows the forked VM to
be unpaused and to be monitored and accessed via VMI. Note however that
without its device model running (depending on what is executing in the
VM) it is bound to misbehave/crash when it tries to access devices that
would be emulated by QEMU. We anticipate that for certain use-cases this
would be an acceptable situation, for example when fuzzing code segments
that don't access such devices.

Step 2) involves launching QEMU to support the forked VM, which requires
the QEMU Xen savefile to be generated manually from the parent VM. This
can be accomplished simply by connecting to its QMP socket and issuing
the "xen-save-devices-state" command as documented by QEMU:
https://github.com/qemu/qemu/blob/master/docs/xen-save-devices-state.txt
Once the QEMU Xen savefile is generated the new "xl fork-launch-dm"
command is used to launch QEMU and load the specified savefile for it.

At runtime the forked VM starts running with an empty p2m which gets
lazily populated when the VM generates EPT faults, similar to how altp2m
views are populated. If the memory access is a read-only access, the p2m
entry is populated with a memory shared entry with its parent. For write
memory accesses, or in case memory sharing wasn't possible (for example
when a reference is held by a third party), a new page is allocated and
the page contents are copied over from the parent VM. Forks can be
further forked if needed, thus allowing for further memory savings.

A VM fork reset hypercall is also added that allows the fork to be reset
to the state it was in just after a fork. This is an optimization for
cases where the forks are very short-lived and run without a device
model, so resetting saves some time compared to creating a brand new
fork.

The series has been tested with both Linux and Windows VMs and functions
as expected. VM forking time has been measured to be 0.018s, device model
launch to be around 1s depending largely on the number of devices being
emulated.

Patches 1-2 implement changes to existing internal Xen APIs to make VM
forking possible.

Patches 3-4 are simple code-formatting fixes for the toolstack and Xen
for the memory sharing paths with no functional changes.

Patches 5-16 are code cleanups and adjustments to the Xen memory sharing
subsystem with no functional changes.

Patch 17 adds the hypervisor-side code implementing VM forking.

Patch 18 is integration of mem_access with forked VMs.

Patch 19 implements the hypervisor-side bits of the VM fork reset
operation.

Patch 20 adds the toolstack-side code implementing VM forking and reset.
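The lazy population rule described above (share on read, copy on write) can be pictured with a small user-space model. This is only a sketch under stated assumptions: every `toy_*` name, the tiny page size and the single-level parent/child relationship are hypothetical and chosen for illustration. It does not use the real Xen p2m or mem_sharing interfaces, and it ignores the "sharing wasn't possible" fallback and nested forks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 16   /* tiny pages keep the model readable */
#define TOY_NR_GFNS   4

/* Hypothetical entry states; these are not the real Xen p2m types. */
enum toy_p2m_type { TOY_EMPTY, TOY_SHARED, TOY_PRIVATE };

struct toy_entry {
    enum toy_p2m_type type;
    unsigned char *page;   /* aliases the parent's page when TOY_SHARED */
};

struct toy_vm {
    struct toy_vm *parent;             /* NULL for the original VM */
    struct toy_entry p2m[TOY_NR_GFNS];
};

/* Give the parent one private page per gfn, filled with 'fill'. */
static int toy_parent_init(struct toy_vm *vm, unsigned char fill)
{
    unsigned int gfn;

    vm->parent = NULL;
    for (gfn = 0; gfn < TOY_NR_GFNS; gfn++) {
        unsigned char *page = malloc(TOY_PAGE_SIZE);

        if (!page)
            return -1;
        memset(page, fill, TOY_PAGE_SIZE);
        vm->p2m[gfn].type = TOY_PRIVATE;
        vm->p2m[gfn].page = page;
    }
    return 0;
}

/* A fresh fork starts with an empty p2m. */
static void toy_fork_init(struct toy_vm *child, struct toy_vm *parent)
{
    unsigned int gfn;

    child->parent = parent;
    for (gfn = 0; gfn < TOY_NR_GFNS; gfn++) {
        child->p2m[gfn].type = TOY_EMPTY;
        child->p2m[gfn].page = NULL;
    }
}

/*
 * Model of the fault path: a read maps the parent's page as shared,
 * a write allocates a private page and copies the parent's contents.
 */
static int toy_fork_fault(struct toy_vm *child, unsigned int gfn, bool write)
{
    struct toy_entry *e, *pe;
    unsigned char *copy;

    if (!child->parent || gfn >= TOY_NR_GFNS)
        return -1;

    e = &child->p2m[gfn];
    pe = &child->parent->p2m[gfn];
    if (pe->type == TOY_EMPTY)
        return -1;              /* nested, unpopulated parents not modelled */

    if (e->type == TOY_PRIVATE)
        return 0;               /* already has its own copy */

    if (!write) {
        e->type = TOY_SHARED;
        e->page = pe->page;
        return 0;
    }

    copy = malloc(TOY_PAGE_SIZE);
    if (!copy)
        return -1;
    memcpy(copy, pe->page, TOY_PAGE_SIZE);
    e->type = TOY_PRIVATE;
    e->page = copy;
    return 0;
}
```

In this model a read fault followed by a write fault on the same gfn first aliases the parent's page and then replaces the alias with a private copy, so later writes in the fork never reach the parent.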
Tamas K Lengyel (20):
  x86: make hvm_{get/set}_param accessible
  xen/x86: Make hap_get_allocation accessible
  tools/libxc: clean up memory sharing files
  x86/mem_sharing: cleanup code and comments in various locations
  x86/mem_sharing: make get_two_gfns take locks conditionally
  x86/mem_sharing: drop flags from mem_sharing_unshare_page
  x86/mem_sharing: don't try to unshare twice during page fault
  x86/mem_sharing: define mem_sharing_domain to hold some scattered variables
  x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages
  x86/mem_sharing: Make add_to_physmap static and shorten name
  x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
  x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk
  x86/mem_sharing: ASSERT that p2m_set_entry succeeds
  x86/mem_sharing: Enable mem_sharing on first memop
  x86/mem_sharing: Skip xen heap pages in memshr nominate
  x86/mem_sharing: check page type count earlier
  xen/mem_sharing: VM forking
  xen/mem_access: Use __get_gfn_type_access in set_mem_access
  x86/mem_sharing: reset a fork
  xen/tools: VM forking toolstack side

 tools/libxc/include/xenctrl.h | 30 +-
 tools/libxc/xc_memshr.c       | 34 +-
 tools/libxl/libxl.h           | 7 +
 tools/libxl/libxl_create.c    | 237 +---
 tools/libxl/libxl_dm.c        | 2 +-
 tools/libxl/libxl_dom.c       | 83 ++-
 tools/libxl/libxl_internal.h  | 1 +
 tools/libxl/libxl_types.idl   | 1 +
[Xen-devel] [PATCH v2 03/20] tools/libxc: clean up memory sharing files
No functional changes. Signed-off-by: Tamas K Lengyel Acked-by: Wei Liu --- tools/libxc/include/xenctrl.h | 24 tools/libxc/xc_memshr.c | 12 ++-- 2 files changed, 18 insertions(+), 18 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index f4431687b3..b5ffa53d55 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2060,7 +2060,7 @@ int xc_monitor_emulate_each_rep(xc_interface *xch, uint32_t domain_id, * * Sharing is supported only on the x86 architecture in 64 bit mode, with * Hardware-Assisted Paging (i.e. Intel EPT, AMD NPT). Moreover, AMD NPT - * support is considered experimental. + * support is considered experimental. * Calls below return ENOSYS if not in the x86_64 architecture. * Calls below return ENODEV if the domain does not support HAP. @@ -2107,13 +2107,13 @@ int xc_memshr_control(xc_interface *xch, * EINVAL or EACCESS if the request is denied by the security policy */ -int xc_memshr_ring_enable(xc_interface *xch, +int xc_memshr_ring_enable(xc_interface *xch, uint32_t domid, uint32_t *port); /* Disable the ring for ENOMEM communication. * May fail with EINVAL if the ring was not enabled in the first place. */ -int xc_memshr_ring_disable(xc_interface *xch, +int xc_memshr_ring_disable(xc_interface *xch, uint32_t domid); /* @@ -2126,7 +2126,7 @@ int xc_memshr_ring_disable(xc_interface *xch, int xc_memshr_domain_resume(xc_interface *xch, uint32_t domid); -/* Select a page for sharing. +/* Select a page for sharing. * * A 64 bit opaque handle will be stored in handle. The hypervisor ensures * that if the page is modified, the handle will be invalidated, and future @@ -2155,7 +2155,7 @@ int xc_memshr_nominate_gref(xc_interface *xch, /* The three calls below may fail with * 10 (or -XENMEM_SHARING_OP_S_HANDLE_INVALID) if the handle passed as source - * is invalid. + * is invalid. * 9 (or -XENMEM_SHARING_OP_C_HANDLE_INVALID) if the handle passed as client is * invalid. 
*/ @@ -2168,7 +2168,7 @@ int xc_memshr_nominate_gref(xc_interface *xch, * * After successful sharing, the client handle becomes invalid. Both tuples point to the same mfn with the same handle, the one specified as - * source. Either 3-tuple can be specified later for further re-sharing. + * source. Either 3-tuple can be specified later for further re-sharing. */ int xc_memshr_share_gfns(xc_interface *xch, uint32_t source_domain, @@ -2193,7 +2193,7 @@ int xc_memshr_share_grefs(xc_interface *xch, /* Allows to add to the guest physmap of the client domain a shared frame * directly. * - * May additionally fail with + * May additionally fail with * 9 (-XENMEM_SHARING_OP_C_HANDLE_INVALID) if the physmap entry for the gfn is * not suitable. * ENOMEM if internal data structures cannot be allocated. @@ -,7 +,7 @@ int xc_memshr_range_share(xc_interface *xch, uint64_t last_gfn); /* Debug calls: return the number of pages referencing the shared frame backing - * the input argument. Should be one or greater. + * the input argument. Should be one or greater. * * May fail with EINVAL if there is no backing shared frame for the input * argument. @@ -2235,9 +2235,9 @@ int xc_memshr_debug_gref(xc_interface *xch, uint32_t domid, grant_ref_t gref); -/* Audits the share subsystem. - * - * Returns ENOSYS if not supported (may not be compiled into the hypervisor). +/* Audits the share subsystem. + * + * Returns ENOSYS if not supported (may not be compiled into the hypervisor). * * Returns the number of errors found during auditing otherwise. May be (should * be!) zero. @@ -2273,7 +2273,7 @@ long xc_sharing_freed_pages(xc_interface *xch); * should return 1. (And dominfo(d) for each of the two domains should return 1 * as well). * - * Note that some of these sharing_used_frames may be referenced by + * Note that some of these sharing_used_frames may be referenced by * a single domain page, and thus not realize any savings. 
The same * applies to some of the pages counted in dominfo(d)->shr_pages. */ diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c index d5e135e0d9..5ef56a6933 100644 --- a/tools/libxc/xc_memshr.c +++ b/tools/libxc/xc_memshr.c @@ -41,7 +41,7 @@ int xc_memshr_control(xc_interface *xch, return do_domctl(xch, ); } -int xc_memshr_ring_enable(xc_interface *xch, +int xc_memshr_ring_enable(xc_interface *xch, uint32_t domid, uint32_t *port) { @@ -57,7 +57,7 @@ int xc_memshr_ring_enable(xc_interface *xch, port); } -int xc_memshr_ring_disable(xc_interface *xch, +int xc_memshr_ring_disable(xc_interface *xch,
[Xen-devel] [PATCH v2 10/20] x86/mem_sharing: Make add_to_physmap static and shorten name
It's not being called from outside mem_sharing.c

Signed-off-by: Tamas K Lengyel
---
 xen/arch/x86/mm/mem_sharing.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 1b7b520ccf..fc1d8be1eb 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1064,8 +1064,9 @@ err_out:
     return ret;
 }

-int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
-                               struct domain *cd, unsigned long cgfn, bool lock)
+static
+int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
+                   struct domain *cd, unsigned long cgfn, bool lock)
 {
     struct page_info *spage;
     int ret = -EINVAL;
@@ -1558,7 +1559,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
             sh   = mso.u.share.source_handle;
             cgfn = mso.u.share.client_gfn;

-            rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn, true);
+            rc = add_to_physmap(d, sgfn, sh, cd, cgfn, true);

             rcu_unlock_domain(cd);
         }
--
2.20.1
[Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
Currently the hvm parameters are only accessible via the HVMOP hypercalls. By exposing hvm_{get/set}_param it will be possible for VM forking to copy the parameters directly into the clone domain. Signed-off-by: Tamas K Lengyel --- xen/arch/x86/hvm/hvm.c| 169 -- xen/include/asm-x86/hvm/hvm.h | 4 + 2 files changed, 106 insertions(+), 67 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 614ed60fe4..5a3a962fbb 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4072,16 +4072,17 @@ static int hvmop_set_evtchn_upcall_vector( } static int hvm_allow_set_param(struct domain *d, - const struct xen_hvm_param *a) + uint32_t index, + uint64_t new_value) { -uint64_t value = d->arch.hvm.params[a->index]; +uint64_t value = d->arch.hvm.params[index]; int rc; rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); if ( rc ) return rc; -switch ( a->index ) +switch ( index ) { /* The following parameters can be set by the guest. */ case HVM_PARAM_CALLBACK_IRQ: @@ -4114,7 +4115,7 @@ static int hvm_allow_set_param(struct domain *d, if ( rc ) return rc; -switch ( a->index ) +switch ( index ) { /* The following parameters should only be changed once. 
*/ case HVM_PARAM_VIRIDIAN: @@ -4124,7 +4125,7 @@ static int hvm_allow_set_param(struct domain *d, case HVM_PARAM_NR_IOREQ_SERVER_PAGES: case HVM_PARAM_ALTP2M: case HVM_PARAM_MCA_CAP: -if ( value != 0 && a->value != value ) +if ( value != 0 && new_value != value ) rc = -EEXIST; break; default: @@ -4134,13 +4135,11 @@ static int hvm_allow_set_param(struct domain *d, return rc; } -static int hvmop_set_param( +int hvmop_set_param( XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg) { -struct domain *curr_d = current->domain; struct xen_hvm_param a; struct domain *d; -struct vcpu *v; int rc; if ( copy_from_guest(, arg, 1) ) @@ -4160,23 +4159,42 @@ static int hvmop_set_param( if ( !is_hvm_domain(d) ) goto out; -rc = hvm_allow_set_param(d, ); +rc = hvm_set_param(d, a.index, a.value); + + out: +rcu_unlock_domain(d); +return rc; +} + +int hvm_set_param( +struct domain *d, +uint32_t index, +uint64_t value) +{ +struct domain *curr_d = current->domain; +int rc; +struct vcpu *v; + +if ( index >= HVM_NR_PARAMS ) +return -EINVAL; + +rc = hvm_allow_set_param(d, index, value); if ( rc ) goto out; -switch ( a.index ) +switch ( index ) { case HVM_PARAM_CALLBACK_IRQ: -hvm_set_callback_via(d, a.value); +hvm_set_callback_via(d, value); hvm_latch_shinfo_size(d); break; case HVM_PARAM_TIMER_MODE: -if ( a.value > HVMPTM_one_missed_tick_pending ) +if ( value > HVMPTM_one_missed_tick_pending ) rc = -EINVAL; break; case HVM_PARAM_VIRIDIAN: -if ( (a.value & ~HVMPV_feature_mask) || - !(a.value & HVMPV_base_freq) ) +if ( (value & ~HVMPV_feature_mask) || + !(value & HVMPV_base_freq) ) rc = -EINVAL; break; case HVM_PARAM_IDENT_PT: @@ -4186,7 +4204,7 @@ static int hvmop_set_param( */ if ( !paging_mode_hap(d) || !cpu_has_vmx ) { -d->arch.hvm.params[a.index] = a.value; +d->arch.hvm.params[index] = value; break; } @@ -4201,7 +4219,7 @@ static int hvmop_set_param( rc = 0; domain_pause(d); -d->arch.hvm.params[a.index] = a.value; +d->arch.hvm.params[index] = value; for_each_vcpu ( d, v ) paging_update_cr3(v, 
false); domain_unpause(d); @@ -4210,23 +4228,23 @@ static int hvmop_set_param( break; case HVM_PARAM_DM_DOMAIN: /* The only value this should ever be set to is DOMID_SELF */ -if ( a.value != DOMID_SELF ) +if ( value != DOMID_SELF ) rc = -EINVAL; -a.value = curr_d->domain_id; +value = curr_d->domain_id; break; case HVM_PARAM_ACPI_S_STATE: rc = 0; -if ( a.value == 3 ) +if ( value == 3 ) hvm_s3_suspend(d); -else if ( a.value == 0 ) +else if ( value == 0 ) hvm_s3_resume(d); else rc = -EINVAL; break; case HVM_PARAM_ACPI_IOPORTS_LOCATION: -rc = pmtimer_change_ioport(d, a.value); +rc = pmtimer_change_ioport(d, value); break; case HVM_PARAM_MEMORY_EVENT_CR0: case HVM_PARAM_MEMORY_EVENT_CR3: @@ -4241,24 +4259,24 @@ static int hvmop_set_param( rc = xsm_hvm_param_nested(XSM_PRIV, d); if ( rc ) break; -if (
Re: [Xen-devel] [PATCH] x86/save: reserve HVM save record numbers that have been consumed...
On Wed, Dec 18, 2019 at 04:09:25PM +, Paul Durrant wrote:
> ...for patches not (yet) upstream.
>
> This patch is simply reserving save record number space to avoid the
> risk of clashes between existent downstream changes made by Amazon and
> future upstream changes which may be incompatible.
>
> Signed-off-by: Paul Durrant

Reviewed-by: Wei Liu
Re: [Xen-devel] [ANNOUNCEMENT] Xen 4.13 is released
On 18/12/2019 18:00, Juergen Gross wrote:
> Dear community members,
>
> I'm pleased to announce that Xen 4.13.0 is released.
>
> Thanks everyone who contributed to this release. This release would
> not have happened without all the awesome contributions from around
> the globe.
>
> Regards,
>
> Juergen Gross (on behalf of the Xen Project Hypervisor team)

Thanks for your work as release manager !

-- Sander
[Xen-devel] [PATCH v13 5/5] xen/blkback: Consistently insert one empty line between functions
From: SeongJae Park

The number of empty lines between functions in xenbus.c is inconsistent.
This trivial style cleanup commit fixes the file to consistently place
only one empty line.

Acked-by: Roger Pau Monné
Signed-off-by: SeongJae Park
---
 drivers/block/xen-blkback/xenbus.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 24172c180f5f..c7f820db190a 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
 	device_remove_file(&dev->dev, &dev_attr_physical_device);
 }

-
 static void xen_vbd_free(struct xen_vbd *vbd)
 {
 	if (vbd->bdev)
@@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 		handle, blkif->domid);
 	return 0;
 }
+
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
 	struct backend_info *be = dev_get_drvdata(&dev->dev);
@@ -572,6 +572,7 @@ static void xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info
 	if (err)
 		dev_warn(&dev->dev, "writing feature-discard (%d)", err);
 }
+
 int xen_blkbk_barrier(struct xenbus_transaction xbt,
 		      struct backend_info *be, int state)
 {
@@ -656,7 +657,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
 	return err;
 }

-
 /*
  * Callback received when the hotplug scripts have placed the physical-device
  * node. Read it and the mode node, and create a vbd. If the frontend is
@@ -748,7 +748,6 @@ static void backend_changed(struct xenbus_watch *watch,
 	}
 }

-
 /*
  * Callback received when the frontend's state changes.
  */
@@ -823,7 +822,6 @@ static void frontend_changed(struct xenbus_device *dev,
 	}
 }

-
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
 static unsigned int buffer_squeeze_duration_ms = 10;
 module_param_named(buffer_squeeze_duration_ms,
@@ -846,7 +844,6 @@ static void reclaim_memory(struct xenbus_device *dev)

 /* ** Connection ** */

-
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
--
2.17.1
[Xen-devel] [PATCH v13 4/5] xen/blkback: Remove unnecessary static variable name prefixes
From: SeongJae Park A few of static variables in blkback have 'xen_blkif_' prefix, though it is unnecessary for static variables. This commit removes such prefixes. Reviewed-by: Roger Pau Monné Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/blkback.c | 37 + 1 file changed, 17 insertions(+), 20 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 79f677aeb5cc..fbd67f8e4e4e 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -62,8 +62,8 @@ * IO workloads. */ -static int xen_blkif_max_buffer_pages = 1024; -module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644); +static int max_buffer_pages = 1024; +module_param_named(max_buffer_pages, max_buffer_pages, int, 0644); MODULE_PARM_DESC(max_buffer_pages, "Maximum number of free pages to keep in each block backend buffer"); @@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages, * algorithm. */ -static int xen_blkif_max_pgrants = 1056; -module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644); +static int max_pgrants = 1056; +module_param_named(max_persistent_grants, max_pgrants, int, 0644); MODULE_PARM_DESC(max_persistent_grants, "Maximum number of grants to map persistently"); @@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants, * use. The time is in seconds, 0 means indefinitely long. 
*/ -static unsigned int xen_blkif_pgrant_timeout = 60; -module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout, +static unsigned int pgrant_timeout = 60; +module_param_named(persistent_grant_unused_seconds, pgrant_timeout, uint, 0644); MODULE_PARM_DESC(persistent_grant_unused_seconds, "Time in seconds an unused persistent grant is allowed to " @@ -137,9 +137,8 @@ module_param(log_stats, int, 0644); static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) { - return xen_blkif_pgrant_timeout && - (jiffies - persistent_gnt->last_used >= - HZ * xen_blkif_pgrant_timeout); + return pgrant_timeout && (jiffies - persistent_gnt->last_used >= + HZ * pgrant_timeout); } static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page) @@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring, struct persistent_gnt *this; struct xen_blkif *blkif = ring->blkif; - if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) { + if (ring->persistent_gnt_c >= max_pgrants) { if (!blkif->vbd.overflow_max_grants) blkif->vbd.overflow_max_grants = 1; return -EBUSY; @@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring) goto out; } - if (ring->persistent_gnt_c < xen_blkif_max_pgrants || - (ring->persistent_gnt_c == xen_blkif_max_pgrants && + if (ring->persistent_gnt_c < max_pgrants || + (ring->persistent_gnt_c == max_pgrants && !ring->blkif->vbd.overflow_max_grants)) { num_clean = 0; } else { - num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN; - num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + - num_clean; + num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN; + num_clean = ring->persistent_gnt_c - max_pgrants + num_clean; num_clean = min(ring->persistent_gnt_c, num_clean); pr_debug("Going to purge at least %u persistent grants\n", num_clean); @@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring) current->comm, ring->st_oo_req, ring->st_rd_req, 
ring->st_wr_req, ring->st_f_req, ring->st_ds_req, -ring->persistent_gnt_c, -xen_blkif_max_pgrants); +ring->persistent_gnt_c, max_pgrants); ring->st_print = jiffies + msecs_to_jiffies(10 * 1000); ring->st_rd_req = 0; ring->st_wr_req = 0; @@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg) if (time_before(jiffies, blkif->buffer_squeeze_end)) shrink_free_pagepool(ring, 0); else - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + shrink_free_pagepool(ring, max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); @@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring, continue; } if (use_persistent_gnts && - ring->persistent_gnt_c < xen_blkif_max_pgrants) { +
[Xen-devel] [PATCH v13 0/5] xenbus/backend: Add memory pressure handler callback
Granting pages consumes backend system memory. In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation. However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns. Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and then introduces a lock for race
condition avoidance (patch 2). After that, patch 3 applies the callback
mechanism to mitigate the problem in 'xen-blkback'. The fourth and fifth
patches are trivial cleanups; those fix nits we found during the
development of this patchset.

Note that patches 1, 4, and 5 are not changed since v9.

Base Version

This patch is based on v5.4. A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/patches/blkback/buffer_squeeze/v13

Patch History

Changes from v12
(https://lore.kernel.org/xen-devel/20191218104232.9606-1-sjp...@amazon.com/)
 - Do not unnecessarily disable interrupts (suggested by Juergen)
 - Hold lock from xenbus side (suggested by Juergen)

Changes from v11
(https://lore.kernel.org/xen-devel/20191217160748.693-2-sjp...@amazon.com/)
 - Fix wrong trylock use (reported by Juergen)
 - Merge patch 3 and 4 (suggested by Juergen)
 - Update test result

Changes from v10
(https://lore.kernel.org/xen-devel/20191216124527.30306-1-sjp...@amazon.com/)
 - Fix race condition (reported by SeongJae, suggested by Juergen)

Changes from v9
(https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/)
 - Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné
 - Update the commit message for overhead test of the 2nd patch

Changes from v8
(https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
 - Drop 'Reviewed-by: Juergen' from the second patch
   (suggested by Roger Pau Monné)
 - Update contact of the new module param to SeongJae Park
   (suggested by Roger Pau Monné)
 - Wordsmith the description of the parameter
   (suggested by Roger Pau Monné)
 - Fix dumb bugs (suggested by Roger Pau Monné)
 - Move module param definition to xenbus.c and reduce the number of
   lines for this change (suggested by Roger Pau Monné)
 - Add a comment for the new callback, reclaim_memory, as other
   callbacks also have
 - Add another trivial cleanup of xenbus.c file (4th patch)

Changes from v7
(https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
 - Update sysfs-driver-xen-blkback for new parameter
   (suggested by Roger Pau Monné)
 - Use per-xen_blkif buffer_squeeze_end instead of global variable
   (suggested by Roger Pau Monné)

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
 - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
 - Constify a variable (suggested by Roger Pau Monné)
 - Rename 'reclaim' into 'reclaim_memory'
   (suggested by Roger Pau Monné)
 - More wordsmith of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type
   (suggested by Roger Pau Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback
   (suggested by Juergen Gross)
 - Rename xen-blkback module parameter
   (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it
   (suggested by Juergen Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (5):
  xenbus/backend: Add memory pressure handler callback
  xenbus/backend: Protect xenbus callback with lock
  xen/blkback: Squeeze page pools if a memory pressure is detected
  xen/blkback: Remove unnecessary static variable name prefixes
  xen/blkback: Consistently insert one empty line between functions

 .../ABI/testing/sysfs-driver-xen-blkback | 10
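The squeeze behaviour that patch 3 applies to 'xen-blkback' can be pictured with a small user-space model. This is a sketch under assumptions, not the driver code: the `toy_*` names are hypothetical, time is a plain millisecond counter rather than jiffies (so counter wraparound is ignored), and the way the deadline is armed is an assumption, since that part of patch 3 is not quoted in this thread. The values 1024 and 10 mirror the `max_buffer_pages` and `buffer_squeeze_duration_ms` defaults that appear in the quoted patches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-backend state; the driver keeps a similar deadline. */
struct toy_blkif {
    uint64_t squeeze_end_ms;   /* squeeze the pool until this time */
};

static const unsigned int toy_max_buffer_pages = 1024;
static const unsigned int toy_squeeze_duration_ms = 10;

/* Called when memory pressure is reported: arm the squeeze window. */
static void toy_reclaim_memory(struct toy_blkif *blkif, uint64_t now_ms)
{
    blkif->squeeze_end_ms = now_ms + toy_squeeze_duration_ms;
}

/*
 * Called from the periodic worker: while the window is open the free
 * page pool is shrunk to nothing, otherwise to the configured maximum.
 */
static unsigned int toy_pool_target(const struct toy_blkif *blkif,
                                    uint64_t now_ms)
{
    return now_ms < blkif->squeeze_end_ms ? 0 : toy_max_buffer_pages;
}
```

The callback only records a deadline. The pages are released by the worker on its next pass, which keeps the callback itself cheap.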
[Xen-devel] [PATCH v13 1/5] xenbus/backend: Add memory pressure handler callback
From: SeongJae Park

Granting pages consumes backend system memory. In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation. However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns. Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'. If a memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that the callback facility could be improved to handle general
pressure in more sophisticated ways. For example, it would be possible
to monitor the memory consumption of each device and issue the release
requests only to the devices which are causing the pressure. Also, the
callback could be extended to handle not only memory, but general
resources. Nevertheless, this version of the implementation defers such
sophisticated goals to future work.

Reviewed-by: Juergen Gross
Reviewed-by: Roger Pau Monné
Signed-off-by: SeongJae Park
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h                      | 1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
 	return NOTIFY_DONE;
 }

+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+	const struct xenbus_driver *drv;
+
+	if (!dev->driver)
+		return 0;
+	drv = to_xenbus_driver(dev->driver);
+	if (drv && drv->reclaim_memory)
+		drv->reclaim_memory(to_xenbus_device(dev));
+	return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+				struct shrink_control *sc)
+{
+	bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+			backend_reclaim_memory);
+	return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+	.count_objects = backend_shrink_memory_count,
+	.seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
 	static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)

 	register_xenstore_notifier(&xenstore_notifier);

+	if (register_shrinker(&backend_memory_shrinker))
+		pr_warn("shrinker registration failed\n");
+
 	return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
 	struct device_driver driver;
 	int (*read_otherend_details)(struct xenbus_device *dev);
 	int (*is_ready)(struct xenbus_device *dev);
+	void (*reclaim_memory)(struct xenbus_device *dev);
 };

 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
--
2.17.1
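The dispatch pattern this patch adds (walk every backend device, skip those without a bound driver or without the optional callback, and call `reclaim_memory` on the rest) can be tried outside the kernel with a minimal model. All `toy_*` names below are hypothetical stand-ins for the xenbus structures, and unlike the shrinker hook in the patch, which always returns 0, this sketch returns the number of devices notified so that the behaviour is easy to check.

```c
#include <assert.h>
#include <stddef.h>

struct toy_device;

/* Hypothetical stand-in for struct xenbus_driver. */
struct toy_driver {
    const char *name;
    /* Optional: drivers that cannot release memory leave this NULL. */
    void (*reclaim_memory)(struct toy_device *dev);
};

/* Hypothetical stand-in for a backend device. */
struct toy_device {
    const struct toy_driver *driver;   /* NULL while no driver is bound */
    unsigned int cached_pages;
};

/* A driver that can give memory back simply drops its cache here. */
static void toy_blk_reclaim(struct toy_device *dev)
{
    dev->cached_pages = 0;
}

static const struct toy_driver toy_blk_driver = {
    .name = "toy-blk",
    .reclaim_memory = toy_blk_reclaim,
};

static const struct toy_driver toy_net_driver = {
    .name = "toy-net",
    .reclaim_memory = NULL,            /* does not implement the callback */
};

/* Ask every capable device to release memory; return how many were asked. */
static unsigned int toy_notify_memory_pressure(struct toy_device *devs,
                                               size_t nr_devs)
{
    unsigned int notified = 0;
    size_t i;

    for (i = 0; i < nr_devs; i++) {
        const struct toy_driver *drv = devs[i].driver;

        if (!drv || !drv->reclaim_memory)
            continue;
        drv->reclaim_memory(&devs[i]);
        notified++;
    }
    return notified;
}
```

Keeping the callback optional means existing drivers need no changes: only a backend that actually holds reclaimable memory has to fill in the new member.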
[Xen-devel] [PATCH v13 4/5] xen/blkback: Remove unnecessary static variable name prefixes
From: SeongJae Park

A few static variables in blkback have a 'xen_blkif_' prefix, which is
unnecessary for static variables.  This commit removes such prefixes.

Reviewed-by: Roger Pau Monné
Signed-off-by: SeongJae Park
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 79f677aeb5cc..fbd67f8e4e4e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
 		 "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
 		   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 		 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
 {
-	return xen_blkif_pgrant_timeout &&
-	       (jiffies - persistent_gnt->last_used >=
-		HZ * xen_blkif_pgrant_timeout);
+	return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+			HZ * pgrant_timeout);
 }
 
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
@@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
 	struct persistent_gnt *this;
 	struct xen_blkif *blkif = ring->blkif;
 
-	if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+	if (ring->persistent_gnt_c >= max_pgrants) {
 		if (!blkif->vbd.overflow_max_grants)
 			blkif->vbd.overflow_max_grants = 1;
 		return -EBUSY;
@@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
 		goto out;
 	}
 
-	if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-	    (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+	if (ring->persistent_gnt_c < max_pgrants ||
+	    (ring->persistent_gnt_c == max_pgrants &&
 	    !ring->blkif->vbd.overflow_max_grants)) {
 		num_clean = 0;
 	} else {
-		num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-		num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-			    num_clean;
+		num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+		num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
 		num_clean = min(ring->persistent_gnt_c, num_clean);
 		pr_debug("Going to purge at least %u persistent grants\n",
 			 num_clean);
@@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 		 current->comm, ring->st_oo_req,
 		 ring->st_rd_req, ring->st_wr_req,
 		 ring->st_f_req, ring->st_ds_req,
-		 ring->persistent_gnt_c,
-		 xen_blkif_max_pgrants);
+		 ring->persistent_gnt_c, max_pgrants);
 	ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
 	ring->st_rd_req = 0;
 	ring->st_wr_req = 0;
@@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg)
 		if (time_before(jiffies, blkif->buffer_squeeze_end))
 			shrink_free_pagepool(ring, 0);
 		else
-			shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+			shrink_free_pagepool(ring, max_buffer_pages);
 
 		if (log_stats && time_after(jiffies, ring->st_print))
 			print_stats(ring);
@@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
 				continue;
 			}
 			if (use_persistent_gnts &&
-			    ring->persistent_gnt_c < xen_blkif_max_pgrants) {
+
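The purge-count arithmetic visible in the 'purge_persistent_gnt()' hunk can be checked in isolation with a small userspace sketch. 'demo_num_clean' is a hypothetical helper, and the value 5 for 'DEMO_LRU_PERCENT_CLEAN' is an assumption, since the diff references 'LRU_PERCENT_CLEAN' without showing its definition.

```c
/* Assumed value; the diff above does not show the real definition. */
#define DEMO_LRU_PERCENT_CLEAN 5

/*
 * Number of persistent grants to purge, following the expressions in
 * the purge_persistent_gnt() hunk.
 */
unsigned int demo_num_clean(unsigned int persistent_gnt_c,
			    unsigned int max_pgrants,
			    int overflow_max_grants)
{
	unsigned int num_clean;

	/* Below the limit, or exactly at it without overflow: nothing to do. */
	if (persistent_gnt_c < max_pgrants ||
	    (persistent_gnt_c == max_pgrants && !overflow_max_grants))
		return 0;

	/* Integer division truncates: 1056 / 100 is 10, not 10.56. */
	num_clean = (max_pgrants / 100) * DEMO_LRU_PERCENT_CLEAN;
	num_clean = persistent_gnt_c - max_pgrants + num_clean;

	/* Never try to purge more grants than actually exist. */
	return persistent_gnt_c < num_clean ? persistent_gnt_c : num_clean;
}
```

With the default limit of 1056 and the assumed percentage, the fixed part of the purge is 50 grants rather than 52, because the division is performed before the multiplication.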
[Xen-devel] [PATCH v13 3/5] xen/blkback: Squeeze page pools if a memory pressure is detected
From: SeongJae Park

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  Once the handling of the current I/O requests
finishes, or 100 milliseconds have passed since the last I/O request
handling, it checks and shrinks the pool so that it does not exceed the
size limit, `max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by
attaching a large number of block devices and inducing I/O.  Such
problematic situations can be avoided by limiting the maximum number of
devices that can be attached, but finding the optimal limit is not so
easy.  An improperly set limit can result in memory pressure or
resource underutilization.  This commit avoids such problematic
situations by squeezing the pools (returning every free page in the
pool to the system) for a while (users can set this duration via a
module parameter) if memory pressure is detected.

Discussions
===========

The `blkback`'s original shrinking mechanism returns to the system only
those pages in the pool that are not currently used by `blkback`, in
other words, the pages that are not mapped with granted pages.  Because
this commit changes only the shrink limit but still uses the same
freeing mechanism, it does not touch pages which are currently mapping
grants.

Once memory pressure is detected, this commit keeps the squeezing limit
for a user-specified time duration.  The duration should be neither too
long nor too short.  If it is too long, the overhead incurred by
squeezing can reduce the I/O performance.  If it is too short,
`blkback` will not free enough pages to reduce the memory pressure.
This commit sets the value to `10 milliseconds` by default because it is
a short time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism runs at least
every 100 milliseconds, this could be a somewhat reasonable choice.  I
also tested other durations (refer to the section below for more
details) and confirmed that 10 milliseconds is the one that works best
with the test.  That said, the proper duration depends on actual
configurations and workloads.  That's why this commit allows users to
set the duration as a module parameter.

Memory Pressure Test
====================

To show how well this commit fixes the memory pressure situation, I
configured a test environment on a xen-running virtualization system.
On the `blkfront` running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those.  Meanwhile, I
measure the number of pages swapped in (pswpin) and out (pswpout) on
the `blkback` running guest.  The test ran twice, once for the `blkback`
before this commit and once for that after this commit.  As shown below,
this commit has dramatically reduced the memory pressure:

                pswpin  pswpout
    before      76,672  185,799
    after          867    3,967

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

    duration (ms)   pswpin  pswpout
    1                  707    5,095
    10                 867    3,967
    100                362    3,348

As expected, the memory pressure decreases as the duration increases,
but the reduction slows down from `10ms`.  Based on these results, I
chose 10ms as the default duration.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront` running
guest.

For the artificial squeezing, I set the `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
test, I set the value to `1024` and `0`.  `1024` is the default value.
Setting the value to `0` is equivalent to always squeezing (the worst
case).

If the underlying block device is slow enough, the squeezing overhead
could be hidden.  For that reason, I use a fast block device, namely the
rbd[1]:

    # xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I run a simple `dd` command 5 times
directly to the device as below and collect the 'MB/s' results.

    $ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
                             bs=4k count=$((256*512)); sync; done

The results are as below.  'max_pgs' represents the value of the
`blkback.max_buffer_pages` parameter.

    max_pgs   Min   Max   Median   Avg     Stddev
    0         417   423   420      419.4   2.5099801
    1024      414   425   416      417.8   4.4384682
    No difference proven at 95.0% confidence

In short, even worst case squeezing on
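The squeezing window described in this commit message (shrink the free pool to zero for a fixed duration after memory pressure, then go back to the normal `max_buffer_pages` limit) can be sketched in userspace C as follows. All names here ('struct demo_pool', 'demo_start_squeeze', 'demo_shrink_limit', 'demo_shrink') are hypothetical, and the millisecond timestamps are a simplification: the sketch ignores counter wraparound, which real kernel code has to handle.

```c
/* Hypothetical model of one free page pool. */
struct demo_pool {
	unsigned long free_pages;	/* pages currently kept in the pool */
	unsigned long squeeze_end_ms;	/* squeeze until this timestamp */
};

/* Called when memory pressure is detected at time now_ms. */
void demo_start_squeeze(struct demo_pool *pool, unsigned long now_ms,
			unsigned long duration_ms)
{
	pool->squeeze_end_ms = now_ms + duration_ms;
}

/* Limit to shrink to at now_ms: 0 while squeezing, the normal limit otherwise. */
unsigned long demo_shrink_limit(const struct demo_pool *pool,
				unsigned long now_ms,
				unsigned long max_buffer_pages)
{
	return now_ms < pool->squeeze_end_ms ? 0 : max_buffer_pages;
}

/* Shrink the pool down to limit and return the number of pages released. */
unsigned long demo_shrink(struct demo_pool *pool, unsigned long limit)
{
	unsigned long released = 0;

	if (pool->free_pages > limit) {
		released = pool->free_pages - limit;
		pool->free_pages = limit;
	}
	return released;
}
```

Only the limit changes during the window; the freeing step itself is the same in both cases, which matches the point made under "Discussions" that pages currently mapping grants are never touched.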
[Xen-devel] [PATCH v13 2/5] xenbus/backend: Protect xenbus callback with lock
From: SeongJae Park

A driver's 'reclaim_memory' callback can race with 'probe' or 'remove'
because it will be called whenever memory pressure is detected.  To
avoid such a race, this commit embeds a spinlock in each 'xenbus_device'
and makes 'xenbus' hold the lock while the corresponding callbacks are
running.

Signed-off-by: SeongJae Park
---
 drivers/xen/xenbus/xenbus_probe.c         |  8 +++-
 drivers/xen/xenbus/xenbus_probe_backend.c | 10 --
 include/xen/xenbus.h                      |  1 +
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 5b471889d723..9ed556ba4fd4 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -232,7 +232,9 @@ int xenbus_dev_probe(struct device *_dev)
 		return err;
 	}
 
+	spin_lock(&dev->reclaim_lock);
 	err = drv->probe(dev, id);
+	spin_unlock(&dev->reclaim_lock);
 	if (err)
 		goto fail;
 
@@ -260,8 +262,11 @@ int xenbus_dev_remove(struct device *_dev)
 
 	free_otherend_watch(dev);
 
-	if (drv->remove)
+	if (drv->remove) {
+		spin_lock(&dev->reclaim_lock);
 		drv->remove(dev);
+		spin_unlock(&dev->reclaim_lock);
+	}
 
 	free_otherend_details(dev);
 
@@ -472,6 +477,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
 		goto fail;
 
 	dev_set_name(&xendev->dev, "%s", devname);
+	spin_lock_init(&xendev->reclaim_lock);
 
 	/* Register with generic device framework. */
 	err = device_register(&xendev->dev);
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index 7e78ebef7c54..bc61372e00a1 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
 static int backend_reclaim_memory(struct device *dev, void *data)
 {
 	const struct xenbus_driver *drv;
+	struct xenbus_device *xdev;
 
 	if (!dev->driver)
 		return 0;
 	drv = to_xenbus_driver(dev->driver);
-	if (drv && drv->reclaim_memory)
-		drv->reclaim_memory(to_xenbus_device(dev));
+	if (drv && drv->reclaim_memory) {
+		xdev = to_xenbus_device(dev);
+		if (!spin_trylock(&xdev->reclaim_lock))
+			return 0;
+		drv->reclaim_memory(xdev);
+		spin_unlock(&xdev->reclaim_lock);
+	}
 	return 0;
 }
 
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index c861cfb6f720..45cd61cb6e86 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -76,6 +76,7 @@ struct xenbus_device {
 	enum xenbus_state state;
 	struct completion down;
 	struct work_struct work;
+	spinlock_t reclaim_lock;
 };
 
 static inline struct xenbus_device *to_xenbus_device(struct device *dev)
-- 
2.17.1
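The locking rule of this patch (probe and remove take the lock unconditionally, the reclaim path only tries it and gives up if it is busy) can be illustrated with a userspace sketch. It uses a C11 'atomic_flag' as a stand-in for the kernel spinlock, which is an assumption made purely for illustration; 'struct demo_xdev', 'demo_trylock', 'demo_unlock' and 'demo_reclaim' are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct xenbus_device. */
struct demo_xdev {
	atomic_flag reclaim_lock;	/* stand-in for spinlock_t reclaim_lock */
	int reclaim_calls;		/* how often the reclaim hook ran */
};

/* Returns true if the lock was free and is now held by the caller. */
bool demo_trylock(atomic_flag *lock)
{
	/* test_and_set returns the previous value: false means it was free. */
	return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

void demo_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Reclaim path: never wait for the lock.  If someone else (for example
 * probe or remove) holds it, skip this device.  Returns true if the
 * reclaim hook actually ran.
 */
bool demo_reclaim(struct demo_xdev *dev)
{
	if (!demo_trylock(&dev->reclaim_lock))
		return false;
	dev->reclaim_calls++;	/* the driver's reclaim_memory() would run here */
	demo_unlock(&dev->reclaim_lock);
	return true;
}
```

Because the reclaim side never blocks, a memory allocation made while the lock is held cannot deadlock against the reclaim callback for the same device; the callback simply skips that device for this round.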
[Xen-devel] [xen-unstable-smoke test] 144934: tolerable all pass - PUSHED
flight 144934 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144934/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl 13 migrate-support-check fail never pass
 test-armhf-armhf-xl 14 saverestore-support-check fail never pass

version targeted for testing:
 xen 5c13ed79f3cba200f21e7dfd6ed7f3aa08e4dada
baseline version:
 xen 0e7c69bd3c0b35a677d73843b39522787ccf5a3f

Last test of basis 144931 2019-12-18 12:00:25 Z 0 days
Testing same since 144934 2019-12-18 15:01:21 Z 0 days 1 attempts

People who touched revisions under test:
 Andrew Cooper
 Jan Beulich

jobs:
 build-arm64-xsm pass
 build-amd64 pass
 build-armhf pass
 build-amd64-libvirt pass
 test-armhf-armhf-xl pass
 test-arm64-arm64-xl-xsm pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-libvirt pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
 0e7c69bd3c..5c13ed79f3 5c13ed79f3cba200f21e7dfd6ed7f3aa08e4dada -> smoke
[Xen-devel] [qemu-mainline test] 144925: regressions - FAIL
flight 144925 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144925/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-freebsd10-i386 14 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-freebsd10-amd64 14 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-win7-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ovmf-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ovmf-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-win7-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ws16-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore fail REGR. vs. 144861

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail REGR. vs. 144861

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10 fail like 144861
 test-armhf-armhf-libvirt 14 saverestore-support-check fail like 144861
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail like 144861
 test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 14 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 12 guest-start fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl 13 migrate-support-check fail never pass
 test-arm64-arm64-xl 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl 13 migrate-support-check fail never pass
Re: [Xen-devel] [PATCH v12 2/5] xenbus/backend: Protect xenbus callback with lock
On Wed, 18 Dec 2019 16:11:51 +0100 "Jürgen Groß" wrote: > On 18.12.19 15:40, SeongJae Park wrote: > > On Wed, 18 Dec 2019 14:30:44 +0100 "Jürgen Groß" wrote: > > > >> On 18.12.19 13:42, SeongJae Park wrote: > >>> On Wed, 18 Dec 2019 13:27:37 +0100 "Jürgen Groß" wrote: > >>> > On 18.12.19 11:42, SeongJae Park wrote: > > From: SeongJae Park > > > > 'reclaim_memory' callback can race with a driver code as this callback > > will be called from any memory pressure detected context. To deal with > > the case, this commit adds a spinlock in the 'xenbus_device'. Whenever > > 'reclaim_memory' callback is called, the lock of the device which passed > > to the callback as its argument is locked. Thus, drivers registering > > their 'reclaim_memory' callback should protect the data that might race > > with the callback with the lock by themselves. > > Any reason you don't take the lock around the .probe() and .remove() > calls of the backend (xenbus_dev_probe() and xenbus_dev_remove())? This > would eliminate the need to do that in each backend instead. > >>> > >>> First of all, I would like to keep the critical section as small as > >>> possible. > >>> With my small test, I could see slightly increasing memory pressure as the > >>> critical section becomes wider. Also, some drivers might share the data > >>> their > >>> 'reclaim_memory' callback touches with other functions. I think only the > >>> driver owners can know what data is shared and what is the minimum > >>> critical > >>> section to protect it. > >> > >> But this kind of serialization can still be added on top. > > > > I'm still worrying about the unnecessarily large critical section, but it > > might > > be small enough to be ignored. If no others have strong objection, I will > > take > > the lock around the '->probe()' and '->remove()'. > > The lock is per device, so contention is possible only for the > reclaim case. 
In case probe or remove are running reclaim will have > nothing to free (in probe case nothing is allocated yet, in remove > case everything should be freed anyway). So the larger critical section > is no problem at all IMO. Agreed. I think I was worried about nothing really existing now. > > >> And with the trylock in the reclaim path I believe you can even avoid > >> the irq variants of the spinlock. But I might be wrong, so you should > >> try that with lockdep enabled. If it is working there is no harm done > >> when making the critical section larger, as memory allocations will > >> work as before. > > > > Yes, you're right. I will try test with lockdep. > > Thanks, Good news, lockdep says it's okay :) Will post next version soon. Thanks, SeongJae Park > > > Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] REGRESSION: Xen 4.13 RC5 fails to bootstrap Dom0 on ARM
Hi,

On Wed, Dec 18, 2019 at 4:56 AM Julien Grall wrote:
> > So that is, in fact, my first question -- why is Xen not showing
> > available memory in xl info?
>
> I am not entirely sure what exact information you want.
>
> The output you dumped above contains the available memory for the memory
> (see "free_memory").
>
> Are you looking for something different?

Just to be clear: I was giving 2G via devicetrees (the same device trees
that would make Linux detect 2G of RAM), hence I was expecting xl info to
show that. Instead I only got 1120M shown by xl info.

> On 18/12/2019 00:04, Roman Shaposhnik wrote:
> > memory {
> > device_type = "memory";
> > reg = <0x0 0x0 0x0 0x5e0 0x0 0x5f0 0x0 0x1000
> > 0x0 0x5f02000 0x0 0xefd000 0x0 0x6e0 0x0 0x60f000 0x0 0x741
> > 0x0 0x1aaf 0x0 0x21f0 0x0 0x10 0x0 0x2200 0x0
> > 0x1c00>;
> > };
> >
> > reserved-memory {
> > ranges;
> > #size-cells = <0x2>;
> > #address-cells = <0x2>;
> >
> > ramoops@21f0 {
> > ftrace-size = <0x2>;
> > console-size = <0x2>;
> > reg = <0x0 0x21f0 0x0 0x10>;
> > record-size = <0x2>;
> > compatible = "ramoops";
> > };
> >
> > linux,cma {
> > linux,cma-default;
> > reusable;
> > size = <0x0 0x800>;
> > compatible = "shared-dma-pool";
> > };
> > };
> >
> > If you look at the REG -- it does now add up to 2Gb, but booting Xen
> > with it has exactly the
> > same effect as booting it with: reg = <0x0 0x0 0x0 0x8000>;\
>
> If you boot Xen using EFI, the memory information will come from EFI and
> the DT node will be ignored. So unless UEFI is able to pick up the
> modification of the DT memory node, modifying the DT is not going to
> affect anything.

That's a good point, but given that I always go through GRUB, I was
expecting the devicetree command to completely overshadow whatever
information UEFI may have. Am I wrong?

> > I am attaching a full log, and I see the following in the logs:
> >
> > (XEN) Allocating 1:1 mappings totalling 720MB for dom0:
> > (XEN) BANK[0] 0x000800-0x001c00 (320MB)
> > (XEN) BANK[1] 0x004000-0x005800 (384MB)
> > (XEN) BANK[2] 0x007b00-0x007c00 (16MB)
> >
> > Which sort of makes sense, I guess -- but I still don't understand
> > where all these ranges
> > are coming from and how come Xen doesn't see the full 2Gb even with various
> > devicetrees I tried.
>
> The ranges above describe the memory range given to Dom0. For all the
> memory given to Xen, you want to look at the top of your log:
>
> (XEN) Checking for initrd in /chosen
> (XEN) RAM: - 05df
> (XEN) RAM: 05f0 - 06dfefff
> (XEN) RAM: 06e0 - 0740efff
> (XEN) RAM: 0741 - 1db8dfff
> (XEN) RAM: 350f - 3dbd2fff
> (XEN) RAM: 3dbd3000 - 3dff
> (XEN) RAM: 4000 - 5a653fff
> (XEN) RAM: 7ada - 7ada3fff
> (XEN) RAM: 7aea8000 - 7afa9fff
> (XEN) RAM: 7afaa000 - 7ec73fff
> (XEN) RAM: 7ec74000 - 7fdddfff
> (XEN) RAM: 7fdde000 - 7fea5fff
> (XEN) RAM: 7fea6000 - 7ff6dfff
> (XEN) RAM: 7000 - 7fff
>
> Looking at the differences with the Linux logs, there is indeed some
> memory not detected by Xen.
>
> On Xen, we only consider as usable memory any EFI description with
> EfiConventionalMemory, EfiBootServicesCode and EfiBootServicesData.
>
> Linux includes more types here, so this may explain why we see a difference.
>
> While looking at it, I have also noticed that we don't seem to care
> about the memory attribute. I suspect this could be another latent issue
> in Xen if the attribute does not match.

Anything I can do to help debug this? I can run any kind of debug
builds, etc. if needed. I mean -- at this point it would be really great
to get HiKey back to the status of Xen-on-ARM developer board.

> > Any ideas here would be greatly appreciated!
> >
> > Thanks,
> > Roman.
> >
> > P.S. Any guess at what these mean?
> >
> > (XEN) traps.c:1973:d0v0 HSR=0x93880006 pc=0x008738
> > gva=0x872f2000 gpa=0x0f
> > (XEN) traps.c:1973:d0v0 HSR=0x93880006 pc=0x00b734e558
> > gva=0xb72eb000 gpa=0x0f
> > (XEN) traps.c:1973:d0v0 HSR=0x93880006 pc=0x008f9d2558
> > gva=0x8f96f000 gpa=0x0f
>
> It means that Linux has tried to access something that has not been
> mapped in stage-2. As Dom0 is mapped 1:1, the GPA also gives you the host
> physical address.
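The filtering rule quoted in this message (Xen treats only EfiConventionalMemory, EfiBootServicesCode and EfiBootServicesData descriptors as usable RAM) can be sketched as a small userspace function that sums the usable bytes of a memory map. The type values follow the UEFI specification; 'struct demo_efi_desc', 'demo_xen_usable' and 'demo_usable_bytes' are hypothetical names, and the 4 KiB page size is the UEFI page size, not something stated in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Memory type values as numbered in the UEFI specification. */
enum demo_efi_type {
	DEMO_EFI_RESERVED = 0,
	DEMO_EFI_LOADER_CODE = 1,
	DEMO_EFI_LOADER_DATA = 2,
	DEMO_EFI_BOOT_SERVICES_CODE = 3,
	DEMO_EFI_BOOT_SERVICES_DATA = 4,
	DEMO_EFI_RUNTIME_SERVICES_CODE = 5,
	DEMO_EFI_RUNTIME_SERVICES_DATA = 6,
	DEMO_EFI_CONVENTIONAL = 7,
};

/* Simplified memory descriptor; real descriptors carry more fields. */
struct demo_efi_desc {
	enum demo_efi_type type;
	uint64_t phys_start;
	uint64_t num_pages;	/* in 4 KiB UEFI pages */
};

/* The three types the message above says Xen accepts as usable RAM. */
bool demo_xen_usable(enum demo_efi_type type)
{
	return type == DEMO_EFI_CONVENTIONAL ||
	       type == DEMO_EFI_BOOT_SERVICES_CODE ||
	       type == DEMO_EFI_BOOT_SERVICES_DATA;
}

/* Total bytes of RAM that pass the filter. */
uint64_t demo_usable_bytes(const struct demo_efi_desc *map, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (demo_xen_usable(map[i].type))
			total += map[i].num_pages * 4096;
	return total;
}
```

A region that the firmware reports under any other type (for example loader data) is dropped by this filter, so a hypervisor applying it would report less RAM than an OS that accepts more types, which is the discrepancy discussed in this thread.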
Re: [Xen-devel] [PATCH v3 5/7] Add Code Review Guide
On 18/12/2019, 14:29, "Julien Grall" wrote:

    Hi Lars,

    On 12/12/2019 21:14, Lars Kurth wrote:
    > +### Workflow from an Author's Perspective
    > +
    > +When code authors receive feedback on their patches, they typically first try
    > +to clarify feedback they do not understand. For smaller patches or patch series
    > +it makes sense to wait until receiving feedback on the entire series before
    > +sending out a new version addressing the changes. For larger series, it may
    > +make sense to send out a new revision earlier.
    > +
    > +As a reviewer, you need some system that he;ps ensure that you address all

    Just a small typo: I think you meant "helps" rather than "he;ps".

    Cheers,

Thank you: fixed in my working copy.

One thing which occurred to me for reviews like these, where there are no
ACKs or Reviewed-bys, is that I don't actually know whether you as
reviewer are otherwise happy with the remainder of the patch. Normally
the Acked-by or Reviewed-by is the signal for that.

I am assuming you are, but I think it may be worthwhile pointing out in
the document that, unless stated otherwise, the reviewer is happy with
the patch.

Regards
Lars
Re: [Xen-devel] REGRESSION: Xen 4.13 RC5 fails to bootstrap Dom0 on ARM
On Wed, Dec 18, 2019 at 3:50 AM Julien Grall wrote:
>
> Hi,
>
> On 18/12/2019 07:36, Roman Shaposhnik wrote:
> > On Tue, Dec 17, 2019 at 6:56 PM Roman Shaposhnik wrote:
> >> Exactly! That's the other surprising bit -- I noticed that too -- it's not like
> >> Xen doesn't see any of the memory above 1G -- it just doesn't see enough of it.
> >>
> >> So the question is -- what is Linux doing that Xen doesn't?
> >
> > By the way, speaking of running Xen under ARM/qemu -- here's an interesting
> > observation: when I run qemu-system-aarch64 with -m 4096 option it seems
> > that, again, Linux kernel is perfectly content with having access to 4G of RAM,
> > while Xen only sees about 2G.
>
> Linux and Xen should see close to the same amount of memory as long as
> you are using the same bootloader...

Thanks for confirming. This is what I'm trying to get to on this thread.
Any help would be greatly appreciated!

> > This may actually have something to do with UEFI I guess.
>
> ... could you confirm whether you are booting Linux using UEFI or not?

The boot sequence in both cases is:

    HiKey l-loader
    HiKey Tianocore EDK2 – UEFI
    GRUB (as a UEFI payload)
    Xen | Linux

GRUB's commands for booting Xen + Dom0:

    xen_hypervisor /boot/xen.efi console=dtuart dom0_mem=640M dom0_max_vcpus=1 dom0_vcpus_pin
    xen_module /boot/kernel console=hvc0 root=(hd1,gpt1)/rootfs.img text
    devicetree (hd1,gpt4)/eve.dtb
    xen_module (hd1,gpt1)/initrd.img

GRUB's commands for booting Linux only:

    linux /boot/kernel console=ttyAMA0 console=ttyAMA1 console=ttyAMA2 console=ttyAMA3 root=PARTUUID=f71bd987-d99a-4c88-9781-cf4c26cae55e rootdelay=3
    devicetree (hd1,gpt4)/eve.dtb

So -- nothing boots directly by UEFI -- everything goes through GRUB.
However, my understanding is that GRUB will detect devicetree information
provided by UEFI (even though the devicetree command is supposed to
completely replace that). Hence it is possible that Linux relies on some
residuals left in memory by GRUB that Xen doesn't pay attention to (but
this is pretty wild speculation only).

Thanks,
Roman.
[Xen-devel] [ANNOUNCEMENT] Xen 4.13 is released
Dear community members,

I'm pleased to announce that Xen 4.13.0 is released.

Please find the tarball and its signature at:
https://downloads.xenproject.org/release/xen/4.13.0/

You can also check out the tag in xen.git:
https://xenbits.xen.org/git-http/xen.git RELEASE-4.13.0

Git checkout and build instructions can be found at:
https://wiki.xenproject.org/wiki/Xen_Project_4.13_Release_Notes#Build_Requirements

Release notes can be found at:
https://wiki.xenproject.org/wiki/Xen_Project_4.13_Release_Notes

A summary for 4.13 release documents can be found at:
https://wiki.xenproject.org/wiki/Category:Xen_4.13

Technical blog post for 4.13 can be found at:
https://xenproject.org/2019/12/18/whats-new-in-xen-4-13/

Thanks to everyone who contributed to this release. This release would
not have happened without all the awesome contributions from around the
globe.

Regards,
Juergen Gross (on behalf of the Xen Project Hypervisor team)
Re: [Xen-devel] clock source in PV Linux
On 12/18/19 12:36 AM, Roman Shaposhnik wrote:
> On Wed, Dec 11, 2019 at 12:41 AM Jan Beulich wrote:
>> On 11.12.2019 09:16, Jürgen Groß wrote:
>>> On 11.12.19 08:28, Jan Beulich wrote:
>>>> Jürgen, Boris,
>>>>
>>>> I've noticed
>>>>
>>>> <6>clocksource: Switched to clocksource tsc
>>>>
>>>> as the final clocksource related boot message in a PV Dom0's
>>>> log with 5.4.2. Is it intentional that it's not the "xen" one
>>>> that gets used by default?
>>>
>>> I think this is fine. I just tested it and I'm seeing the same in dom0,
>>> while in a PV domU "xen" is used per default.
>>>
>>> In dom0 "tsc" should be okay in case it is stable. Or are you expecting
>>> problems with that setting?
>>
>> Well, first of all I found this surprising. Whether there are
>> problems to be expected largely depends on the reliability of
>> the "stable" detection in PV Dom0.
>
> Related question: does this mean that tsc is now default for PVH as well?
>
> The reason I'm asking is because I'm still a bit worried about
> the clock drift with tsc.

dom0 will use TSC for either PV or PVH:

xen_time_init():
        /* As Dom0 is never moved, no penalty on using TSC there */
        if (xen_initial_domain())
                xen_clocksource.rating = 275;

But as far as TSC stability goes, I'd think it should be sufficiently
checked by the generic TSC init code?

-boris
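Why lowering 'xen_clocksource.rating' makes dom0 end up on "tsc" can be illustrated with a minimal "highest rating wins" selection in userspace C. This is a sketch under assumptions: only the value 275 comes from the snippet quoted above; the ratings 300 for "tsc" and 400 for an unmodified "xen" clocksource are assumed for illustration, and the real kernel selection takes more into account (for example whether the TSC has been marked unstable). 'struct demo_clocksource' and 'demo_pick' are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical, reduced view of a clocksource: a name and a rating. */
struct demo_clocksource {
	const char *name;
	int rating;
};

/* Pick the entry with the highest rating; returns NULL for an empty list. */
const struct demo_clocksource *demo_pick(const struct demo_clocksource *cs,
					 size_t n)
{
	const struct demo_clocksource *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || cs[i].rating > best->rating)
			best = &cs[i];
	return best;
}
```

With the assumed numbers, a dom0 list of { "xen", 275 } and { "tsc", 300 } selects "tsc", while a domU list of { "xen", 400 } and { "tsc", 300 } selects "xen", matching the behaviour reported in the thread.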
[Xen-devel] [PATCH] x86/save: reserve HVM save record numbers that have been consumed...
...for patches not (yet) upstream.

This patch simply reserves save record number space to avoid the risk
of clashes between existing downstream changes made by Amazon and future
upstream changes which may be incompatible.

Signed-off-by: Paul Durrant
---
Cc: Jan Beulich
Cc: Andrew Cooper
Cc: Wei Liu
Cc: "Roger Pau Monné"
---
 xen/include/public/arch-x86/hvm/save.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index b2ad3fcd74..9c7b86678e 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -639,10 +639,12 @@ struct hvm_msr {
 
 #define CPU_MSR_CODE  20
 
+/* Range 22 - 40 reserved for Amazon */
+
 /*
  * Largest type-code in use
  */
-#define HVM_SAVE_CODE_MAX 20
+#define HVM_SAVE_CODE_MAX 40
 
 #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
-- 
2.20.1
[Xen-devel] [xen-unstable test] 144924: regressions - FAIL
flight 144924 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144924/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-ovmf-amd64 13 guest-saverestore fail REGR. vs. 144905
 test-amd64-amd64-i386-pvgrub 17 guest-localmigrate/x10 fail REGR. vs. 144905

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 17 guest-saverestore.2 fail REGR. vs. 144905
 test-armhf-armhf-xl-rtds 12 guest-start fail REGR. vs. 144905

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 144905
 test-armhf-armhf-libvirt 14 saverestore-support-check fail like 144905
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 144905
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 144905
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144905
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail like 144905
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail like 144905
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail like 144905
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144905
 test-amd64-i386-xl-pvshim 12 guest-start fail never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl 13 migrate-support-check fail never pass
 test-arm64-arm64-xl 14 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass
 test-armhf-armhf-xl 13 migrate-support-check fail never pass
 test-armhf-armhf-xl 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-seattle 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 14 saverestore-support-check fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass

version targeted for testing:
 xen                  704fa1532801bc02c4500462f0b913b3c137db4d
baseline version:
 xen                  f50a4f6e244cfc8e773300c03aaf4db391f3028a

Last test of basis   144905  2019-12-17 18:36:21 Z    0 days
Testing same since   144924  2019-12-18 06:43:35 Z    0 days    1 attempts

People who touched revisions under test:
Re: [Xen-devel] [PATCH] [tools/hotplug] Use ip on systems where brctl is not available
Steven Haigh writes ("[PATCH] [tools/hotplug] Use ip on systems where brctl is not available"): > Newer distros like CentOS 8 do not have brctl available. As such, we > can't use it to configure networking anymore. > > This patch will fall back to 'ip' or 'bridge' commands if brctl is not > available in the working PATH. This looks good to me at least in the brctl case. I have two minor comments. For the avoidance of doubt, I guess you have tested this in the `ip'/`bridge' case ? How thoroughly ? :-) > -if [ -z "$bridge" ] > -then > - bridge=$(brctl show | awk 'NR==2{print$1}') > - > +if [ -z "$bridge" ]; then The presumably-unintentional style change makes the review slightly harder... > -bridge=$(brctl show | cut -d " > +if which brctl >&/dev/null; then Maybe introduce have_brctl () { ... } so we can say if have_brctl; then ? Regards, Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] tools/python: Drop test.py
On 18/12/2019, 13:50, "Andrew Cooper" wrote:

This file hasn't been touched since it was introduced in 2005 (c/s 0c6f36628) and has a wildly obsolete shebang for Python 2.3. Most importantly for us is that it isn't Python 3 compatible.

Drop the file entirely. Since the 2.3 days, automatic discovery of tests has been included in standard functionality. Rewrite the test rule to use "$(PYTHON) -m unittest discover" which is equivalent.

Dropping test.py drops the only piece of ZPL-2.0 code in the tree. Drop the ancillary files, and adjust COPYING to match.

Signed-off-by: Andrew Cooper
---
CC: Ian Jackson
CC: Wei Liu
CC: Lars Kurth

This wants backporting to 4.13 as soon as practical.

Reviewed-by: Lars Kurth (lars.ku...@citrix.com) - from a licensing perspective
Re: [Xen-devel] [PATCH v2 6/6] x86: implement Hyper-V clock source
> -----Original Message-----
> From: Wei Liu On Behalf Of Wei Liu
> Sent: 18 December 2019 14:43
> To: Xen Development List
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Jan Beulich
> ; Andrew Cooper ; Wei Liu
> ; Roger Pau Monné
> Subject: [PATCH v2 6/6] x86: implement Hyper-V clock source
>
> Implement a clock source using Hyper-V's reference TSC page.
>
> Signed-off-by: Wei Liu
> ---
> v2:
> 1. Address Jan's comments.
>
> Relevant spec:
>
> https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf
>
> Section 12.6.
> ---
>  xen/arch/x86/time.c | 101
>  1 file changed, 101 insertions(+)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 216169a025..8b96b2e9a5 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -31,6 +31,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -644,6 +645,103 @@ static struct platform_timesource __initdata plt_xen_timer =
>  };
>  #endif
>
> +#ifdef CONFIG_HYPERV_GUEST
> +/************************************************************
> + * HYPER-V REFERENCE TSC
> + */
> +
> +static struct ms_hyperv_tsc_page *hyperv_tsc;
> +static struct page_info *hyperv_tsc_page;
> +
> +static int64_t __init init_hyperv_timer(struct platform_timesource *pts)
> +{
> +    paddr_t maddr;
> +    uint64_t tsc_msr, freq;
> +
> +    if ( !(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE) )
> +        return 0;
> +
> +    hyperv_tsc_page = alloc_domheap_page(NULL, 0);
> +    if ( !hyperv_tsc_page )
> +        return 0;
> +
> +    hyperv_tsc = __map_domain_page_global(hyperv_tsc_page);
> +    if ( !hyperv_tsc )
> +    {
> +        free_domheap_page(hyperv_tsc_page);
> +        hyperv_tsc_page = NULL;
> +        return 0;
> +    }
> +
> +    maddr = page_to_maddr(hyperv_tsc_page);
> +
> +    /*
> +     * Per Hyper-V TLFS:
> +     *   1. Read existing MSR value
> +     *   2. Preserve bits [11:1]
> +     *   3. Set bits [63:12] to be guest physical address of tsc page
> +     *   4. Set enabled bit (0)
> +     *   5. Write back new MSR value
> +     */
> +    rdmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr);
> +    tsc_msr &= 0xffeULL;
> +    tsc_msr |= maddr | 1 /* enabled */;
> +    wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr);
> +

You need to check for the HV_X64_ACCESS_FREQUENCY_MSRS feature or you risk a #GP below I think.

> +    /* Get TSC frequency from Hyper-V */
> +    rdmsrl(HV_X64_MSR_TSC_FREQUENCY, freq);
> +    pts->frequency = freq;
> +
> +    return freq;
> +}
> +
> +static inline uint64_t read_hyperv_timer(void)
> +{
> +    uint64_t scale, offset, ret, tsc;
> +    uint32_t seq;
> +    const struct ms_hyperv_tsc_page *tsc_page = hyperv_tsc;
> +
> +    do {
> +        seq = tsc_page->tsc_sequence;
> +
> +        /* Seq 0 is special. It means the TSC enlightenment is not
> +         * available at the moment. The reference time can only be
> +         * obtained from the Reference Counter MSR.
> +         */
> +        if ( seq == 0 )

Older versions of the spec used to use 0xFFFFFFFF I think, although when I look again they seem to have been retro-actively fixed. In any case I think you should treat both 0xFFFFFFFF and 0 as invalid.

> +        {
> +            rdmsrl(HV_X64_MSR_TIME_REF_COUNT, ret);
> +            return ret;
> +        }
> +
> +        /* rdtsc_ordered already contains a load fence */
> +        tsc = rdtsc_ordered();
> +        scale = tsc_page->tsc_scale;
> +        offset = tsc_page->tsc_offset;
> +
> +        smp_rmb();
> +
> +    } while (tsc_page->tsc_sequence != seq);
> +
> +    /* ret = ((tsc * scale) >> 64) + offset; */
> +    asm ( "mul %[scale]; add %[offset], %[ret]"
> +          : "+a" (tsc), [ret] "=d" (ret)
> +          : [scale] "rm" (scale), [offset] "rm" (offset) );
> +

It would be nice to common this up with scale_tsc() in viridian/time.c.

  Paul

> +    return ret;
> +}
> +
> +static struct platform_timesource __initdata plt_hyperv_timer =
> +{
> +    .id = "hyperv",
> +    .name = "HYPER-V REFERENCE TSC",
> +    .read_counter = read_hyperv_timer,
> +    .init = init_hyperv_timer,
> +    /* See TSC time source for why counter_bits is set to 63 */
> +    .counter_bits = 63,
> +};
> +#endif
> +
>  /************************************************************
>   * GENERIC PLATFORM TIMER INFRASTRUCTURE
>   */
> @@ -793,6 +891,9 @@ static u64 __init init_platform_timer(void)
>      static struct platform_timesource * __initdata plt_timers[] = {
>  #ifdef CONFIG_XEN_GUEST
>          &plt_xen_timer,
> +#endif
> +#ifdef CONFIG_HYPERV_GUEST
> +        &plt_hyperv_timer,
>  #endif
>          &plt_hpet, &plt_pmtimer, &plt_pit
>      };
> --
> 2.20.1
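For anyone following the review comments on read_hyperv_timer(), the read loop can be modelled outside the hypervisor. This is only a rough Python sketch, not Xen code: `TscPage`, `scale_tsc`, `read_reference_time` and the two callables standing in for RDTSC and the reference counter MSR are hypothetical names. It assumes, as suggested in the review, that both 0 and 0xFFFFFFFF mark the page as unusable.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1
INVALID_SEQUENCES = (0, 0xFFFFFFFF)  # assumption: both mean "page not usable"


@dataclass
class TscPage:
    """Hypothetical stand-in for the fields of the reference TSC page."""
    tsc_sequence: int
    tsc_scale: int
    tsc_offset: int


def scale_tsc(tsc, scale, offset):
    """((tsc * scale) >> 64) + offset, truncated to 64 bits like the asm."""
    return (((tsc * scale) >> 64) + offset) & MASK64


def read_reference_time(page, rdtsc, read_ref_counter_msr):
    """Model of the read loop; rdtsc and read_ref_counter_msr are callables."""
    while True:
        seq = page.tsc_sequence
        if seq in INVALID_SEQUENCES:
            # Fall back to the slower reference counter MSR.
            return read_ref_counter_msr()
        tsc = rdtsc()
        scale, offset = page.tsc_scale, page.tsc_offset
        # Retry if the page was updated while it was being read.
        if page.tsc_sequence == seq:
            return scale_tsc(tsc, scale, offset)
```

Keeping the multiply-shift in one helper is also the shape the "common this up with scale_tsc()" remark points at: the clock source and the viridian code could then share a single implementation of the formula.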
Re: [Xen-devel] [PATCH v12 2/5] xenbus/backend: Protect xenbus callback with lock
On 18.12.19 15:40, SeongJae Park wrote: On Wed, 18 Dec 2019 14:30:44 +0100 "Jürgen Groß" wrote: On 18.12.19 13:42, SeongJae Park wrote: On Wed, 18 Dec 2019 13:27:37 +0100 "Jürgen Groß" wrote: On 18.12.19 11:42, SeongJae Park wrote: From: SeongJae Park 'reclaim_memory' callback can race with a driver code as this callback will be called from any memory pressure detected context. To deal with the case, this commit adds a spinlock in the 'xenbus_device'. Whenever 'reclaim_memory' callback is called, the lock of the device which passed to the callback as its argument is locked. Thus, drivers registering their 'reclaim_memory' callback should protect the data that might race with the callback with the lock by themselves. Any reason you don't take the lock around the .probe() and .remove() calls of the backend (xenbus_dev_probe() and xenbus_dev_remove())? This would eliminate the need to do that in each backend instead. First of all, I would like to keep the critical section as small as possible. With my small test, I could see slightly increasing memory pressure as the critical section becomes wider. Also, some drivers might share the data their 'reclaim_memory' callback touches with other functions. I think only the driver owners can know what data is shared and what is the minimum critical section to protect it. But this kind of serialization can still be added on top. I'm still worrying about the unnecessarily large critical section, but it might be small enough to be ignored. If no others have strong objection, I will take the lock around the '->probe()' and '->remove()'. The lock is per device, so contention is possible only for the reclaim case. In case probe or remove are running reclaim will have nothing to free (in probe case nothing is allocated yet, in remove case everything should be freed anyway). So the larger critical section is no problem at all IMO. And with the trylock in the reclaim path I believe you can even avoid the irq variants of the spinlock. 
But I might be wrong, so you should try that with lockdep enabled. If it is working there is no harm done when making the critical section larger, as memory allocations will work as before. Yes, you're right. I will try test with lockdep. Thanks, Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 4/6] x86/viridian: drop private copy of HV_REFERENCE_TSC_PAGE in time.c
> -----Original Message-----
> From: Wei Liu On Behalf Of Wei Liu
> Sent: 18 December 2019 14:43
> To: Xen Development List
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Paul Durrant
> ; Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
>
> Subject: [PATCH v2 4/6] x86/viridian: drop private copy of HV_REFERENCE_TSC_PAGE in time.c
>
> Use the one defined in hyperv-tlfs.h instead. No functional change intended.
>
> Signed-off-by: Wei Liu
> ---
>  xen/arch/x86/hvm/viridian/time.c | 30 +++---
>  1 file changed, 11 insertions(+), 19 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/viridian/time.c b/xen/arch/x86/hvm/viridian/time.c
> index 6ddca29b29..33c15782e4 100644
> --- a/xen/arch/x86/hvm/viridian/time.c
> +++ b/xen/arch/x86/hvm/viridian/time.c
> @@ -13,19 +13,11 @@
>
>  #include
>  #include
> +#include
>  #include
>
>  #include "private.h"
>
> -typedef struct _HV_REFERENCE_TSC_PAGE
> -{
> -    uint32_t TscSequence;
> -    uint32_t Reserved1;
> -    uint64_t TscScale;
> -    int64_t TscOffset;
> -    uint64_t Reserved2[509];
> -} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
> -
>  static void update_reference_tsc(const struct domain *d, bool initialize)
>  {
>      struct viridian_domain *vd = d->arch.hvm.viridian;
> @@ -41,18 +33,18 @@ static void update_reference_tsc(const struct domain *d, bool initialize)
>       * This enlightenment must be disabled is the host TSC is not invariant.
>       * However it is also disabled if vtsc is true (which means rdtsc is
>       * being emulated). This generally happens when guest TSC freq and host
> -     * TSC freq don't match. The TscScale value could be adjusted to cope
> +     * TSC freq don't match. The tsc_scale value could be adjusted to cope
>       * with this, allowing vtsc to be turned off, but support for this is
>       * not yet present in the hypervisor. Thus is it is possible that
>       * migrating a Windows VM between hosts of differing TSC frequencies
>       * may result in large differences in guest performance. Any jump in
>       * TSC due to migration down-time can, however, be compensated for by
> -     * setting the TscOffset value (see below).
> +     * setting the tsc_offset value (see below).
>       */
>      if ( !host_tsc_is_safe() || d->arch.vtsc )
>      {
>          /*
> -         * The specification states that valid values of TscSequence range
> +         * The specification states that valid values of tsc_sequence range
>           * from 0 to 0xFFFFFFFE. The value 0xFFFFFFFF is used to indicate
>           * this mechanism is no longer a reliable source of time and that
>           * the VM should fall back to a different source.
> @@ -61,7 +53,7 @@ static void update_reference_tsc(const struct domain *d, bool initialize)
>           * violate the spec. and rely on a value of 0 to indicate that this
>           * enlightenment should no longer be used.
>           */
> -        p->TscSequence = 0;
> +        p->tsc_sequence = 0;
>
>          printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: invalidated\n",
>                 d->domain_id);
> @@ -72,29 +64,29 @@ static void update_reference_tsc(const struct domain *d, bool initialize)
>       * The guest will calculate reference time according to the following
>       * formula:
>       *
> -     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
> +     * ReferenceTime = ((RDTSC() * tsc_scale) >> 64) + tsc_offset
>       *
>       * Windows uses a 100ns tick, so we need a scale which is cpu
>       * ticks per 100ns shifted left by 64.
>       * The offset value is calculated on restore after migration and
>       * ensures that Windows will not see a large jump in ReferenceTime.
>       */
> -    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
> -    p->TscOffset = trc->off;
> +    p->tsc_scale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
> +    p->tsc_offset = trc->off;
>      smp_wmb();
>
> -    seq = p->TscSequence + 1;
> +    seq = p->tsc_sequence + 1;
>      if ( seq == 0xFFFFFFFF || seq == 0 ) /* Avoid both 'invalid' values */
>          seq = 1;
>
> -    p->TscSequence = seq;
> +    p->tsc_sequence = seq;
>  }
>
>  /*
>   * The specification says: "The partition reference time is computed
>   * by the following formula:
>   *
> - * ReferenceTime = ((VirtualTsc * TscScale) >> 64) + TscOffset
> + * ReferenceTime = ((VirtualTsc * tsc_scale) >> 64) + tsc_offset

I'd prefer keeping the CamelCase here as it's text lifted from the TLFS and not reliant on the header definitions.

  Paul

>   *
>   * The multiplication is a 64 bit multiplication, which results in a
>   * 128 bit number which is then shifted 64 times to the right to obtain
> --
> 2.20.1
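To make the arithmetic in the quoted hunk concrete, here is a small Python model of the scale and sequence handling. It is a sketch rather than the hypervisor code: the function names are hypothetical, and it assumes the scale is derived as (10000 << 32) / tsc_khz, shifted left by a further 32 bits, which is what a 100ns tick and a frequency in kHz imply.

```python
def reference_tsc_scale(tsc_khz):
    """Scale such that (tsc * scale) >> 64 counts 100ns ticks.

    Assumption: mirrors ((10000 << 32) / tsc_khz) << 32 with integer division.
    """
    return ((10000 << 32) // tsc_khz) << 32


def reference_time(tsc, scale, offset):
    """ReferenceTime = ((tsc * scale) >> 64) + offset."""
    return ((tsc * scale) >> 64) + offset


def next_tsc_sequence(seq):
    """Advance the 32-bit sequence number, skipping 0 and 0xFFFFFFFF."""
    seq = (seq + 1) & 0xFFFFFFFF
    if seq in (0, 0xFFFFFFFF):
        seq = 1
    return seq
```

With a 2 GHz TSC (tsc_khz = 2000000), one second of TSC ticks comes out within a tick or so of 10^7 reference ticks. The small shortfall comes from the truncating division in the scale.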
[Xen-devel] [PATCH] tools/python: Python 3 compatibility
convert-legacy-stream is only used for incoming migration from pre Xen 4.7,
and verify-stream-v2 appears to only be used by me during migration
development - it is little surprise that they missed the main conversion
effort in Xen 4.13.

Fix it all up. Move open_file_or_fd() into a new util.py to avoid duplication,
making it a more generic wrapper around open() or fdopen().

Signed-off-by: Andrew Cooper
---
CC: Ian Jackson
CC: Wei Liu

This needs backporting to 4.13 ASAP
---
 tools/python/scripts/convert-legacy-stream | 49 +++---
 tools/python/scripts/verify-stream-v2 | 43 +-
 tools/python/xen/migration/libxc.py | 2 +-
 tools/python/xen/migration/libxl.py | 2 +-
 tools/python/xen/migration/verify.py | 4 +--
 tools/python/xen/util.py | 23 ++
 6 files changed, 46 insertions(+), 77 deletions(-)
 create mode 100644 tools/python/xen/util.py

diff --git a/tools/python/scripts/convert-legacy-stream b/tools/python/scripts/convert-legacy-stream
index 5f80f13654..b0d81aa92e 100755
--- a/tools/python/scripts/convert-legacy-stream
+++ b/tools/python/scripts/convert-legacy-stream
@@ -5,6 +5,8 @@ Convert a legacy migration stream to a v2 stream.
""" +from __future__ import print_function + import sys import os, os.path import syslog @@ -12,6 +14,7 @@ import traceback from struct import calcsize, unpack, pack +from xen.util import open_file_or_fd as open_file_or_fd from xen.migration import legacy, public, libxc, libxl, xl __version__ = 1 @@ -39,16 +42,16 @@ def info(msg): for line in msg.split("\n"): syslog.syslog(syslog.LOG_INFO, line) else: -print msg +print(msg) def err(msg): """Error message, routed to appropriate destination""" if log_to_syslog: for line in msg.split("\n"): syslog.syslog(syslog.LOG_ERR, line) -print >> sys.stderr, msg +print(msg, file = sys.stderr) -class StreamError(StandardError): +class StreamError(Exception): """Error with the incoming migration stream""" pass @@ -70,7 +73,7 @@ class VM(object): # libxl self.libxl = fmt == "libxl" -self.emu_xenstore = "" # NUL terminated key pairs from "toolstack" records +self.emu_xenstore = b"" # NUL terminated key pairs from "toolstack" records def write_libxc_ihdr(): stream_write(pack(libxc.IHDR_FORMAT, @@ -336,7 +339,7 @@ def read_libxl_toolstack(vm, data): if twidth == 64: name = name[:-4] -if name[-1] != '\x00': +if name[-1] != b'\x00': raise StreamError("physmap name not NUL terminated") root = "physmap/%x" % (phys,) @@ -347,7 +350,7 @@ def read_libxl_toolstack(vm, data): for key, val in zip(kv[0::2], kv[1::2]): info("'%s' = '%s'" % (key, val)) -vm.emu_xenstore += '\x00'.join(kv) + '\x00' +vm.emu_xenstore += b'\x00'.join(kv) + b'\x00' def read_chunks(vm): @@ -534,7 +537,7 @@ def read_qemu(vm): sig, = unpack("21s", rawsig) info("Qemu signature: %s" % (sig, )) -if sig == "DeviceModelRecord0002": +if sig == b"DeviceModelRecord0002": rawsz = rdexact(4) sz, = unpack("I", rawsz) qdata = rdexact(sz) @@ -617,36 +620,6 @@ def read_legacy_stream(vm): return 2 return 0 -def open_file_or_fd(val, mode): -""" -If 'val' looks like a decimal integer, open it as an fd. If not, try to -open it as a regular file. 
-""" - -fd = -1 -try: -# Does it look like an integer? -try: -fd = int(val, 10) -except ValueError: -pass - -# Try to open it... -if fd != -1: -return os.fdopen(fd, mode, 0) -else: -return open(val, mode, 0) - -except StandardError, e: -if fd != -1: -err("Unable to open fd %d: %s: %s" % -(fd, e.__class__.__name__, e)) -else: -err("Unable to open file '%s': %s: %s" % -(val, e.__class__.__name__, e)) - -raise SystemExit(1) - def main(): from optparse import OptionParser @@ -723,7 +696,7 @@ def main(): if __name__ == "__main__": try: sys.exit(main()) -except SystemExit, e: +except SystemExit as e: sys.exit(e.code) except KeyboardInterrupt: sys.exit(1) diff --git a/tools/python/scripts/verify-stream-v2 b/tools/python/scripts/verify-stream-v2 index 3daf25791e..8355c2d206 100755 --- a/tools/python/scripts/verify-stream-v2 +++ b/tools/python/scripts/verify-stream-v2 @@ -3,12 +3,15 @@ """ Verify a v2 format migration stream """ +from __future__ import print_function + import sys import struct import os, os.path import syslog import traceback +from xen.util import open_file_or_fd as open_file_or_fd from xen.migration.verify import StreamError,
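The new tools/python/xen/util.py is not part of this excerpt. As a rough idea of what a helper that runs on both Python 2 and 3 could look like, here is a sketch based on the function removed from convert-legacy-stream above. The `buffering` parameter and the choice to let exceptions propagate to the caller are assumptions, not the contents of the actual patch.

```python
import os


def open_file_or_fd(val, mode, buffering=-1):
    """
    If 'val' looks like a decimal integer, open it as an fd.  If not, try to
    open it as a regular file.
    """
    try:
        # Does it look like an integer?
        fd = int(val, 10)
    except ValueError:
        fd = -1

    if fd != -1:
        return os.fdopen(fd, mode, buffering)
    return open(val, mode, buffering)
```

A default buffering of -1 is used because Python 3 only allows unbuffered I/O (0) in binary mode, whereas the removed helper always passed 0.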
Re: [Xen-devel] [PATCH v2 3/6] x86/viridian: drop private copy of definitions from synic.c
> -Original Message- > From: Wei Liu On Behalf Of Wei Liu > Sent: 18 December 2019 14:43 > To: Xen Development List > Cc: Michael Kelley ; Durrant, Paul > ; Wei Liu ; Paul Durrant > ; Jan Beulich ; Andrew Cooper > ; Wei Liu ; Roger Pau Monné > > Subject: [PATCH v2 3/6] x86/viridian: drop private copy of definitions > from synic.c > > Use hyperv-tlfs.h instead. No functional change intended. > > Signed-off-by: Wei Liu Reviewed-by: Paul Durrant ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/6] x86: import hyperv-tlfs.h from Linux
> -Original Message- > From: Wei Liu On Behalf Of Wei Liu > Sent: 18 December 2019 14:42 > To: Xen Development List > Cc: Michael Kelley ; Durrant, Paul > ; Wei Liu ; Jan Beulich > ; Andrew Cooper ; Wei Liu > ; Roger Pau Monné > Subject: [PATCH v2 1/6] x86: import hyperv-tlfs.h from Linux > > Take a pristine copy from Linux commit > b2d8b167e15bb5ec2691d1119c025630a247f649. > > Do the following to fix it up for Xen: > > 1. include xen/types.h and xen/bitops.h > 2. fix up invocations of BIT macro > > Signed-off-by: Wei Liu > Acked-by: Jan Beulich [snip] > +/* > + * The guest OS needs to register the guest ID with the hypervisor. > + * The guest ID is a 64 bit entity and the structure of this ID is > + * specified in the Hyper-V specification: > + * > + * msdn.microsoft.com/en- > us/library/windows/hardware/ff542653%28v=vs.85%29.aspx > + * > + * While the current guideline does not specify how Linux guest ID(s) > + * need to be generated, our plan is to publish the guidelines for > + * Linux and other guest operating systems that currently are hosted > + * on Hyper-V. The implementation here conforms to this yet > + * unpublished guidelines. > + * > + * > + * Bit(s) > + * 63 - Indicates if the OS is Open Source or not; 1 is Open Source > + * 62:56 - Os Type; Linux is 0x100 > + * 55:48 - Distro specific identification > + * 47:16 - Linux kernel version number > + * 15:0 - Distro specific identification > + * > + * It might be useful to pull the declaration of union viridian_guest_os_id_msr in here since the comment is explaining the format. Paul ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/6] x86/viridian: drop duplicate defines from private.h and viridian.c
> -Original Message- > From: Wei Liu On Behalf Of Wei Liu > Sent: 18 December 2019 14:42 > To: Xen Development List > Cc: Michael Kelley ; Durrant, Paul > ; Wei Liu ; Paul Durrant > ; Jan Beulich ; Andrew Cooper > ; Wei Liu ; Roger Pau Monné > > Subject: [PATCH v2 2/6] x86/viridian: drop duplicate defines from > private.h and viridian.c > > No functional change intended. > > Signed-off-by: Wei Liu [snip] > diff --git a/xen/arch/x86/hvm/viridian/viridian.c > b/xen/arch/x86/hvm/viridian/viridian.c > index 4b06b78a27..76f6b6510b 100644 > --- a/xen/arch/x86/hvm/viridian/viridian.c > +++ b/xen/arch/x86/hvm/viridian/viridian.c > @@ -10,6 +10,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -19,22 +20,10 @@ > > #include "private.h" > > -/* Viridian Hypercall Status Codes. */ > -#define HV_STATUS_SUCCESS 0x > -#define HV_STATUS_INVALID_HYPERCALL_CODE0x0002 > -#define HV_STATUS_INVALID_PARAMETER 0x0005 > - > /* Viridian Hypercall Codes. */ > -#define HvFlushVirtualAddressSpace 0x0002 > -#define HvFlushVirtualAddressList 0x0003 > -#define HvNotifyLongSpinWait 0x0008 > -#define HvSendSyntheticClusterIpi 0x000b > #define HvGetPartitionId 0x0046 > #define HvExtCallQueryCapabilities 0x8001 These ought to be added to hyperv-tlfs.h. After all they are specified in the TLFS. Paul ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] tools/python: Drop test.py
On Wed, Dec 18, 2019 at 01:50:06PM +0000, Andrew Cooper wrote:
> This file hasn't been touched since it was introduced in 2005 (c/s 0c6f36628)
> and has a wildly obsolete shebang for Python 2.3. Most importantly for us is
> that it isn't Python 3 compatible.
>
> Drop the file entirely. Since the 2.3 days, automatic discovery of tests has
> been included in standard functionality. Rewrite the test rule to use
> "$(PYTHON) -m unittest discover" which is equivalent.
>
> Dropping test.py drops the only piece of ZPL-2.0 code in the tree. Drop the
> ancillary files, and adjust COPYING to match.
>
> Signed-off-by: Andrew Cooper

Acked-by: Wei Liu
[Xen-devel] [PATCH v2 3/6] x86/viridian: drop private copy of definitions from synic.c
Use hyperv-tlfs.h instead. No functional change intended. Signed-off-by: Wei Liu --- xen/arch/x86/hvm/viridian/synic.c | 68 --- 1 file changed, 16 insertions(+), 52 deletions(-) diff --git a/xen/arch/x86/hvm/viridian/synic.c b/xen/arch/x86/hvm/viridian/synic.c index 2791021bcc..54c62f843f 100644 --- a/xen/arch/x86/hvm/viridian/synic.c +++ b/xen/arch/x86/hvm/viridian/synic.c @@ -12,58 +12,22 @@ #include #include +#include #include #include #include "private.h" -typedef struct _HV_VIRTUAL_APIC_ASSIST -{ -uint32_t no_eoi:1; -uint32_t reserved_zero:31; -} HV_VIRTUAL_APIC_ASSIST; - -typedef union _HV_VP_ASSIST_PAGE -{ -HV_VIRTUAL_APIC_ASSIST ApicAssist; -uint8_t ReservedZBytePadding[PAGE_SIZE]; -} HV_VP_ASSIST_PAGE; - -typedef enum HV_MESSAGE_TYPE { -HvMessageTypeNone, -HvMessageTimerExpired = 0x8010, -} HV_MESSAGE_TYPE; - -typedef struct HV_MESSAGE_FLAGS { -uint8_t MessagePending:1; -uint8_t Reserved:7; -} HV_MESSAGE_FLAGS; - -typedef struct HV_MESSAGE_HEADER { -HV_MESSAGE_TYPE MessageType; -uint16_t Reserved1; -HV_MESSAGE_FLAGS MessageFlags; -uint8_t PayloadSize; -uint64_t Reserved2; -} HV_MESSAGE_HEADER; - -#define HV_MESSAGE_SIZE 256 -#define HV_MESSAGE_MAX_PAYLOAD_QWORD_COUNT 30 - -typedef struct HV_MESSAGE { -HV_MESSAGE_HEADER Header; -uint64_t Payload[HV_MESSAGE_MAX_PAYLOAD_QWORD_COUNT]; -} HV_MESSAGE; void __init __maybe_unused build_assertions(void) { -BUILD_BUG_ON(sizeof(HV_MESSAGE) != HV_MESSAGE_SIZE); +BUILD_BUG_ON(sizeof(struct hv_message) != HV_MESSAGE_SIZE); } void viridian_apic_assist_set(const struct vcpu *v) { struct viridian_vcpu *vv = v->arch.hvm.viridian; -HV_VP_ASSIST_PAGE *ptr = vv->vp_assist.ptr; +struct hv_vp_assist_page *ptr = vv->vp_assist.ptr; if ( !ptr ) return; @@ -77,18 +41,18 @@ void viridian_apic_assist_set(const struct vcpu *v) domain_crash(v->domain); vv->apic_assist_pending = true; -ptr->ApicAssist.no_eoi = 1; +ptr->apic_assist = 1; } bool viridian_apic_assist_completed(const struct vcpu *v) { struct viridian_vcpu *vv = 
v->arch.hvm.viridian; -HV_VP_ASSIST_PAGE *ptr = vv->vp_assist.ptr; +struct hv_vp_assist_page *ptr = vv->vp_assist.ptr; if ( !ptr ) return false; -if ( vv->apic_assist_pending && !ptr->ApicAssist.no_eoi ) +if ( vv->apic_assist_pending && !ptr->apic_assist ) { /* An EOI has been avoided */ vv->apic_assist_pending = false; @@ -101,12 +65,12 @@ bool viridian_apic_assist_completed(const struct vcpu *v) void viridian_apic_assist_clear(const struct vcpu *v) { struct viridian_vcpu *vv = v->arch.hvm.viridian; -HV_VP_ASSIST_PAGE *ptr = vv->vp_assist.ptr; +struct hv_vp_assist_page *ptr = vv->vp_assist.ptr; if ( !ptr ) return; -ptr->ApicAssist.no_eoi = 0; +ptr->apic_assist = 0; vv->apic_assist_pending = false; } @@ -358,7 +322,7 @@ bool viridian_synic_deliver_timer_msg(struct vcpu *v, unsigned int sintx, { struct viridian_vcpu *vv = v->arch.hvm.viridian; const union viridian_sint_msr *vs = >sint[sintx]; -HV_MESSAGE *msg = vv->simp.ptr; +struct hv_message *msg = vv->simp.ptr; struct { uint32_t TimerIndex; uint32_t Reserved; @@ -382,19 +346,19 @@ bool viridian_synic_deliver_timer_msg(struct vcpu *v, unsigned int sintx, msg += sintx; -if ( msg->Header.MessageType != HvMessageTypeNone ) +if ( msg->header.message_type != HVMSG_NONE ) { -msg->Header.MessageFlags.MessagePending = 1; +msg->header.message_flags.msg_pending = 1; __set_bit(sintx, >msg_pending); return false; } -msg->Header.MessageType = HvMessageTimerExpired; -msg->Header.MessageFlags.MessagePending = 0; -msg->Header.PayloadSize = sizeof(payload); +msg->header.message_type = HVMSG_TIMER_EXPIRED; +msg->header.message_flags.msg_pending = 0; +msg->header.payload_size = sizeof(payload); -BUILD_BUG_ON(sizeof(payload) > sizeof(msg->Payload)); -memcpy(msg->Payload, , sizeof(payload)); +BUILD_BUG_ON(sizeof(payload) > sizeof(msg->u.payload)); +memcpy(msg->u.payload, , sizeof(payload)); if ( !vs->mask ) vlapic_set_irq(vcpu_vlapic(v), vs->vector, 0); -- 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org 
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 6/6] x86: implement Hyper-V clock source
Implement a clock source using Hyper-V's reference TSC page.

Signed-off-by: Wei Liu
---
v2:
1. Address Jan's comments.

Relevant spec:

https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Section 12.6.
---
 xen/arch/x86/time.c | 101
 1 file changed, 101 insertions(+)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 216169a025..8b96b2e9a5 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -644,6 +645,103 @@ static struct platform_timesource __initdata plt_xen_timer =
 };
 #endif
 
+#ifdef CONFIG_HYPERV_GUEST
+/************************************************************
+ * HYPER-V REFERENCE TSC
+ */
+
+static struct ms_hyperv_tsc_page *hyperv_tsc;
+static struct page_info *hyperv_tsc_page;
+
+static int64_t __init init_hyperv_timer(struct platform_timesource *pts)
+{
+    paddr_t maddr;
+    uint64_t tsc_msr, freq;
+
+    if ( !(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE) )
+        return 0;
+
+    hyperv_tsc_page = alloc_domheap_page(NULL, 0);
+    if ( !hyperv_tsc_page )
+        return 0;
+
+    hyperv_tsc = __map_domain_page_global(hyperv_tsc_page);
+    if ( !hyperv_tsc )
+    {
+        free_domheap_page(hyperv_tsc_page);
+        hyperv_tsc_page = NULL;
+        return 0;
+    }
+
+    maddr = page_to_maddr(hyperv_tsc_page);
+
+    /*
+     * Per Hyper-V TLFS:
+     *   1. Read existing MSR value
+     *   2. Preserve bits [11:1]
+     *   3. Set bits [63:12] to be guest physical address of tsc page
+     *   4. Set enabled bit (0)
+     *   5. Write back new MSR value
+     */
+    rdmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr);
+    tsc_msr &= 0xffeULL;
+    tsc_msr |= maddr | 1 /* enabled */;
+    wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr);
+
+    /* Get TSC frequency from Hyper-V */
+    rdmsrl(HV_X64_MSR_TSC_FREQUENCY, freq);
+    pts->frequency = freq;
+
+    return freq;
+}
+
+static inline uint64_t read_hyperv_timer(void)
+{
+    uint64_t scale, offset, ret, tsc;
+    uint32_t seq;
+    const struct ms_hyperv_tsc_page *tsc_page = hyperv_tsc;
+
+    do {
+        seq = tsc_page->tsc_sequence;
+
+        /* Seq 0 is special. It means the TSC enlightenment is not
+         * available at the moment. The reference time can only be
+         * obtained from the Reference Counter MSR.
+         */
+        if ( seq == 0 )
+        {
+            rdmsrl(HV_X64_MSR_TIME_REF_COUNT, ret);
+            return ret;
+        }
+
+        /* rdtsc_ordered already contains a load fence */
+        tsc = rdtsc_ordered();
+        scale = tsc_page->tsc_scale;
+        offset = tsc_page->tsc_offset;
+
+        smp_rmb();
+
+    } while (tsc_page->tsc_sequence != seq);
+
+    /* ret = ((tsc * scale) >> 64) + offset; */
+    asm ( "mul %[scale]; add %[offset], %[ret]"
+          : "+a" (tsc), [ret] "=d" (ret)
+          : [scale] "rm" (scale), [offset] "rm" (offset) );
+
+    return ret;
+}
+
+static struct platform_timesource __initdata plt_hyperv_timer =
+{
+    .id = "hyperv",
+    .name = "HYPER-V REFERENCE TSC",
+    .read_counter = read_hyperv_timer,
+    .init = init_hyperv_timer,
+    /* See TSC time source for why counter_bits is set to 63 */
+    .counter_bits = 63,
+};
+#endif
+
 /************************************************************
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
@@ -793,6 +891,9 @@ static u64 __init init_platform_timer(void)
 static struct platform_timesource * __initdata plt_timers[] = {
 #ifdef CONFIG_XEN_GUEST
     &plt_xen_timer,
+#endif
+#ifdef CONFIG_HYPERV_GUEST
+    &plt_hyperv_timer,
 #endif
     &plt_hpet, &plt_pmtimer, &plt_pit
 };
-- 
2.20.1
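For readers who want to check the arithmetic in read_hyperv_timer() without inline assembly, the following is a minimal portable sketch of the same read protocol. It is not the patch's code: the demo_* names are hypothetical, the TSC value is passed in as a parameter instead of being read with rdtsc_ordered(), the barrier is only marked by a comment, and `unsigned __int128` is a GCC/Clang extension assumed to be available.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the three fields the reader consumes. */
struct demo_tsc_page {
    volatile uint32_t tsc_sequence;
    volatile uint64_t tsc_scale;
    volatile int64_t  tsc_offset;
};

/* ((tsc * scale) >> 64) + offset, using a 128-bit intermediate product. */
static uint64_t demo_scale_tsc(uint64_t tsc, uint64_t scale, int64_t offset)
{
    unsigned __int128 product = (unsigned __int128)tsc * scale;

    return (uint64_t)(product >> 64) + (uint64_t)offset;
}

/*
 * Returns false when the sequence is 0, i.e. the page cannot be used and
 * the caller has to fall back to another source (the patch reads the
 * reference counter MSR in that case).
 */
static bool demo_read_ref_time(const struct demo_tsc_page *page,
                               uint64_t tsc, uint64_t *ref_time)
{
    uint32_t seq;
    uint64_t scale;
    int64_t offset;

    do {
        seq = page->tsc_sequence;
        if ( seq == 0 )
            return false;

        scale = page->tsc_scale;
        offset = page->tsc_offset;
        /* A real reader needs a read barrier here (smp_rmb() in the patch). */
    } while ( page->tsc_sequence != seq );

    *ref_time = demo_scale_tsc(tsc, scale, offset);
    return true;
}
```

With a scale of 2^63 the multiplication halves the TSC value, so a TSC of 1000 and an offset of 5 give a reference time of 505.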
[Xen-devel] [PATCH v2 4/6] x86/viridian: drop private copy of HV_REFERENCE_TSC_PAGE in time.c
Use the one defined in hyperv-tlfs.h instead. No functional change
intended.

Signed-off-by: Wei Liu
---
 xen/arch/x86/hvm/viridian/time.c | 30 +++---
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/viridian/time.c b/xen/arch/x86/hvm/viridian/time.c
index 6ddca29b29..33c15782e4 100644
--- a/xen/arch/x86/hvm/viridian/time.c
+++ b/xen/arch/x86/hvm/viridian/time.c
@@ -13,19 +13,11 @@
 #include
 #include
 
+#include
 #include
 
 #include "private.h"
 
-typedef struct _HV_REFERENCE_TSC_PAGE
-{
-    uint32_t TscSequence;
-    uint32_t Reserved1;
-    uint64_t TscScale;
-    int64_t  TscOffset;
-    uint64_t Reserved2[509];
-} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
-
 static void update_reference_tsc(const struct domain *d, bool initialize)
 {
     struct viridian_domain *vd = d->arch.hvm.viridian;
@@ -41,18 +33,18 @@ static void update_reference_tsc(const struct domain *d, bool initialize)
      * This enlightenment must be disabled is the host TSC is not invariant.
      * However it is also disabled if vtsc is true (which means rdtsc is
      * being emulated). This generally happens when guest TSC freq and host
-     * TSC freq don't match. The TscScale value could be adjusted to cope
+     * TSC freq don't match. The tsc_scale value could be adjusted to cope
      * with this, allowing vtsc to be turned off, but support for this is
      * not yet present in the hypervisor. Thus is it is possible that
      * migrating a Windows VM between hosts of differing TSC frequencies
      * may result in large differences in guest performance. Any jump in
      * TSC due to migration down-time can, however, be compensated for by
-     * setting the TscOffset value (see below).
+     * setting the tsc_offset value (see below).
      */
     if ( !host_tsc_is_safe() || d->arch.vtsc )
     {
         /*
-         * The specification states that valid values of TscSequence range
+         * The specification states that valid values of tsc_sequence range
          * from 0 to 0xFFFFFFFE. The value 0xFFFFFFFF is used to indicate
          * this mechanism is no longer a reliable source of time and that
          * the VM should fall back to a different source.
@@ -61,7 +53,7 @@ static void update_reference_tsc(const struct domain *d, bool initialize)
          * violate the spec. and rely on a value of 0 to indicate that this
          * enlightenment should no longer be used.
          */
-        p->TscSequence = 0;
+        p->tsc_sequence = 0;
 
         printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: invalidated\n",
                d->domain_id);
@@ -72,29 +64,29 @@ static void update_reference_tsc(const struct domain *d, bool initialize)
      * The guest will calculate reference time according to the following
      * formula:
      *
-     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
+     * ReferenceTime = ((RDTSC() * tsc_scale) >> 64) + tsc_offset
      *
      * Windows uses a 100ns tick, so we need a scale which is cpu
      * ticks per 100ns shifted left by 64.
      * The offset value is calculated on restore after migration and
      * ensures that Windows will not see a large jump in ReferenceTime.
      */
-    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
-    p->TscOffset = trc->off;
+    p->tsc_scale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
+    p->tsc_offset = trc->off;
     smp_wmb();
 
-    seq = p->TscSequence + 1;
+    seq = p->tsc_sequence + 1;
     if ( seq == 0xFFFFFFFF || seq == 0 ) /* Avoid both 'invalid' values */
         seq = 1;
 
-    p->TscSequence = seq;
+    p->tsc_sequence = seq;
 }
 
 /*
  * The specification says: "The partition reference time is computed
  * by the following formula:
  *
- * ReferenceTime = ((VirtualTsc * tsc_scale) >> 64) + TscOffset
+ * ReferenceTime = ((VirtualTsc * tsc_scale) >> 64) + tsc_offset
  *
  * The multiplication is a 64 bit multiplication, which results in a
  * 128 bit number which is then shifted 64 times to the right to obtain
-- 
2.20.1
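The writer side of the reference TSC page can be sketched in the same spirit. This is an illustration under stated assumptions, not the hypervisor's code: the demo_* names are hypothetical, the scale is derived directly from the 100ns tick mentioned in the comment (one TSC tick at tsc_khz kHz is 10000 / tsc_khz reference units), a 128-bit intermediate (GCC/Clang `unsigned __int128`) is used instead of the two-step shift, and the "invalid" sequence values are assumed to be 0 and the all-ones 32-bit value.

```c
#include <stdint.h>

/* Assumed 'invalid' all-ones sequence value; 0 is treated as invalid too. */
#define DEMO_SEQ_INVALID 0xFFFFFFFFu

/*
 * Scale such that ((tsc * scale) >> 64) converts TSC ticks to 100ns units:
 * scale = (10000 << 64) / tsc_khz.  Returns 0 when the result would not
 * fit in 64 bits, i.e. for frequencies at or below 10 MHz.
 */
static uint64_t demo_tsc_scale(uint64_t tsc_khz)
{
    if ( tsc_khz <= 10000 )
        return 0;

    return (uint64_t)((((unsigned __int128)10000) << 64) / tsc_khz);
}

/* Advance the sequence number, skipping both values treated as invalid. */
static uint32_t demo_next_sequence(uint32_t seq)
{
    seq++;
    if ( seq == DEMO_SEQ_INVALID || seq == 0 )
        seq = 1;

    return seq;
}
```

A 20 MHz counter, for example, yields a scale of 2^63, so every two TSC ticks advance the reference time by one 100ns unit.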
[Xen-devel] [PATCH v2 0/6] Implement Hyper-V reference TSC based clock source
Hi all

This series adds a clock source based on Hyper-V's reference TSC. The meat
is in the last patch.

I also put in some clean up patches to Xen's viridian code per Paul's
request.

With this series, Xen on Hyper-V no longer runs on emulated PIT.

(XEN) Platform timer is 2294.686MHz HYPER-V REFERENCE TSC

Wei.

Cc: Jan Beulich
Cc: Andrew Cooper
Cc: Wei Liu
Cc: Roger Pau Monné
Cc: Paul Durrant

Wei Liu (6):
  x86: import hyperv-tlfs.h from Linux
  x86/viridian: drop duplicate defines from private.h and viridian.c
  x86/viridian: drop private copy of definitions from synic.c
  x86/viridian: drop private copy of HV_REFERENCE_TSC_PAGE in time.c
  x86/hyperv: extract more information from Hyper-V
  x86: implement Hyper-V clock source

 xen/arch/x86/guest/hyperv/hyperv.c      |  17 +
 xen/arch/x86/hvm/viridian/private.h     |  66 --
 xen/arch/x86/hvm/viridian/synic.c       |  68 +-
 xen/arch/x86/hvm/viridian/time.c        |  30 +-
 xen/arch/x86/hvm/viridian/viridian.c    |  23 +-
 xen/arch/x86/time.c                     | 101 +++
 xen/include/asm-x86/guest/hyperv-tlfs.h | 907
 xen/include/asm-x86/guest/hyperv.h      |  12 +
 8 files changed, 1070 insertions(+), 154 deletions(-)
 create mode 100644 xen/include/asm-x86/guest/hyperv-tlfs.h

-- 
2.20.1
[Xen-devel] [PATCH v2 1/6] x86: import hyperv-tlfs.h from Linux
Take a pristine copy from Linux commit b2d8b167e15bb5ec2691d1119c025630a247f649. Do the following to fix it up for Xen: 1. include xen/types.h and xen/bitops.h 2. fix up invocations of BIT macro Signed-off-by: Wei Liu Acked-by: Jan Beulich --- xen/include/asm-x86/guest/hyperv-tlfs.h | 907 1 file changed, 907 insertions(+) create mode 100644 xen/include/asm-x86/guest/hyperv-tlfs.h diff --git a/xen/include/asm-x86/guest/hyperv-tlfs.h b/xen/include/asm-x86/guest/hyperv-tlfs.h new file mode 100644 index 00..ccd9850b27 --- /dev/null +++ b/xen/include/asm-x86/guest/hyperv-tlfs.h @@ -0,0 +1,907 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +/* + * This file contains definitions from Hyper-V Hypervisor Top-Level Functional + * Specification (TLFS): + * https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs + */ + +#ifndef _ASM_X86_HYPERV_TLFS_H +#define _ASM_X86_HYPERV_TLFS_H + +#include +#include +#include + +/* + * While not explicitly listed in the TLFS, Hyper-V always runs with a page size + * of 4096. These definitions are used when communicating with Hyper-V using + * guest physical pages and guest physical page addresses, since the guest page + * size may not be 4096 on all architectures. + */ +#define HV_HYP_PAGE_SHIFT 12 +#define HV_HYP_PAGE_SIZE BIT(HV_HYP_PAGE_SHIFT, UL) +#define HV_HYP_PAGE_MASK (~(HV_HYP_PAGE_SIZE - 1)) + +/* + * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent + * is set by CPUID(HvCpuIdFunctionVersionAndFeatures). + */ +#define HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS 0x4000 +#define HYPERV_CPUID_INTERFACE 0x4001 +#define HYPERV_CPUID_VERSION 0x4002 +#define HYPERV_CPUID_FEATURES 0x4003 +#define HYPERV_CPUID_ENLIGHTMENT_INFO 0x4004 +#define HYPERV_CPUID_IMPLEMENT_LIMITS 0x4005 +#define HYPERV_CPUID_NESTED_FEATURES 0x400A + +#define HYPERV_HYPERVISOR_PRESENT_BIT 0x8000 +#define HYPERV_CPUID_MIN 0x4005 +#define HYPERV_CPUID_MAX 0x4000 + +/* + * Feature identification. 
EAX indicates which features are available + * to the partition based upon the current partition privileges. + * These are HYPERV_CPUID_FEATURES.EAX bits. + */ + +/* VP Runtime (HV_X64_MSR_VP_RUNTIME) available */ +#define HV_X64_MSR_VP_RUNTIME_AVAILABLEBIT(0, UL) +/* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/ +#define HV_MSR_TIME_REF_COUNT_AVAILABLEBIT(1, UL) +/* + * Basic SynIC MSRs (HV_X64_MSR_SCONTROL through HV_X64_MSR_EOM + * and HV_X64_MSR_SINT0 through HV_X64_MSR_SINT15) available + */ +#define HV_X64_MSR_SYNIC_AVAILABLE BIT(2, UL) +/* + * Synthetic Timer MSRs (HV_X64_MSR_STIMER0_CONFIG through + * HV_X64_MSR_STIMER3_COUNT) available + */ +#define HV_MSR_SYNTIMER_AVAILABLE BIT(3, UL) +/* + * APIC access MSRs (HV_X64_MSR_EOI, HV_X64_MSR_ICR and HV_X64_MSR_TPR) + * are available + */ +#define HV_X64_MSR_APIC_ACCESS_AVAILABLE BIT(4, UL) +/* Hypercall MSRs (HV_X64_MSR_GUEST_OS_ID and HV_X64_MSR_HYPERCALL) available*/ +#define HV_X64_MSR_HYPERCALL_AVAILABLE BIT(5, UL) +/* Access virtual processor index MSR (HV_X64_MSR_VP_INDEX) available*/ +#define HV_X64_MSR_VP_INDEX_AVAILABLE BIT(6, UL) +/* Virtual system reset MSR (HV_X64_MSR_RESET) is available*/ +#define HV_X64_MSR_RESET_AVAILABLE BIT(7, UL) +/* + * Access statistics pages MSRs (HV_X64_MSR_STATS_PARTITION_RETAIL_PAGE, + * HV_X64_MSR_STATS_PARTITION_INTERNAL_PAGE, HV_X64_MSR_STATS_VP_RETAIL_PAGE, + * HV_X64_MSR_STATS_VP_INTERNAL_PAGE) available + */ +#define HV_X64_MSR_STAT_PAGES_AVAILABLEBIT(8, UL) +/* Partition reference TSC MSR is available */ +#define HV_MSR_REFERENCE_TSC_AVAILABLE BIT(9, UL) +/* Partition Guest IDLE MSR is available */ +#define HV_X64_MSR_GUEST_IDLE_AVAILABLEBIT(10, UL) +/* + * There is a single feature flag that signifies if the partition has access + * to MSRs with local APIC and TSC frequencies. 
+ */ +#define HV_X64_ACCESS_FREQUENCY_MSRS BIT(11, UL) +/* AccessReenlightenmentControls privilege */ +#define HV_X64_ACCESS_REENLIGHTENMENT BIT(13, UL) + +/* + * Feature identification: indicates which flags were specified at partition + * creation. The format is the same as the partition creation flag structure + * defined in section Partition Creation Flags. + * These are HYPERV_CPUID_FEATURES.EBX bits. + */ +#define HV_X64_CREATE_PARTITIONS BIT(0, UL) +#define HV_X64_ACCESS_PARTITION_ID BIT(1, UL) +#define HV_X64_ACCESS_MEMORY_POOL BIT(2, UL) +#define HV_X64_ADJUST_MESSAGE_BUFFERS BIT(3, UL) +#define HV_X64_POST_MESSAGES BIT(4, UL) +#define HV_X64_SIGNAL_EVENTS BIT(5, UL) +#define HV_X64_CREATE_PORT
[Xen-devel] [PATCH v2 2/6] x86/viridian: drop duplicate defines from private.h and viridian.c
No functional change intended. Signed-off-by: Wei Liu --- xen/arch/x86/hvm/viridian/private.h | 66 xen/arch/x86/hvm/viridian/viridian.c | 23 +++--- 2 files changed, 6 insertions(+), 83 deletions(-) diff --git a/xen/arch/x86/hvm/viridian/private.h b/xen/arch/x86/hvm/viridian/private.h index c272c34cda..958a2814c2 100644 --- a/xen/arch/x86/hvm/viridian/private.h +++ b/xen/arch/x86/hvm/viridian/private.h @@ -5,72 +5,6 @@ #include -/* Viridian MSR numbers. */ -#define HV_X64_MSR_GUEST_OS_ID 0x4000 -#define HV_X64_MSR_HYPERCALL 0x4001 -#define HV_X64_MSR_VP_INDEX 0x4002 -#define HV_X64_MSR_RESET 0x4003 -#define HV_X64_MSR_VP_RUNTIME0x4010 -#define HV_X64_MSR_TIME_REF_COUNT0x4020 -#define HV_X64_MSR_REFERENCE_TSC 0x4021 -#define HV_X64_MSR_TSC_FREQUENCY 0x4022 -#define HV_X64_MSR_APIC_FREQUENCY0x4023 -#define HV_X64_MSR_EOI 0x4070 -#define HV_X64_MSR_ICR 0x4071 -#define HV_X64_MSR_TPR 0x4072 -#define HV_X64_MSR_VP_ASSIST_PAGE0x4073 -#define HV_X64_MSR_SCONTROL 0x4080 -#define HV_X64_MSR_SVERSION 0x4081 -#define HV_X64_MSR_SIEFP 0x4082 -#define HV_X64_MSR_SIMP 0x4083 -#define HV_X64_MSR_EOM 0x4084 -#define HV_X64_MSR_SINT0 0x4090 -#define HV_X64_MSR_SINT1 0x4091 -#define HV_X64_MSR_SINT2 0x4092 -#define HV_X64_MSR_SINT3 0x4093 -#define HV_X64_MSR_SINT4 0x4094 -#define HV_X64_MSR_SINT5 0x4095 -#define HV_X64_MSR_SINT6 0x4096 -#define HV_X64_MSR_SINT7 0x4097 -#define HV_X64_MSR_SINT8 0x4098 -#define HV_X64_MSR_SINT9 0x4099 -#define HV_X64_MSR_SINT100x409A -#define HV_X64_MSR_SINT110x409B -#define HV_X64_MSR_SINT120x409C -#define HV_X64_MSR_SINT130x409D -#define HV_X64_MSR_SINT140x409E -#define HV_X64_MSR_SINT150x409F -#define HV_X64_MSR_STIMER0_CONFIG0x40B0 -#define HV_X64_MSR_STIMER0_COUNT 0x40B1 -#define HV_X64_MSR_STIMER1_CONFIG0x40B2 -#define HV_X64_MSR_STIMER1_COUNT 0x40B3 -#define HV_X64_MSR_STIMER2_CONFIG0x40B4 -#define HV_X64_MSR_STIMER2_COUNT 0x40B5 -#define HV_X64_MSR_STIMER3_CONFIG0x40B6 -#define HV_X64_MSR_STIMER3_COUNT 0x40B7 -#define 
HV_X64_MSR_POWER_STATE_TRIGGER_C10x40C1 -#define HV_X64_MSR_POWER_STATE_TRIGGER_C20x40C2 -#define HV_X64_MSR_POWER_STATE_TRIGGER_C30x40C3 -#define HV_X64_MSR_POWER_STATE_CONFIG_C1 0x40D1 -#define HV_X64_MSR_POWER_STATE_CONFIG_C2 0x40D2 -#define HV_X64_MSR_POWER_STATE_CONFIG_C3 0x40D3 -#define HV_X64_MSR_STATS_PARTITION_RETAIL_PAGE 0x40E0 -#define HV_X64_MSR_STATS_PARTITION_INTERNAL_PAGE 0x40E1 -#define HV_X64_MSR_STATS_VP_RETAIL_PAGE 0x40E2 -#define HV_X64_MSR_STATS_VP_INTERNAL_PAGE0x40E3 -#define HV_X64_MSR_GUEST_IDLE0x40F0 -#define HV_X64_MSR_SYNTH_DEBUG_CONTROL 0x40F1 -#define HV_X64_MSR_SYNTH_DEBUG_STATUS0x40F2 -#define HV_X64_MSR_SYNTH_DEBUG_SEND_BUFFER 0x40F3 -#define HV_X64_MSR_SYNTH_DEBUG_RECEIVE_BUFFER0x40F4 -#define HV_X64_MSR_SYNTH_DEBUG_PENDING_BUFFER0x40F5 -#define HV_X64_MSR_CRASH_P0 0x4100 -#define HV_X64_MSR_CRASH_P1 0x4101 -#define HV_X64_MSR_CRASH_P2 0x4102 -#define HV_X64_MSR_CRASH_P3 0x4103 -#define HV_X64_MSR_CRASH_P4 0x4104 -#define HV_X64_MSR_CRASH_CTL 0x4105 - int viridian_synic_wrmsr(struct vcpu *v, uint32_t idx, uint64_t val); int viridian_synic_rdmsr(const struct vcpu *v, uint32_t idx, uint64_t *val); diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c index 4b06b78a27..76f6b6510b 100644 --- a/xen/arch/x86/hvm/viridian/viridian.c +++ b/xen/arch/x86/hvm/viridian/viridian.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -19,22 +20,10 @@ #include "private.h" -/* Viridian Hypercall Status Codes. */ -#define HV_STATUS_SUCCESS
[Xen-devel] [PATCH v2 5/6] x86/hyperv: extract more information from Hyper-V
Provide a structure to store that information. The structure will be
accessed from other places later so make it public.

Signed-off-by: Wei Liu
Acked-by: Jan Beulich
---
 xen/arch/x86/guest/hyperv/hyperv.c | 17 +
 xen/include/asm-x86/guest/hyperv.h | 12
 2 files changed, 29 insertions(+)

diff --git a/xen/arch/x86/guest/hyperv/hyperv.c b/xen/arch/x86/guest/hyperv/hyperv.c
index b82ae3833f..2e70b4aa82 100644
--- a/xen/arch/x86/guest/hyperv/hyperv.c
+++ b/xen/arch/x86/guest/hyperv/hyperv.c
@@ -21,6 +21,9 @@
 #include
 
 #include
+#include
+
+struct ms_hyperv_info __read_mostly ms_hyperv;
 
 static const struct hypervisor_ops ops = {
     .name = "Hyper-V",
@@ -40,6 +43,20 @@ const struct hypervisor_ops *__init hyperv_probe(void)
     if ( eax != 0x31237648 )    /* Hv#1 */
         return NULL;
 
+    /* Extract more information from Hyper-V */
+    cpuid(HYPERV_CPUID_FEATURES, &eax, &ebx, &ecx, &edx);
+    ms_hyperv.features = eax;
+    ms_hyperv.misc_features = edx;
+
+    ms_hyperv.hints = cpuid_eax(HYPERV_CPUID_ENLIGHTMENT_INFO);
+
+    if ( ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED )
+        ms_hyperv.nested_features = cpuid_eax(HYPERV_CPUID_NESTED_FEATURES);
+
+    cpuid(HYPERV_CPUID_IMPLEMENT_LIMITS, &eax, &ebx, &ecx, &edx);
+    ms_hyperv.max_vp_index = eax;
+    ms_hyperv.max_lp_index = ebx;
+
     return &ops;
 }
 
diff --git a/xen/include/asm-x86/guest/hyperv.h b/xen/include/asm-x86/guest/hyperv.h
index 3f88b94c77..cc21b9abfc 100644
--- a/xen/include/asm-x86/guest/hyperv.h
+++ b/xen/include/asm-x86/guest/hyperv.h
@@ -21,8 +21,20 @@
 
 #ifdef CONFIG_HYPERV_GUEST
 
+#include
+
 #include
 
+struct ms_hyperv_info {
+    uint32_t features;
+    uint32_t misc_features;
+    uint32_t hints;
+    uint32_t nested_features;
+    uint32_t max_vp_index;
+    uint32_t max_lp_index;
+};
+extern struct ms_hyperv_info ms_hyperv;
+
 const struct hypervisor_ops *hyperv_probe(void);
 
 #else
-- 
2.20.1
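As an illustration of how the stored information might be consumed later in the series, here is a small sketch of a feature test against the saved CPUID bits. The demo_* names are hypothetical; the bit position mirrors HV_MSR_REFERENCE_TSC_AVAILABLE, which the imported header defines as bit 9 of the HYPERV_CPUID_FEATURES EAX value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors HV_MSR_REFERENCE_TSC_AVAILABLE (bit 9 of the features EAX value). */
#define DEMO_REFERENCE_TSC_AVAILABLE (UINT32_C(1) << 9)

/* Hypothetical cut-down copy of struct ms_hyperv_info. */
struct demo_hyperv_info {
    uint32_t features;       /* HYPERV_CPUID_FEATURES: EAX */
    uint32_t misc_features;  /* HYPERV_CPUID_FEATURES: EDX */
};

/* True when the partition may use the reference TSC page. */
static bool demo_has_reference_tsc(const struct demo_hyperv_info *info)
{
    return (info->features & DEMO_REFERENCE_TSC_AVAILABLE) != 0;
}
```

Keeping the raw register values in one structure means later code only needs a mask test like this one instead of repeating the CPUID call.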
[Xen-devel] [PATCH] x86/hvm/rtc: preserve guest RTC offset during suspend/resume/migrate
The emulated RTC is synchronized with the PV wallclock; any write to the RTC will update struct domain's 'time_offset_seconds' field and call update_domain_wallclock(). However, the value of 'time_offset_seconds' is not preserved in any save record and indeed, when the RTC save record is loaded, the CMOS values will be updated based on an offset value which may or may not have been set by the toolstack [1]. This may result in making bogus values available to the guest and messing up any calculations done in the call to alarm_timer_update() at the end of rtc_load().

This patch extends the RTC save record to contain an offset value, which will be zero-filled on load of an older record. The 'time_offset_seconds' field in struct domain is also modified into a 'time_offset' struct, containing a 'seconds' field and a boolean 'set' field. The code in rtc_load() then uses the new value in the save record to update the value of struct domain's 'time_offset.seconds' unless 'time_offset.set' is true, which will only be the case if the toolstack has already performed a XEN_DOMCTL_settimeoffset.

[1] There is currently no way for a toolstack to read the value of 'time_offset_seconds' from struct domain. In the past, any hope of preservation of the value across a guest life-cycle operation was based on relying on qemu-dm to write a value into xenstore whenever the RTC was updated, in response to an IOREQ with type IOREQ_TYPE_TIMEOFFSET being sent by Xen; see: https://xenbits.xen.org/gitweb/?p=qemu-xen-traditional.git;a=blob;f=i386-dm/helper2.c#l457 but this behaviour was never forward-ported into upstream QEMU, which completely ignores that IOREQ type.
In either case, nothing in xl or libxl ever samples the value of RTC offset from xenstore so any offset adjustment to a non-zero value performed by the guest (which in the case of Windows is highly likely as it normally writes RTC in local time, whereas Xen maintains time in UTC) is completely lost with the de-facto toolstack, and always has been. Instead, PV drivers are relied upon to paper over this gaping hole. Signed-off-by: Paul Durrant --- Cc: Stefano Stabellini Cc: Julien Grall Cc: Volodymyr Babchuk Cc: Andrew Cooper Cc: George Dunlap Cc: Ian Jackson Cc: Jan Beulich Cc: Konrad Rzeszutek Wilk Cc: Wei Liu Cc: "Roger Pau Monné" --- xen/arch/arm/platform_hypercall.c | 2 +- xen/arch/arm/time.c| 3 ++- xen/arch/arm/vtimer.c | 4 ++-- xen/arch/x86/hvm/rtc.c | 12 ++-- xen/arch/x86/time.c| 3 ++- xen/common/time.c | 6 +++--- xen/include/public/arch-x86/hvm/save.h | 2 ++ xen/include/xen/sched.h| 5 - 8 files changed, 26 insertions(+), 11 deletions(-) diff --git a/xen/arch/arm/platform_hypercall.c b/xen/arch/arm/platform_hypercall.c index 5aab856ce7..8efac7ee60 100644 --- a/xen/arch/arm/platform_hypercall.c +++ b/xen/arch/arm/platform_hypercall.c @@ -53,7 +53,7 @@ long do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op) if ( likely(!op->u.settime64.mbz) ) do_settime(op->u.settime64.secs, op->u.settime64.nsecs, - op->u.settime64.system_time + SECONDS(d->time_offset_seconds)); + op->u.settime64.system_time + SECONDS(d->time_offset.seconds)); else ret = -EINVAL; break; diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c index 739bcf186c..b0021c2c69 100644 --- a/xen/arch/arm/time.c +++ b/xen/arch/arm/time.c @@ -353,7 +353,8 @@ void update_vcpu_system_time(struct vcpu *v) void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds) { -d->time_offset_seconds = time_offset_seconds; +d->time_offset.seconds = time_offset_seconds; +d->time_offset.set = true; /* XXX update guest visible wallclock time */ } diff --git a/xen/arch/arm/vtimer.c 
b/xen/arch/arm/vtimer.c index e6aebdac9e..240a850b6e 100644 --- a/xen/arch/arm/vtimer.c +++ b/xen/arch/arm/vtimer.c @@ -64,8 +64,8 @@ int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config) { d->arch.phys_timer_base.offset = NOW(); d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0); -d->time_offset_seconds = ticks_to_ns(d->arch.virt_timer_base.offset - boot_count); -do_div(d->time_offset_seconds, 10); +d->time_offset.seconds = ticks_to_ns(d->arch.virt_timer_base.offset - boot_count); +do_div(d->time_offset.seconds, 10); config->clock_frequency = timer_dt_clock_frequency; diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c index 42339682e8..bb41efe84a 100644 --- a/xen/arch/x86/hvm/rtc.c +++ b/xen/arch/x86/hvm/rtc.c @@ -594,7 +594,7 @@ static void rtc_set_time(RTCState *s) /* We use the guest's setting of the RTC to define the local-time * offset for
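The load-time rule described in the commit message (take the offset from the save record unless the toolstack has already set one) can be sketched as follows. This is only an illustration of the described behaviour: the demo_* names are hypothetical and the real rtc_load() changes are not fully visible in this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the new 'time_offset' member of struct domain. */
struct demo_time_offset {
    int64_t seconds;
    bool set;   /* true once the toolstack has set the offset explicitly */
};

/*
 * Apply the offset carried in the RTC save record.  An explicit toolstack
 * setting wins; otherwise the saved value (zero for older records) is used.
 */
static void demo_apply_saved_offset(struct demo_time_offset *cur,
                                    int64_t saved_seconds)
{
    if ( !cur->set )
        cur->seconds = saved_seconds;
}
```

An older save record carries no offset, so the zero-filled field leaves a fresh domain at offset 0, which matches the behaviour before the patch.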
[Xen-devel] [ovmf test] 144927: all pass - PUSHED
flight 144927 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144927/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf                 01b6090b75922bc72604c334bd3dc331490af3bb
baseline version:
 ovmf                 c5d6a57da02774019127e5ac271de274aee0d9e2

Last test of basis   144923  2019-12-18 06:39:22 Z    0 days
Testing same since   144927  2019-12-18 09:10:04 Z    0 days    1 attempts

People who touched revisions under test:
  Bob Feng

jobs:
 build-amd64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   c5d6a57da0..01b6090b75  01b6090b75922bc72604c334bd3dc331490af3bb -> xen-tested-master
Re: [Xen-devel] [PATCH v12 2/5] xenbus/backend: Protect xenbus callback with lock
On Wed, 18 Dec 2019 14:30:44 +0100 "Jürgen Groß" wrote:

> On 18.12.19 13:42, SeongJae Park wrote:
> > On Wed, 18 Dec 2019 13:27:37 +0100 "Jürgen Groß" wrote:
> >
> >> On 18.12.19 11:42, SeongJae Park wrote:
> >>> From: SeongJae Park
> >>>
> >>> 'reclaim_memory' callback can race with a driver code as this callback
> >>> will be called from any memory pressure detected context. To deal with
> >>> the case, this commit adds a spinlock in the 'xenbus_device'. Whenever
> >>> 'reclaim_memory' callback is called, the lock of the device which passed
> >>> to the callback as its argument is locked. Thus, drivers registering
> >>> their 'reclaim_memory' callback should protect the data that might race
> >>> with the callback with the lock by themselves.
> >>
> >> Any reason you don't take the lock around the .probe() and .remove()
> >> calls of the backend (xenbus_dev_probe() and xenbus_dev_remove())? This
> >> would eliminate the need to do that in each backend instead.
> >
> > First of all, I would like to keep the critical section as small as
> > possible.
> > With my small test, I could see slightly increasing memory pressure as the
> > critical section becomes wider. Also, some drivers might share the data
> > their
> > 'reclaim_memory' callback touches with other functions. I think only the
> > driver owners can know what data is shared and what is the minimum critical
> > section to protect it.
>
> But this kind of serialization can still be added on top.

I'm still worrying about the unnecessarily large critical section, but it might
be small enough to be ignored. If no others have strong objection, I will take
the lock around the '->probe()' and '->remove()'.

>
> And with the trylock in the reclaim path I believe you can even avoid
> the irq variants of the spinlock. But I might be wrong, so you should
> try that with lockdep enabled. If it is working there is no harm done
> when making the critical section larger, as memory allocations will
> work as before.

Yes, you're right. I will try test with lockdep.

Thanks,
SeongJae Park

>
>
> Juergen
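To make the trylock idea from this exchange concrete, here is a userspace sketch. It is not kernel code: a C11 atomic_flag stands in for the per-device spinlock, and the demo_* names are hypothetical. The point it illustrates is that the reclaim path never waits; if the device is busy (for example inside probe or remove), reclaim simply skips it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device with a lock and a counter standing in for driver data. */
struct demo_device {
    atomic_flag lock;
    int reclaimed;
};

/*
 * Reclaim path: try to take the lock and give up immediately if it is held.
 * Returns true when reclaim actually ran for this device.
 */
static bool demo_reclaim(struct demo_device *dev)
{
    if ( atomic_flag_test_and_set_explicit(&dev->lock, memory_order_acquire) )
        return false;   /* busy elsewhere, skip this device */

    dev->reclaimed++;   /* stand-in for the driver's reclaim_memory work */

    atomic_flag_clear_explicit(&dev->lock, memory_order_release);
    return true;
}
```

Because the reclaim side only ever tries the lock, widening the critical section on the probe/remove side cannot make reclaim block; at worst a device is skipped during one reclaim pass.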
Re: [Xen-devel] [PATCH v3 5/7] Add Code Review Guide
Hi Lars,

On 12/12/2019 21:14, Lars Kurth wrote:
> +### Workflow from an Author's Perspective
> +
> +When code authors receive feedback on their patches, they typically first try
> +to clarify feedback they do not understand. For smaller patches or patch series
> +it makes sense to wait until receiving feedback on the entire series before
> +sending out a new version addressing the changes. For larger series, it may
> +make sense to send out a new revision earlier.
> +
> +As a reviewer, you need some system that he;ps ensure that you address all

Just a small typo: I think you meant "helps" rather than "he;ps".

Cheers,

-- 
Julien Grall
Re: [Xen-devel] [PATCH v3 2/2] xen/arm: sign extend writes to TimerValue
Hi Jeff,

On 11/12/2019 21:13, Jeff Kubascik wrote:
> Per the ARMv8 Reference Manual (ARM DDI 0487E.a), section D11.2.4
> specifies that the values in the TimerValue view of the timers are
> signed in standard two's complement form. When writing to the TimerValue

Do you mean CompareValue register instead of TimerValue register?

> register, it should be signed extended as described by the equation
>
> CompareValue = (Counter[63:0] + SignExtend(TimerValue))[63:0]

This explains the signed part, but it does not explain why the 32-bit case. So I would mention that TimerValue is a 32-bit signed integer.

Maybe saying "are 32-bit signed in standard ..."

> Signed-off-by: Jeff Kubascik
> ---
>  xen/arch/arm/vtimer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
> index 21b98ec20a..872181d9b6 100644
> --- a/xen/arch/arm/vtimer.c
> +++ b/xen/arch/arm/vtimer.c
> @@ -211,7 +211,7 @@ static bool vtimer_cntp_tval(struct cpu_user_regs *regs, uint32_t *r,
>      }
>      else
>      {
> -        v->arch.phys_timer.cval = cntpct + *r;
> +        v->arch.phys_timer.cval = cntpct + (uint64_t)(int32_t)*r;
>          if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE )
>          {
>              v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING;

Cheers,

-- 
Julien Grall
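The effect of the cast chain in the patch can be checked in isolation. The sketch below uses hypothetical demo_* names and compares the sign-extended computation with the zero-extended one it replaces; it assumes the usual two's complement conversion from uint32_t to int32_t.

```c
#include <stdint.h>

/* CompareValue = (Counter + SignExtend(TimerValue))[63:0] */
static uint64_t demo_cval_signed(uint64_t counter, uint32_t tval)
{
    return counter + (uint64_t)(int32_t)tval;
}

/* Behaviour before the patch: TimerValue is zero-extended. */
static uint64_t demo_cval_unsigned(uint64_t counter, uint32_t tval)
{
    return counter + tval;
}
```

Writing 0xFFFFFFFF (that is, -1) with the counter at 1000 gives a compare value of 999 with sign extension, whereas zero extension would place it about 2^32 ticks in the future. Non-negative values give the same result either way.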
Re: [Xen-devel] [PATCH v3 1/2] xen/arm: remove physical timer offset
Hi Jeff, On 11/12/2019 21:13, Jeff Kubascik wrote: The physical timer traps apply an offset so that time starts at 0 for the guest. However, this offset is not currently applied to the physical counter. Per the ARMv8 Reference Manual (ARM DDI 0487E.a), section D11.2.4 Timers, the "Offset" between the counter and timer should be zero for a physical timer. This removes the offset to make the timer and counter consistent. This also cleans up the physical timer implementation to better match the virtual timer - both cval's now hold the hardware value. Signed-off-by: Jeff Kubascik --- xen/arch/arm/vtimer.c| 34 ++ xen/include/asm-arm/domain.h | 3 --- 2 files changed, 18 insertions(+), 19 deletions(-) diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c index e6aebdac9e..21b98ec20a 100644 --- a/xen/arch/arm/vtimer.c +++ b/xen/arch/arm/vtimer.c @@ -62,7 +62,6 @@ static void virt_timer_expired(void *data) int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config) { -d->arch.phys_timer_base.offset = NOW(); d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0); d->time_offset_seconds = ticks_to_ns(d->arch.virt_timer_base.offset - boot_count); do_div(d->time_offset_seconds, 10); @@ -108,7 +107,6 @@ int vcpu_vtimer_init(struct vcpu *v) init_timer(>timer, phys_timer_expired, t, v->processor); t->ctl = 0; -t->cval = NOW(); t->irq = d0 ? timer_get_irq(TIMER_PHYS_NONSECURE_PPI) : GUEST_TIMER_PHYS_NS_PPI; @@ -167,6 +165,7 @@ void virt_timer_restore(struct vcpu *v) static bool vtimer_cntp_ctl(struct cpu_user_regs *regs, uint32_t *r, bool read) { struct vcpu *v = current; +s_time_t expires; if ( !ACCESS_ALLOWED(regs, EL0PTEN) ) return false; @@ -184,8 +183,9 @@ static bool vtimer_cntp_ctl(struct cpu_user_regs *regs, uint32_t *r, bool read) if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) { -set_timer(>arch.phys_timer.timer, - v->arch.phys_timer.cval + v->domain->arch.phys_timer_base.offset); +expires = v->arch.phys_timer.cval > boot_count + ? 
ticks_to_ns(v->arch.phys_timer.cval - boot_count) : 0; +set_timer(>arch.phys_timer.timer, expires); } else stop_timer(>arch.phys_timer.timer); @@ -197,26 +197,27 @@ static bool vtimer_cntp_tval(struct cpu_user_regs *regs, uint32_t *r, bool read) { struct vcpu *v = current; -s_time_t now; +uint64_t cntpct; +s_time_t expires; if ( !ACCESS_ALLOWED(regs, EL0PTEN) ) return false; -now = NOW() - v->domain->arch.phys_timer_base.offset; +cntpct = get_cycles(); if ( read ) { -*r = (uint32_t)(ns_to_ticks(v->arch.phys_timer.cval - now) & 0xull); +*r = (uint32_t)((v->arch.phys_timer.cval - cntpct) & 0xull); } else { -v->arch.phys_timer.cval = now + ticks_to_ns(*r); +v->arch.phys_timer.cval = cntpct + *r; if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) { v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING; -set_timer(>arch.phys_timer.timer, - v->arch.phys_timer.cval + - v->domain->arch.phys_timer_base.offset); +expires = v->arch.phys_timer.cval > boot_count + ? ticks_to_ns(v->arch.phys_timer.cval - boot_count) : 0; You probably want a comment to explain why you set to 0 here. +set_timer(>arch.phys_timer.timer, expires); } } return true; @@ -226,23 +227,24 @@ static bool vtimer_cntp_cval(struct cpu_user_regs *regs, uint64_t *r, bool read) { struct vcpu *v = current; +s_time_t expires; if ( !ACCESS_ALLOWED(regs, EL0PTEN) ) return false; if ( read ) { -*r = ns_to_ticks(v->arch.phys_timer.cval); +*r = v->arch.phys_timer.cval; } else { -v->arch.phys_timer.cval = ticks_to_ns(*r); +v->arch.phys_timer.cval = *r; if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) { v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING; -set_timer(>arch.phys_timer.timer, - v->arch.phys_timer.cval + - v->domain->arch.phys_timer_base.offset); +expires = v->arch.phys_timer.cval > boot_count + ? ticks_to_ns(v->arch.phys_timer.cval - boot_count) : 0; Same here. But I am wondering whether we could factor this code in a function. This would avoid code duplication and make the code simpler. 
This can be done as a follow-up as we may want to backport the fix. +set_timer(>arch.phys_timer.timer, expires); } } return
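The factoring suggested above could look roughly like the sketch below. It is only an illustration: the demo_* names are hypothetical, the frequency is passed as a parameter whereas Xen's ticks_to_ns() uses the probed timer frequency, and `unsigned __int128` is a GCC/Clang extension. The clamp to 0 presumably reflects that a compare value at or before boot_count lies in the past relative to Xen's system time, so the timer should fire right away; that reading is an assumption, which is why a comment in the real code was requested.

```c
#include <stdint.h>

/* Hypothetical conversion: ns = ticks * 10^6 / freq_khz. */
static int64_t demo_ticks_to_ns(uint64_t ticks, uint64_t freq_khz)
{
    return (int64_t)(((unsigned __int128)ticks * 1000000u) / freq_khz);
}

/*
 * Expiry in system time (ns since boot) for a hardware compare value.
 * Compare values at or before the boot-time counter are treated as
 * already expired and map to 0.
 */
static int64_t demo_phys_timer_expires(uint64_t cval, uint64_t boot_count,
                                       uint64_t freq_khz)
{
    if ( cval <= boot_count )
        return 0;

    return demo_ticks_to_ns(cval - boot_count, freq_khz);
}
```

With a helper of this shape, the three trap handlers (ctl, tval, cval) could share one call instead of repeating the conditional expression.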
[Xen-devel] [xen-unstable-smoke test] 144931: tolerable all pass - PUSHED
flight 144931 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144931/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      14 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          14 saverestore-support-check    fail   never pass

version targeted for testing:
 xen                  0e7c69bd3c0b35a677d73843b39522787ccf5a3f
baseline version:
 xen                  704fa1532801bc02c4500462f0b913b3c137db4d

Last test of basis   144912  2019-12-17 22:02:21 Z    0 days
Testing same since   144931  2019-12-18 12:00:25 Z    0 days    1 attempts

People who touched revisions under test:
  Andrew Cooper
  Jan Beulich
  Steven Haigh
  Wei Liu

jobs:
 build-arm64-xsm                                              pass
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-amd64-libvirt                                          pass
 test-armhf-armhf-xl                                          pass
 test-arm64-arm64-xl-xsm                                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-amd64-libvirt                                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   704fa15328..0e7c69bd3c  0e7c69bd3c0b35a677d73843b39522787ccf5a3f -> smoke
Re: [Xen-devel] [XEN PATCH v3] x86/vm_event: add short-circuit for breakpoints (aka, "fast single step")
> diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
> index aa54c86325..cb577a7ba9 100644
> --- a/xen/include/public/vm_event.h
> +++ b/xen/include/public/vm_event.h
> @@ -110,6 +110,11 @@
>   * interrupt pending after resuming the VCPU.
>   */
>  #define VM_EVENT_FLAG_GET_NEXT_INTERRUPT (1 << 10)
> +/*
> + * Execute fast singlestepping on vm_event response.
> + * Requires the vCPU to be paused already (synchronous events only).
> + */
> +#define VM_EVENT_FLAG_FAST_SINGLESTEP (1 << 11)

Just another minor style nitpick: the alignment of (1 << 11) is off compared to all of the previous declarations above.

Tamas
Re: [Xen-devel] [PATCH 9/9] xen/sched: add const qualifier where appropriate
On Wed, 2019-12-18 at 08:48 +0100, Juergen Gross wrote:
> Make use of the const qualifier more often in scheduling code.
>
> Signed-off-by: Juergen Gross
>
Cool!

Reviewed-by: Dario Faggioli

Another thing that may be worth checking is whether all the places where 'int' is used for CPU and vCPU IDs (or alike) really need to be signed integers, or could be turned into unsigned.

Of course, I'm not suggesting/asking you to do that as well; I'm just mentioning it in case anyone is interested/has time, or even just for the record.

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
[Xen-devel] [RFC] Integrate CoC, Governance, Security Policy and other key documents into sphinx docs
Hi all,

now that 4.13 is out of the way I wanted to get the CoC discussion closed - see https://lists.xenproject.org/archives/html/xen-devel/2019-12/threads.html#00926, which means I need ACKs or final suggestions. The next step would be to publish it on the website.

However, I have also been thinking about keeping some documents in multiple places and defining a *master* copy somewhere in a tree. Right now, these are a few personal repos that I own, which seems unnecessary, given that we have the sphinx docs. In the interest of improving the docs, we also need more useful content in the docs to guide people to them.

My proposal would be to move the master sources for a number of key process docs to xen.git:/docs, maybe under a "Working with the Xen Project community" section in a process-guide directory. This would then include content from

• http://xenbits.xen.org/gitweb/?p=people/larsk/governance.git;a=summary
• http://xenbits.xen.org/gitweb/?p=people/larsk/security-process.git;a=summary
• http://xenbits.xen.org/gitweb/?p=people/larsk/code-of-conduct.git;a=summary

and we could also consider including some of the wiki pages related to contribution workflow and re-direct the pages.

We would need to answer some questions, such as

a) Are we OK with these staying in markdown - I don’t mind converting
b) Are we OK with some of the documents needing project wide agreement before they can be changed, specifically this would cover
   - governance.git
   - code-of-conduct.git:code-of-conduct.md
   - code-of-conduct.git:communication-guide.md

Best Regards
Lars
Re: [Xen-devel] [PATCH v2] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs
Hi Varad,

Please send a new version of a patch in a new thread rather than in reply to the first version.

On 18/12/2019 10:53, Varad Gautam wrote:

XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. In that scenario, it is possible to receive multiple _pirq_guest_unbind calls for the same pirq from domain_kill, if the pirq has not yet been removed from the domain's pirq_tree, as:

  domain_kill()
    -> domain_relinquish_resources()
      -> pci_release_devices()
        -> pci_clean_dpci_irq()
          -> pirq_guest_unbind()
            -> __pirq_guest_unbind()

For a shared pirq (nr_guests > 1), the first call would zap the current domain from the pirq's guests[] list, but the action handler is never freed as there are other guests using this pirq. As a result, on the second call, __pirq_guest_unbind searches for the current domain which has been removed from the guests[] list, and hits a BUG_ON.

Make __pirq_guest_unbind safe to be called multiple times by letting xen continue if a shared pirq has already been unbound from this guest. The PIRQ will be cleaned up from the domain's pirq_tree during the destruction in complete_domain_destroy anyways.

Signed-off-by: Varad Gautam
CC: Jan Beulich
CC: Roger Pau Monné
CC: Andrew Cooper

v2: Split the check on action->nr_guests > 0 and make it an ASSERT, reword.
---
 xen/arch/x86/irq.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 5d0d94c..3eb7b22 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind(
     for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
         continue;
-    BUG_ON(i == action->nr_guests);
+    if ( i == action->nr_guests ) {

The { should be on a new line.

+        ASSERT(action->nr_guests > 0) ;

The space before ; is not necessary.

+        /* In case the pirq was shared, unbound for this domain in an earlier call, but still
+         * existed on the domain's pirq_tree, we still reach here if there are any later
+         * unbind calls on the same pirq. Return if such an unbind happens. */

The coding style for comments is:

/*
 * Foo
 * Bar
 */

+        if ( action->shareable )
+            return NULL;
+        BUG();

Given that the previous BUG_ON() was hit, would it make sense to try to avoid a new BUG()? Why not just return NULL, as you do for action->shareable?

+    }
+
     memmove(&action->guest[i], &action->guest[i+1],
             (action->nr_guests-i-1) * sizeof(action->guest[0]));
     action->nr_guests--;

Cheers,
--
Julien Grall
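To make the suggestion above concrete, here is a minimal standalone sketch of an unbind that tolerates repeated calls, assuming the simpler behaviour proposed in the review (return without BUG() whenever the domain is not found). It is not the Xen implementation: struct action_sketch, unbind_guest_sketch and MAX_GUESTS_SKETCH are hypothetical names, plain integers stand in for struct domain pointers, and a bool replaces the NULL/action return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_GUESTS_SKETCH 7   /* arbitrary bound for this sketch */

/* Hypothetical, simplified stand-in for irq_guest_action_t. */
struct action_sketch {
    bool shareable;
    unsigned int nr_guests;
    int guest[MAX_GUESTS_SKETCH];   /* domain IDs instead of struct domain * */
};

/*
 * Remove domain d from the guest list.
 *
 * Returns true if d was bound and has been removed, false if d was not in
 * the list, e.g. because an earlier call already unbound it.
 */
static bool unbind_guest_sketch(struct action_sketch *action, int d)
{
    unsigned int i;

    assert(action != NULL);

    /* Same linear search as in the quoted hunk. */
    for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
        continue;

    /* Not found: treat as "already unbound" rather than a fatal error. */
    if ( i == action->nr_guests )
        return false;

    /* Close the gap left by the removed entry. */
    memmove(&action->guest[i], &action->guest[i + 1],
            (action->nr_guests - i - 1) * sizeof(action->guest[0]));
    action->nr_guests--;

    return true;
}
```

Calling unbind_guest_sketch() twice for the same domain leaves the list unchanged on the second call, which is the property the continuation in domain_kill needs.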
[Xen-devel] [PATCH] tools/python: Drop test.py
This file hasn't been touched since it was introduced in 2005 (c/s 0c6f36628) and has a wildly obsolete shebang for Python 2.3. Most importantly for us, it isn't Python 3 compatible.

Drop the file entirely. Since the 2.3 days, automatic discovery of tests has been included in standard functionality. Rewrite the test rule to use "$(PYTHON) -m unittest discover" which is equivalent.

Dropping test.py drops the only piece of ZPL-2.0 code in the tree. Drop the ancillary files, and adjust COPYING to match.

Signed-off-by: Andrew Cooper
---
CC: Ian Jackson
CC: Wei Liu
CC: Lars Kurth

This wants backporting to 4.13 as soon as practical.
---
 COPYING               |    1 -
 tools/python/Makefile |    2 +-
 tools/python/README   |    3 -
 tools/python/ZPL-2.0  |   59 ---
 tools/python/test.py  | 1094 -
 5 files changed, 1 insertion(+), 1158 deletions(-)
 delete mode 100644 tools/python/README
 delete mode 100644 tools/python/ZPL-2.0
 delete mode 100644 tools/python/test.py

diff --git a/COPYING b/COPYING
index 80fac091d3..a4bc2b2dd4 100644
--- a/COPYING
+++ b/COPYING
@@ -57,7 +57,6 @@ Xen tree, retaining the original license, such as
   - Laurikari License
   - Public Domain
   - ZLIB License
-  - ZPL 2.0

 Significant code imports are highlighted in a README.source file
 in the directory into which the file or code snippet was imported.
diff --git a/tools/python/Makefile b/tools/python/Makefile
index 541858e2f8..e99f78a537 100644
--- a/tools/python/Makefile
+++ b/tools/python/Makefile
@@ -33,7 +33,7 @@ uninstall:

 .PHONY: test
 test:
-	export LD_LIBRARY_PATH=$$(readlink -f ../libxc):$$(readlink -f ../xenstore); $(PYTHON) test.py -b -u
+	LD_LIBRARY_PATH=$$(readlink -f ../libxc):$$(readlink -f ../xenstore) $(PYTHON) -m unittest discover

 .PHONY: clean
 clean:
diff --git a/tools/python/README b/tools/python/README
deleted file mode 100644
index 8fffef3a00..0000000000
--- a/tools/python/README
+++ /dev/null
@@ -1,3 +0,0 @@
-The file test.py here is from the Zope project, and is Copyright (c) 2001,
-2002 Zope Corporation and Contributors. This file is released under the Zope
-Public License, version 2.0, a copy of which is in the file ZPL-2.0.
diff --git a/tools/python/ZPL-2.0 b/tools/python/ZPL-2.0
deleted file mode 100644
index 5582f08b89..0000000000
--- a/tools/python/ZPL-2.0
+++ /dev/null
@@ -1,59 +0,0 @@
-Zope Public License (ZPL) Version 2.0
-
-This software is Copyright (c) Zope Corporation (tm) and
-Contributors. All rights reserved.
-
-This license has been certified as open source. It has also
-been designated as GPL compatible by the Free Software
-Foundation (FSF).
-
-Redistribution and use in source and binary forms, with or
-without modification, are permitted provided that the
-following conditions are met:
-
-1. Redistributions in source code must retain the above
-   copyright notice, this list of conditions, and the following
-   disclaimer.
-
-2. Redistributions in binary form must reproduce the above
-   copyright notice, this list of conditions, and the following
-   disclaimer in the documentation and/or other materials
-   provided with the distribution.
-
-3. The name Zope Corporation (tm) must not be used to
-   endorse or promote products derived from this software
-   without prior written permission from Zope Corporation.
-
-4. The right to distribute this software or to use it for
-   any purpose does not give you the right to use Servicemarks
-   (sm) or Trademarks (tm) of Zope Corporation. Use of them is
-   covered in a separate agreement (see
-   http://www.zope.com/Marks).
-
-5. If any files are modified, you must cause the modified
-   files to carry prominent notices stating that you changed
-   the files and the date of any change.
-
-Disclaimer
-
-   THIS SOFTWARE IS PROVIDED BY ZOPE CORPORATION ``AS IS''
-   AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT
-   NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
-   AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN
-   NO EVENT SHALL ZOPE CORPORATION OR ITS CONTRIBUTORS BE
-   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
-   LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-   HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-   CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
-   OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
-   DAMAGE.
-
-
-This software consists of contributions made by Zope
-Corporation and many individuals on behalf of Zope
-Corporation. Specific attributions are listed in the
-accompanying credits file.
\ No newline at end of file
diff --git a/tools/python/test.py b/tools/python/test.py
deleted file mode 100644
index 13912f61a6..0000000000
---
Re: [Xen-devel] [PATCH for-next 7/7] x86: implement Hyper-V clock source
On Wed, Dec 18, 2019 at 02:24:33PM +0100, Jan Beulich wrote:
> On 18.12.2019 14:18, Wei Liu wrote:
> > On Wed, Dec 18, 2019 at 01:51:54PM +0100, Jan Beulich wrote:
> >> On 18.12.2019 13:38, Wei Liu wrote:
> >>> On Tue, Dec 10, 2019 at 05:59:04PM +0100, Jan Beulich wrote:
> >>>> On 25.10.2019 11:16, Wei Liu wrote:
> >>>>> +static inline uint64_t read_hyperv_timer(void)
> >>>>> +{
> >>>>> +    uint64_t scale, offset, ret, tsc;
> >>>>> +    uint32_t seq;
> >>>>> +    struct ms_hyperv_tsc_page *tsc_page = _tsc_page;
> >>>>> +
> >>>>> +    do {
> >>>>> +        seq = tsc_page->tsc_sequence;
> >>>>> +
> >>>>> +        /* Seq 0 is special. It means the TSC enlightenment is not
> >>>>> +         * available at the moment. The reference time can only be
> >>>>> +         * obtained from the Reference Counter MSR.
> >>>>> +         */
> >>>>> +        if ( seq == 0 )
> >>>>> +        {
> >>>>> +            rdmsrl(HV_X64_MSR_TIME_REF_COUNT, ret);
> >>>>> +            return ret;
> >>>>> +        }
> >>>>> +
> >>>>> +        smp_rmb();
> >>>>> +
> >>>>> +        tsc = rdtsc_ordered();
> >>>>
> >>>> This already includes at least a read fence.
> >>>
> >>> OK. rdtsc() should be enough here.
> >>
> >> Are you sure? My comment was rather towards the dropping of smp_rmb()
> >> (maybe replacing by a comment).
> >
> > I do mean to keep smp_rmb() before it. Is that not enough?
>
> With
>
> #define smp_rmb() barrier()
>
> it isn't - it's merely a compiler barrier, but for the ordering
> you want you need a fence.

Ah, I see. Thank you.

Wei.

>
> Jan