[Xen-devel] [linux-4.9 test] 112086: regressions - FAIL
flight 112086 linux-4.9 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112086/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-localmigrate/x10 fail REGR. vs. 111883

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop              fail like 111843
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop              fail like 111843
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop             fail like 111883
 test-amd64-amd64-xl-rtds 10 debian-install                     fail like 111883
 test-amd64-amd64-libvirt 13 migrate-support-check              fail never pass
 test-amd64-i386-libvirt 13 migrate-support-check               fail never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check          fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install        fail never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check           fail never pass
 test-arm64-arm64-xl 13 migrate-support-check                   fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check               fail never pass
 test-arm64-arm64-xl 14 saverestore-support-check               fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-credit2 13 migrate-support-check           fail never pass
 test-arm64-arm64-xl-credit2 14 saverestore-support-check       fail never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check          fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 10 windows-install        fail never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check      fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check          fail never pass
 test-armhf-armhf-xl 13 migrate-support-check                   fail never pass
 test-armhf-armhf-xl 14 saverestore-support-check               fail never pass
 test-armhf-armhf-xl-credit2 13 migrate-support-check           fail never pass
 test-armhf-armhf-xl-credit2 14 saverestore-support-check       fail never pass
 test-armhf-armhf-libvirt 13 migrate-support-check              fail never pass
 test-armhf-armhf-libvirt 14 saverestore-support-check          fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check         fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check     fail never pass
 test-armhf-armhf-xl-xsm 13 migrate-support-check               fail never pass
 test-armhf-armhf-xl-xsm 14 saverestore-support-check           fail never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-check          fail never pass
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check      fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check        fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check    fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 13 guest-saverestore       fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check              fail never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check          fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check          fail never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check      fail never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check           fail never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check       fail never pass
 test-armhf-armhf-xl-vhd 12 migrate-support-check               fail never pass
 test-armhf-armhf-xl-vhd 13 saverestore-support-check           fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore       fail never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install         fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-install        fail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install         fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install        fail never pass

version targeted for testing:
 linux    c03917de04aa68017a737e90ea01338d991eaff5
baseline version:
 linux    f0cd77ded5127168b1b83ca2f366ee17e9c0586f

Last test of basis   111883  2017-07-16 11:10:00 Z    5 days
Testing same since   112086  2017-07-21 06:22:54 Z    0 days    1 attempts

People who touched revisions under test:
 "Eric W. Biederman"
 Adam Borowski
 Alban Browaeys
 Alexei Starovoitov
 Amit Pundir
 Andrei Vagin
[Xen-devel] [linux-linus test] 112083: regressions - FAIL
flight 112083 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112083/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-i386-pvgrub 7 xen-boot                   fail REGR. vs. 110515
 test-amd64-amd64-xl-pvh-intel 7 xen-boot                  fail REGR. vs. 110515
 test-amd64-amd64-qemuu-nested-intel 7 xen-boot            fail REGR. vs. 110515
 test-amd64-amd64-xl-qcow2 7 xen-boot                      fail REGR. vs. 110515
 test-amd64-amd64-amd64-pvgrub 7 xen-boot                  fail REGR. vs. 110515
 test-amd64-amd64-xl 16 guest-localmigrate                 fail REGR. vs. 110515
 test-amd64-amd64-libvirt-pair 21 guest-start/debian       fail REGR. vs. 110515
 test-amd64-i386-libvirt-xsm 16 guest-saverestore.2        fail REGR. vs. 110515
 test-amd64-amd64-xl-credit2 15 guest-saverestore          fail REGR. vs. 110515
 test-amd64-amd64-xl-qemut-debianhvm-amd64 7 xen-boot      fail REGR. vs. 110515
 test-amd64-amd64-pygrub 7 xen-boot                        fail REGR. vs. 110515
 test-amd64-amd64-xl-xsm 16 guest-localmigrate             fail REGR. vs. 110515
 test-amd64-amd64-libvirt 16 guest-saverestore.2           fail REGR. vs. 110515
 test-amd64-i386-xl 16 guest-localmigrate                  fail REGR. vs. 110515
 test-amd64-amd64-xl-multivcpu 15 guest-saverestore        fail REGR. vs. 110515
 test-amd64-amd64-pair 21 guest-start/debian               fail REGR. vs. 110515
 test-amd64-amd64-libvirt-xsm 16 guest-saverestore.2       fail REGR. vs. 110515
 test-amd64-i386-libvirt 16 guest-saverestore.2            fail REGR. vs. 110515
 test-amd64-i386-xl-xsm 16 guest-localmigrate              fail REGR. vs. 110515
 test-amd64-amd64-xl-pvh-amd 16 guest-localmigrate         fail REGR. vs. 110515
 test-amd64-i386-libvirt-pair 21 guest-start/debian        fail REGR. vs. 110515
 test-amd64-i386-pair 21 guest-start/debian                fail REGR. vs. 110515
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-localmigrate/x10 fail REGR. vs. 110515
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-localmigrate/x10 fail REGR. vs. 110515
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-localmigrate/x10 fail REGR. vs. 110515

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 14 saverestore-support-check          fail like 110515
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check      fail like 110515
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop              fail like 110515
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check      fail like 110515
 test-amd64-amd64-xl-rtds 10 debian-install                     fail like 110515
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat          fail like 110515
 test-amd64-i386-libvirt-xsm 13 migrate-support-check           fail never pass
 test-amd64-amd64-libvirt 13 migrate-support-check              fail never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check          fail never pass
 test-amd64-i386-libvirt 13 migrate-support-check               fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install        fail never pass
 test-arm64-arm64-xl 13 migrate-support-check                   fail never pass
 test-arm64-arm64-xl 14 saverestore-support-check               fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check               fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check           fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check           fail never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check       fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check         fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check     fail never pass
 test-armhf-armhf-libvirt 13 migrate-support-check              fail never pass
 test-armhf-armhf-xl-xsm 13 migrate-support-check               fail never pass
 test-armhf-armhf-xl-xsm 14 saverestore-support-check           fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check        fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check    fail never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-check          fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 10 windows-install        fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore       fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 13 guest-saverestore       fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check              fail never pass
 test-armhf-armhf-xl-rtds 14
Re: [Xen-devel] [PATCH 09/15] xen: vmx: handle SGX related MSRs
On 7/21/2017 9:42 PM, Huang, Kai wrote:
On 7/20/2017 5:27 AM, Andrew Cooper wrote:
On 09/07/17 09:09, Kai Huang wrote:

This patch handles the IA32_FEATURE_CONTROL and IA32_SGXLEPUBKEYHASHn MSRs.

For IA32_FEATURE_CONTROL, if SGX is exposed to the domain, the SGX_ENABLE bit is always set. If SGX launch control is also exposed to the domain, and the physical IA32_SGXLEPUBKEYHASHn are writable, the SGX_LAUNCH_CONTROL_ENABLE bit is also always set. Writes to IA32_FEATURE_CONTROL are ignored.

For IA32_SGXLEPUBKEYHASHn, a new 'struct sgx_vcpu' is added for per-vcpu SGX state; currently it holds the vcpu's virtual ia32_sgxlepubkeyhash[0-3]. Two booleans, 'readable' and 'writable', are also added to indicate whether the virtual IA32_SGXLEPUBKEYHASHn are readable and writable.

When a vcpu is initialized, the virtual ia32_sgxlepubkeyhash values are initialized as well. If the physical IA32_SGXLEPUBKEYHASHn are writable, the virtual values are set to Intel's default value, since on a physical machine those MSRs hold Intel's default value after reset. If the physical MSRs are not writable (they were *locked* by the BIOS before handing over to Xen), we try to read them and use the physical values as the default for the virtual MSRs. Note that rdmsr_safe is used here: although the SDM says IA32_SGXLEPUBKEYHASHn are available for read whenever SGX is present, in reality some Skylake client parts (depending on BIOS) do not have those MSRs available, so we use rdmsr_safe and set 'readable' to false if it returns an error code.

For a guest read of IA32_SGXLEPUBKEYHASHn, if the physical MSRs are not readable the guest is not allowed to read either; otherwise the vcpu's virtual MSR value is returned. For a guest write to IA32_SGXLEPUBKEYHASHn, the write is allowed only if the physical MSRs are writable and SGX launch control is exposed to the domain; otherwise an error is injected. To make EINIT run successfully in the guest, the vcpu's virtual IA32_SGXLEPUBKEYHASHn values are written to the physical MSRs when the vcpu is scheduled in.
Signed-off-by: Kai Huang
---
 xen/arch/x86/hvm/vmx/sgx.c         | 194 +
 xen/arch/x86/hvm/vmx/vmx.c         |  24 +
 xen/include/asm-x86/cpufeature.h   |   3 +
 xen/include/asm-x86/hvm/vmx/sgx.h  |  22 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   2 +
 xen/include/asm-x86/msr-index.h    |   6 ++
 6 files changed, 251 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/sgx.c b/xen/arch/x86/hvm/vmx/sgx.c
index 14379151e8..4944e57aef 100644
--- a/xen/arch/x86/hvm/vmx/sgx.c
+++ b/xen/arch/x86/hvm/vmx/sgx.c
@@ -405,6 +405,200 @@ void hvm_destroy_epc(struct domain *d)
     hvm_reset_epc(d, true);
 }
 
+/* Whether IA32_SGXLEPUBKEYHASHn are physically *unlocked* by BIOS */
+bool_t sgx_ia32_sgxlepubkeyhash_writable(void)
+{
+    uint64_t sgx_lc_enabled = IA32_FEATURE_CONTROL_SGX_ENABLE |
+                              IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE |
+                              IA32_FEATURE_CONTROL_LOCK;
+    uint64_t val;
+
+    rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
+
+    return (val & sgx_lc_enabled) == sgx_lc_enabled;
+}
+
+bool_t domain_has_sgx(struct domain *d)
+{
+    /* hvm_epc_populated(d) implies CPUID has SGX */
+    return hvm_epc_populated(d);
+}
+
+bool_t domain_has_sgx_launch_control(struct domain *d)
+{
+    struct cpuid_policy *p = d->arch.cpuid;
+
+    if ( !domain_has_sgx(d) )
+        return false;
+
+    /* Unnecessary but check anyway */
+    if ( !cpu_has_sgx_launch_control )
+        return false;
+
+    return !!p->feat.sgx_launch_control;
+}

Both of these should be d->arch.cpuid->feat.{sgx,sgx_lc} only, and not from having individual helpers. The CPUID setup during host boot and domain construction should take care of setting everything up properly, or hiding the features from the guest. The point of the work I've been doing is to prevent situations where the guest can see SGX but something doesn't work because of Xen using nested checks like this.

Thanks for comments. Will change to simple check against d->arch.cpuid->feat.{sgx,sgx_lc}.

+/* Digest of Intel signing key. MSR's default value after reset. */
+#define SGX_INTEL_DEFAULT_LEPUBKEYHASH0 0xa6053e051270b7ac
+#define SGX_INTEL_DEFAULT_LEPUBKEYHASH1 0x6cfbe8ba8b3b413d
+#define SGX_INTEL_DEFAULT_LEPUBKEYHASH2 0xc4916d99f2b3735d
+#define SGX_INTEL_DEFAULT_LEPUBKEYHASH3 0xd4f8c05909f9bb3b
+
+void sgx_vcpu_init(struct vcpu *v)
+{
+    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
+
+    memset(sgxv, 0, sizeof(*sgxv));
+
+    if ( sgx_ia32_sgxlepubkeyhash_writable() )
+    {
+        /*
+         * If physical MSRs are writable, set vcpu's default value to Intel's
+         * default value. For real machine, after reset, MSRs contain Intel's
+         * default value.
+         */
+        sgxv->ia32_sgxlepubkeyhash[0] = SGX_INTEL_DEFAULT_LEPUBKEYHASH0;
+        sgxv->ia32_sgxlepubkeyhash[1] = SGX_INTEL_DEFAULT_LEPUBKEYHASH1;
+        sgxv->ia32_sgxlepubkeyhash[2] =
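The guest-visible access policy described in the commit message — reads permitted only when the physical MSRs are readable, writes permitted only when the physical MSRs are unlocked by the BIOS *and* launch control is exposed to the domain — can be modeled in isolation. The following is a hedged sketch, not the patch's code: the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the IA32_SGXLEPUBKEYHASHn access policy described above.
 * All identifiers here are illustrative, not the patch's actual names.
 */
struct sgx_vcpu_model {
    uint64_t hash[4];   /* virtual IA32_SGXLEPUBKEYHASH0..3 */
    bool readable;      /* are the physical MSRs readable? */
    bool writable;      /* are the physical MSRs unlocked by the BIOS? */
    bool lc_exposed;    /* is SGX launch control exposed to the domain? */
};

/* Guest rdmsr: permitted only if the physical MSRs are readable. */
static bool model_rdmsr(const struct sgx_vcpu_model *v, int n, uint64_t *val)
{
    if (!v->readable)
        return false;   /* the patch injects an error in this case */
    *val = v->hash[n];
    return true;
}

/* Guest wrmsr: permitted only if the physical MSRs are writable
 * and launch control is exposed to the domain. */
static bool model_wrmsr(struct sgx_vcpu_model *v, int n, uint64_t val)
{
    if (!v->writable || !v->lc_exposed)
        return false;   /* the patch injects an error in this case */
    v->hash[n] = val;
    return true;
}
```

On schedule-in, the patch then propagates the four virtual hash values to the physical MSRs so that EINIT sees the guest's chosen launch-enclave key hash.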
Re: [Xen-devel] [PATCH 03/15] xen: x86: add early stage SGX feature detection
On 7/21/2017 9:17 PM, Huang, Kai wrote:
On 7/20/2017 2:23 AM, Andrew Cooper wrote:
On 09/07/17 09:09, Kai Huang wrote:

This patch adds early-stage SGX feature detection via SGX CPUID leaf 0x12. A function detect_sgx is added to detect SGX info on each CPU (called from vmx_cpu_up). The SDM says the SGX info returned by CPUID is per-thread, and we cannot assume all threads will return the same SGX info, so we have to detect SGX on each CPU. For simplicity, SGX is currently only supported when all CPUs report the same SGX info. The SDM also says it is possible to have multiple EPC sections, but that only occurs on multi-socket servers, which we don't support yet (there are other things that need to be done first, e.g. NUMA EPC, scheduling, etc.), so currently only one EPC section is supported.

Dedicated files sgx.c and sgx.h are added (under the vmx directory, as SGX is Intel-specific) for the bulk of the above SGX detection code, and for further SGX code as well.

Signed-off-by: Kai Huang

I am not sure putting this under hvm/ is a sensible move. Almost everything in this patch is currently common, and I can foresee us wanting to introduce PV support, so it would be good to introduce this in a guest-neutral location to begin with.

Sorry, I forgot to respond to this in my last reply. I looked at the code again and yes, I think we can move the code to a common place. I will move the current sgx.c to arch/x86/sgx.c. Thanks for comments.
---
 xen/arch/x86/hvm/vmx/Makefile     |   1 +
 xen/arch/x86/hvm/vmx/sgx.c        | 208 ++
 xen/arch/x86/hvm/vmx/vmcs.c       |   4 +
 xen/include/asm-x86/cpufeature.h  |   1 +
 xen/include/asm-x86/hvm/vmx/sgx.h |  45 +
 5 files changed, 259 insertions(+)
 create mode 100644 xen/arch/x86/hvm/vmx/sgx.c
 create mode 100644 xen/include/asm-x86/hvm/vmx/sgx.h

diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile
index 04a29ce59d..f6bcf0d143 100644
--- a/xen/arch/x86/hvm/vmx/Makefile
+++ b/xen/arch/x86/hvm/vmx/Makefile
@@ -4,3 +4,4 @@ obj-y += realmode.o
 obj-y += vmcs.o
 obj-y += vmx.o
 obj-y += vvmx.o
+obj-y += sgx.o
diff --git a/xen/arch/x86/hvm/vmx/sgx.c b/xen/arch/x86/hvm/vmx/sgx.c
new file mode 100644
index 00..6b41469371
--- /dev/null
+++ b/xen/arch/x86/hvm/vmx/sgx.c

This file looks like it should be arch/x86/sgx.c, given its current content.

Will do.

@@ -0,0 +1,208 @@
+/*
+ * Intel Software Guard Extensions support

Please include a GPLv2 header.

Yes will do. Thanks, -Kai

+ *
+ * Author: Kai Huang
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+static struct sgx_cpuinfo __read_mostly sgx_cpudata[NR_CPUS];
+static struct sgx_cpuinfo __read_mostly boot_sgx_cpudata;

I don't think any of this is necessary. The description says that all EPCs across the server will be reported in CPUID subleaves, and our implementation gives up if the data are non-identical across CPUs. Therefore, we only need to keep one copy of the data, and check APs against the master copy.

Right. boot_sgx_cpudata is what we need. Currently detect_sgx is called from vmx_cpu_up. How about changing to calling it from identify_cpu, with something like below?

    if ( c == &boot_cpu_data )
        detect_sgx(&boot_sgx_cpudata);
    else
    {
        struct sgx_cpuinfo tmp;

        detect_sgx(&tmp);
        if ( memcmp(&boot_sgx_cpudata, &tmp, sizeof (tmp)) )
            /* disable SGX */
    }

Thanks, -Kai

Let me see about splitting up a few bits of the existing CPUID infrastructure, so we can use the host cpuid policy more effectively for Xen related things.
~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
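The boot-CPU-versus-AP consistency check discussed in the thread above can be illustrated standalone. This is a minimal sketch under assumed field names (Xen's real struct sgx_cpuinfo differs): record the boot CPU's CPUID-derived data as the master copy, then compare each AP's data against it and disable SGX on any mismatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Xen's struct sgx_cpuinfo; fields chosen for
 * illustration only (fixed-width members keep the memcmp padding-safe). */
struct sgx_cpuinfo {
    uint64_t epc_base;
    uint64_t epc_size;
    uint64_t enabled;
};

static struct sgx_cpuinfo boot_sgx_cpudata;

/*
 * On the boot CPU, record the master copy; on APs, compare against it.
 * Returns false when SGX should be disabled because of a mismatch.
 */
static bool sgx_check_consistency(const struct sgx_cpuinfo *this_cpu,
                                  bool is_boot_cpu)
{
    if (is_boot_cpu) {
        boot_sgx_cpudata = *this_cpu;
        return true;
    }
    return memcmp(&boot_sgx_cpudata, this_cpu, sizeof(*this_cpu)) == 0;
}
```

This mirrors the memcmp-based proposal in the exchange: one master copy instead of a per-CPU NR_CPUS array, with APs checked at identify_cpu time.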
[Xen-devel] [PULL for-2.10 1/2] xen: fix compilation on 32-bit hosts
From: Igor Druzhinin

Signed-off-by: Igor Druzhinin
Reviewed-by: Stefano Stabellini
---
 hw/i386/xen/xen-mapcache.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 2a1fbd1..bb1078c 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -527,7 +527,7 @@ static uint8_t *xen_replace_cache_entry_unlocked(hwaddr old_phys_addr,
         entry = entry->next;
     }
     if (!entry) {
-        DPRINTF("Trying to update an entry for %lx " \
+        DPRINTF("Trying to update an entry for "TARGET_FMT_plx \
                 "that is not in the mapcache!\n", old_phys_addr);
         return NULL;
     }
@@ -535,15 +535,16 @@ static uint8_t *xen_replace_cache_entry_unlocked(hwaddr old_phys_addr,
     address_index = new_phys_addr >> MCACHE_BUCKET_SHIFT;
     address_offset = new_phys_addr & (MCACHE_BUCKET_SIZE - 1);
 
-    fprintf(stderr, "Replacing a dummy mapcache entry for %lx with %lx\n",
-            old_phys_addr, new_phys_addr);
+    fprintf(stderr, "Replacing a dummy mapcache entry for "TARGET_FMT_plx \
+            " with "TARGET_FMT_plx"\n", old_phys_addr, new_phys_addr);
 
     xen_remap_bucket(entry, entry->vaddr_base, cache_size,
                      address_index, false);
     if (!test_bits(address_offset >> XC_PAGE_SHIFT,
                    test_bit_size >> XC_PAGE_SHIFT,
                    entry->valid_mapping)) {
-        DPRINTF("Unable to update a mapcache entry for %lx!\n", old_phys_addr);
+        DPRINTF("Unable to update a mapcache entry for "TARGET_FMT_plx"!\n",
+                old_phys_addr);
         return NULL;
     }
-- 
1.9.1
[Xen-devel] [PULL for-2.10 0/2] please pull xen-20170721-tag
The following changes since commit 91939262ffcd3c85ea6a4793d3029326eea1d649:

  configure: Drop ancient Solaris 9 and earlier support (2017-07-21 15:04:05 +0100)

are available in the git repository at:

  git://xenbits.xen.org/people/sstabellini/qemu-dm.git tags/xen-20170721-tag

for you to fetch changes up to 7fb394ad8a7c4609cefa2136dec16cf65d028f40:

  xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings (2017-07-21 17:37:06 -0700)

Xen 2017/07/21

Alexey G (1):
      xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings

Igor Druzhinin (1):
      xen: fix compilation on 32-bit hosts

 hw/i386/xen/xen-mapcache.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)
[Xen-devel] [PULL for-2.10 2/2] xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings
From: Alexey G

Under certain circumstances, normal xen-mapcache functioning may be broken by the guest's actions. This may lead either to QEMU performing exit() due to a caught bad pointer (and with the QEMU process gone, the guest domain simply appears hung afterwards) or to actual use of the incorrect pointer inside QEMU's address space -- a write to unmapped memory is possible.

The bug is hard to reproduce on an i440 machine, as multiple DMA sources are required (though it's possible in theory, using multiple emulated devices), but it can be reproduced somewhat easily on a Q35 machine using an emulated AHCI controller -- each NCQ queue command slot may be used as an independent DMA source, e.g. using the READ FPDMA QUEUED command, so a single storage device on the AHCI controller port is enough to produce multiple DMAs (up to 32). The detailed description of the issue follows.

Xen-mapcache provides the ability to map parts of guest memory into QEMU's own address space to work with. There are two types of cache lookups:
- translating a guest physical address into a pointer in QEMU's address space, mapping a part of guest domain memory if necessary (while trying to reduce the number of such (re)mappings to a minimum)
- translating a QEMU pointer back to its physical address in guest RAM

These lookups are managed via two linked lists of structures. MapCacheEntry is used for forward cache lookups, while MapCacheRev -- for reverse lookups.

Every guest physical address is broken down into 2 parts:

    address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
    address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);

MCACHE_BUCKET_SHIFT depends on the system (32/64-bit) and is equal to 20 for a 64-bit system (which is assumed for the rest of this description). Basically, this means that we deal with 1 MB chunks and offsets within those 1 MB chunks. All mappings are created with 1MB granularity, i.e. 1MB/2MB/3MB etc.
Most DMA transfers are typically less than 1MB; however, if a transfer crosses any 1MB border(s), then the nearest larger mapping size will be used, so e.g. a 512-byte DMA transfer with the start address 700FFF80h will actually require a 2MB range.

The current implementation assumes that MapCacheEntries are unique for a given address_index and size pair, and that a single MapCacheEntry may be reused by multiple requests -- in this case the 'lock' field will be larger than 1. On the other hand, each requested guest physical address (with the 'lock' flag) is described by its own MapCacheRev, so there may be multiple MapCacheRev entries corresponding to a single MapCacheEntry. The xen-mapcache code uses MapCacheRev entries to retrieve the address_index & size pair, which in turn is used to find the related MapCacheEntry. The 'lock' field within a MapCacheEntry structure is actually a reference counter which shows the number of corresponding MapCacheRev entries.

The bug lies in the guest's ability to indirectly manipulate the xen-mapcache MapCacheEntries list via a special sequence of DMA operations, typically for storage devices. In order to trigger the bug, the guest needs to issue DMA operations in a specific order and timing. Although xen-mapcache is protected by a mutex lock, this doesn't help in this case, as the bug is not due to a race condition.

Suppose we have 3 DMA transfers, namely A, B and C, where
- transfer A crosses a 1MB border and thus uses a 2MB mapping
- transfers B and C are normal transfers within a 1MB range
- and all 3 transfers belong to the same address_index

In this case, if all these transfers are executed one-by-one (without overlaps), no special treatment is necessary -- each transfer's mapping lock will be set and then cleared on unmap before starting the next transfer. The situation changes when DMA transfers overlap in time, e.g.
like this:

    |===== transfer A (2MB) =====|
           |===== transfer B (1MB) =====|
                          |===== transfer C (1MB) =====|
    time --->

In this situation the following sequence of actions happens:

1. Transfer A creates a mapping to a 2MB area (lock=1)
2. Transfer B (1MB) tries to find an available mapping but cannot find one, because transfer A is still in progress and its mapping has 2MB size + a non-zero lock. So transfer B creates another mapping -- same address_index, but 1MB size.
3. Transfer A completes, making the 1st mapping entry available by setting its lock to 0
4. Transfer C starts, tries to find an available mapping entry, and sees that the 1st entry has lock=0, so it uses this entry but remaps the mapping to a 1MB size
5. Transfer B completes, and by this time
   - there are two locked entries in the MapCacheEntry list with the SAME values for both address_index and size
   - the entry for transfer B actually resides farther in the list, while transfer C's entry is first
6. xen_ram_addr_from_mapcache() for transfer B gets the correct address_index and size pair from the corresponding MapCacheRev entry, but then it starts looking for
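The 1MB-bucket arithmetic underlying steps 1-2 can be checked with a small standalone sketch. The constants mirror the description above; `mapcache_index` and `mapcache_size` are illustrative helpers, not QEMU's actual functions.

```c
#include <assert.h>
#include <stdint.h>

#define MCACHE_BUCKET_SHIFT 20                      /* 1 MB buckets, 64-bit */
#define MCACHE_BUCKET_SIZE  (1ULL << MCACHE_BUCKET_SHIFT)

/* Bucket index of a guest physical address, as described above. */
static uint64_t mapcache_index(uint64_t phys_addr)
{
    return phys_addr >> MCACHE_BUCKET_SHIFT;
}

/* Bucket-granular mapping size required for a (phys_addr, size) DMA
 * request: offset within the bucket plus size, rounded up to 1MB. */
static uint64_t mapcache_size(uint64_t phys_addr, uint64_t size)
{
    uint64_t address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
    uint64_t cache_size = size + address_offset;

    if (cache_size % MCACHE_BUCKET_SIZE)
        cache_size += MCACHE_BUCKET_SIZE - (cache_size % MCACHE_BUCKET_SIZE);
    return cache_size;
}
```

With these definitions, the 512-byte transfer at 700FFF80h from the text ends 0x180 bytes past the 1MB border, so the rounded mapping size comes out at 2MB -- exactly the transfer-A case in the diagram.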
Re: [Xen-devel] [PATCH] xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings
On Thu, 20 Jul 2017, Alexey G wrote:
> On Wed, 19 Jul 2017 11:00:26 -0700 (PDT)
> Stefano Stabellini wrote:
>
> > My expectation is that unlocked mappings are much more frequent than
> > locked mappings. Also, I expect that only very rarely we'll be able to
> > reuse locked mappings. Over the course of a VM lifetime, it seems to me
> > that walking the list every time would cost more than it would benefit.
> >
> > These are only "expectations", I would love to see numbers. Numbers make
> > for better decisions :-) Would you be up for gathering some of these
> > numbers? Such as how many times you get to reuse locked mappings and how
> > many times we walk items on the list fruitlessly?
> >
> > Otherwise, would you be up for just testing the modified version of the
> > patch I sent to verify that it solves the bug?
>
> Numbers will show that there is a single entry in the bucket's list most
> of the time. :) Even two entries are rare encounters, typically to be
> seen only when the guest performs some intensive I/O. OK, I'll collect
> some real stats for different scenarios; these are interesting numbers
> and might come in useful for later optimizations.
>
> The approach you proposed is good, but it allows reusing suitable locked
> entries only when they come first in the list (the existing behavior).
> But we can actually reuse a locked entry which may come next (if any) in
> the list as well. When we have the situation where a lock=0 entry comes
> first in the list and a lock=1 entry is second -- there is a chance the
> first entry was a 2MB-type (there must be some reason why the 2nd entry
> was added to the list), so picking it for a lock=0 request might result
> in xen_remap_bucket... which should be avoided. Anyway, there is no big
> deal which approach is better, as these situations are uncommon. After
> all, mostly it's just a single entry in the bucket's list.
Given that QEMU is about to release and I have to send a pull request with another fix now, I am going to also send my version of the fix right away (keeping you as main author of course). However, I am more than happy to change the behavior of the algorithm in the future if the numbers show that your version is better.
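The policy difference debated in this thread -- reusing a matching entry only at the head of a bucket's list versus walking the whole list -- can be sketched as follows. This is a simplified model with invented types (QEMU's real MapCacheEntry carries more state, such as the valid_mapping bitmap), not the actual mapcache code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for one bucket's linked list of mappings. */
struct entry {
    uint64_t paddr_index;   /* bucket index */
    uint64_t size;          /* mapping size, a multiple of 1MB */
    int lock;               /* reference count of in-flight users */
    struct entry *next;
};

/*
 * The list-walk policy Alexey describes: search the entire bucket list
 * for an entry large enough for the request, rather than considering
 * only the head entry. A locked entry can still be shared by bumping
 * its lock count, so lock is not checked here.
 */
static struct entry *find_reusable(struct entry *head,
                                   uint64_t index, uint64_t size)
{
    for (struct entry *e = head; e != NULL; e = e->next)
        if (e->paddr_index == index && e->size >= size)
            return e;       /* reusable without a remap */
    return NULL;            /* caller must create a new mapping */
}
```

The trade-off discussed above is visible here: the walk can rescue a suitable entry deeper in the list (avoiding a remap of the head entry), at the cost of traversing a list that, per Alexey's observation, almost always holds a single entry anyway.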
Re: [Xen-devel] [PULL for-2.10 6/7] xen/mapcache: introduce xen_replace_cache_entry()
On Fri, 21 Jul 2017, Igor Druzhinin wrote:
> On 21/07/17 14:50, Anthony PERARD wrote:
> > On Tue, Jul 18, 2017 at 03:22:41PM -0700, Stefano Stabellini wrote:
> > > From: Igor Druzhinin
> > >
> > > ...
> > >
> > > +static uint8_t *xen_replace_cache_entry_unlocked(hwaddr old_phys_addr,
> > > +                                                 hwaddr new_phys_addr,
> > > +                                                 hwaddr size)
> > > +{
> > > +    MapCacheEntry *entry;
> > > +    hwaddr address_index, address_offset;
> > > +    hwaddr test_bit_size, cache_size = size;
> > > +
> > > +    address_index = old_phys_addr >> MCACHE_BUCKET_SHIFT;
> > > +    address_offset = old_phys_addr & (MCACHE_BUCKET_SIZE - 1);
> > > +
> > > +    assert(size);
> > > +    /* test_bit_size is always a multiple of XC_PAGE_SIZE */
> > > +    test_bit_size = size + (old_phys_addr & (XC_PAGE_SIZE - 1));
> > > +    if (test_bit_size % XC_PAGE_SIZE) {
> > > +        test_bit_size += XC_PAGE_SIZE - (test_bit_size % XC_PAGE_SIZE);
> > > +    }
> > > +    cache_size = size + address_offset;
> > > +    if (cache_size % MCACHE_BUCKET_SIZE) {
> > > +        cache_size += MCACHE_BUCKET_SIZE - (cache_size % MCACHE_BUCKET_SIZE);
> > > +    }
> > > +
> > > +    entry = &mapcache->entry[address_index % mapcache->nr_buckets];
> > > +    while (entry && !(entry->paddr_index == address_index &&
> > > +                      entry->size == cache_size)) {
> > > +        entry = entry->next;
> > > +    }
> > > +    if (!entry) {
> > > +        DPRINTF("Trying to update an entry for %lx " \
> > > +                "that is not in the mapcache!\n", old_phys_addr);
> > > +        return NULL;
> > > +    }
> > > +
> > > +    address_index = new_phys_addr >> MCACHE_BUCKET_SHIFT;
> > > +    address_offset = new_phys_addr & (MCACHE_BUCKET_SIZE - 1);
> > > +
> > > +    fprintf(stderr, "Replacing a dummy mapcache entry for %lx with %lx\n",
> > > +            old_phys_addr, new_phys_addr);
> >
> > Looks like this does not build on 32bits.
> > in:
> > http://logs.test-lab.xenproject.org/osstest/logs/112041/build-i386/6.ts-xen-build.log
> >
> > /home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/hw/i386/xen/xen-mapcache.c: In function 'xen_replace_cache_entry_unlocked':
> > /home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/hw/i386/xen/xen-mapcache.c:539:13: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format=]
> >              old_phys_addr, new_phys_addr);
> >              ^
> > /home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/hw/i386/xen/xen-mapcache.c:539:13: error: format '%lx' expects argument of type 'long unsigned int', but argument 4 has type 'hwaddr' [-Werror=format=]
> > cc1: all warnings being treated as errors
> >   CC i386-softmmu/target/i386/gdbstub.o
> > /home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/rules.mak:66: recipe for target 'hw/i386/xen/xen-mapcache.o' failed
> >
> > > +
> > > +    xen_remap_bucket(entry, entry->vaddr_base,
> > > +                     cache_size, address_index, false);
> > > +    if (!test_bits(address_offset >> XC_PAGE_SHIFT,
> > > +                   test_bit_size >> XC_PAGE_SHIFT,
> > > +                   entry->valid_mapping)) {
> > > +        DPRINTF("Unable to update a mapcache entry for %lx!\n", old_phys_addr);
> > > +        return NULL;
> > > +    }
> > > +
> > > +    return entry->vaddr_base + address_offset;
> > > +}
> > > +
>
> Please, accept the attached patch to fix the issue.

The patch looks good to me. I'll send it upstream.
[Xen-devel] [PATCH v1 04/13] xen/pvcalls: implement connect command
Send PVCALLS_CONNECT to the backend. Allocate a new ring and evtchn for the active socket. Introduce a data structure to keep track of sockets. Introduce a waitqueue to allow the frontend to wait on data coming from the backend on the active socket (recvmsg command).

Two mutexes (one for reads and one for writes) will be used to protect the active socket in and out rings from concurrent accesses.

sock->sk->sk_send_head is not used for ip sockets: reuse the field to store a pointer to the struct sock_mapping corresponding to the socket. This way, we can easily get the struct sock_mapping from the struct socket.

Convert the struct socket pointer into an uint64_t and use it as id for the new socket to pass to the backend.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 153 
 drivers/xen/pvcalls-front.h |   2 +
 2 files changed, 155 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 7933c73..0d305e0 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -13,6 +13,8 @@
  */
 
 #include
+#include
+#include
 #include
 #include
@@ -20,6 +22,8 @@
 #include
 #include
+#include
+
 #define PVCALLS_INVALID_ID (UINT_MAX)
 #define RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
 #define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE)
@@ -38,6 +42,24 @@ struct pvcalls_bedata {
 };
 struct xenbus_device *pvcalls_front_dev;
 
+struct sock_mapping {
+	bool active_socket;
+	struct list_head list;
+	struct socket *sock;
+	union {
+		struct {
+			int irq;
+			grant_ref_t ref;
+			struct pvcalls_data_intf *ring;
+			struct pvcalls_data data;
+			struct mutex in_mutex;
+			struct mutex out_mutex;
+
+			wait_queue_head_t inflight_conn_req;
+		} active;
+	};
+};
+
 static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 {
 	struct xenbus_device *dev = dev_id;
@@ -80,6 +102,18 @@ static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
+static irqreturn_t pvcalls_front_conn_handler(int irq, void *sock_map) +{ + struct sock_mapping *map = sock_map; + + if (map == NULL) + return IRQ_HANDLED; + + wake_up_interruptible(&map->active.inflight_conn_req); + + return IRQ_HANDLED; +} + int pvcalls_front_socket(struct socket *sock) { struct pvcalls_bedata *bedata; @@ -134,6 +168,125 @@ int pvcalls_front_socket(struct socket *sock) return ret; } +static struct sock_mapping *create_active(int *evtchn) +{ + struct sock_mapping *map = NULL; + void *bytes; + int ret, irq = -1, i; + + map = kzalloc(sizeof(*map), GFP_KERNEL); + if (map == NULL) + return NULL; + + init_waitqueue_head(&map->active.inflight_conn_req); + + map->active.ring = (struct pvcalls_data_intf *) + __get_free_page(GFP_KERNEL | __GFP_ZERO); + if (map->active.ring == NULL) + goto out_error; + memset(map->active.ring, 0, XEN_PAGE_SIZE); + map->active.ring->ring_order = RING_ORDER; + bytes = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, + map->active.ring->ring_order); + if (bytes == NULL) + goto out_error; + for (i = 0; i < (1 << map->active.ring->ring_order); i++) + map->active.ring->ref[i] = gnttab_grant_foreign_access( + pvcalls_front_dev->otherend_id, + pfn_to_gfn(virt_to_pfn(bytes) + i), 0); + + map->active.ref = gnttab_grant_foreign_access( + pvcalls_front_dev->otherend_id, + pfn_to_gfn(virt_to_pfn((void *)map->active.ring)), 0); + + ret = xenbus_alloc_evtchn(pvcalls_front_dev, evtchn); + if (ret) + goto out_error; + map->active.data.in = bytes; + map->active.data.out = bytes + + XEN_FLEX_RING_SIZE(map->active.ring->ring_order); + irq = bind_evtchn_to_irqhandler(*evtchn, pvcalls_front_conn_handler, + 0, "pvcalls-frontend", map); + if (irq < 0) + goto out_error; + + map->active.irq = irq; + map->active_socket = true; + mutex_init(&map->active.in_mutex); + mutex_init(&map->active.out_mutex); + + return map; + +out_error: + if (irq >= 0) + unbind_from_irqhandler(irq, map); + else if (*evtchn >= 0) + xenbus_free_evtchn(pvcalls_front_dev, *evtchn); +
kfree(map->active.data.in); + kfree(map->active.ring); + kfree(map); + return NULL; +} + +int pvcalls_front_connect(struct socket *sock,
[Xen-devel] [PATCH v1 09/13] xen/pvcalls: implement recvmsg
Implement recvmsg by copying data from the "in" ring. If not enough data is available and the recvmsg call is blocking, then wait on the inflight_conn_req waitqueue. Take the active socket in_mutex so that only one function can access the ring at any given time. If not enough data is available on the ring, rather than returning immediately or sleep-waiting, spin for up to 5000 cycles. This small optimization turns out to improve performance and latency significantly. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 106 drivers/xen/pvcalls-front.h | 4 ++ 2 files changed, 110 insertions(+) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index bf29f40..3d1041a 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -94,6 +94,20 @@ static int pvcalls_front_write_todo(struct sock_mapping *map) return size - pvcalls_queued(prod, cons, size); } +static int pvcalls_front_read_todo(struct sock_mapping *map) +{ + struct pvcalls_data_intf *intf = map->active.ring; + RING_IDX cons, prod; + int32_t error; + + cons = intf->in_cons; + prod = intf->in_prod; + error = intf->in_error; + return (error != 0 || + pvcalls_queued(prod, cons, + XEN_FLEX_RING_SIZE(intf->ring_order))) != 0; +} + static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id) { struct xenbus_device *dev = dev_id; @@ -413,6 +427,98 @@ int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg, return tot_sent; } +static int __read_ring(struct pvcalls_data_intf *intf, + struct pvcalls_data *data, + struct iov_iter *msg_iter, + size_t len, int flags) +{ + RING_IDX cons, prod, size, masked_prod, masked_cons; + RING_IDX array_size = XEN_FLEX_RING_SIZE(intf->ring_order); + int32_t error; + + cons = intf->in_cons; + prod = intf->in_prod; + error = intf->in_error; + /* get pointers before reading from the ring */ + virt_rmb(); + if (error < 0) + return error; + + size = pvcalls_queued(prod, 
cons, array_size); + masked_prod = pvcalls_mask(prod, array_size); + masked_cons = pvcalls_mask(cons, array_size); + + if (size == 0) + return 0; + + if (len > size) + len = size; + + if (masked_prod > masked_cons) { + copy_to_iter(data->in + masked_cons, len, msg_iter); + } else { + if (len > (array_size - masked_cons)) { + copy_to_iter(data->in + masked_cons, +array_size - masked_cons, msg_iter); + copy_to_iter(data->in, +len - (array_size - masked_cons), +msg_iter); + } else { + copy_to_iter(data->in + masked_cons, len, msg_iter); + } + } + /* read data from the ring before increasing the index */ + virt_mb(); + if (!(flags & MSG_PEEK)) + intf->in_cons += len; + + return len; +} + +int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, +int flags) +{ + struct pvcalls_bedata *bedata; + int ret = -EAGAIN; + struct sock_mapping *map; + int count = 0; + + if (!pvcalls_front_dev) + return -ENOTCONN; + bedata = dev_get_drvdata(&pvcalls_front_dev->dev); + + map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head); + if (!map) + return -ENOTSOCK; + + if (flags & (MSG_CMSG_CLOEXEC|MSG_ERRQUEUE|MSG_OOB|MSG_TRUNC)) + return -EOPNOTSUPP; + + mutex_lock(&map->active.in_mutex); + if (len > XEN_FLEX_RING_SIZE(map->active.ring->ring_order)) + len = XEN_FLEX_RING_SIZE(map->active.ring->ring_order); + + while (!(flags & MSG_DONTWAIT) && !pvcalls_front_read_todo(map)) { + if (count < PVCALLS_FRON_MAX_SPIN) + count++; + else + wait_event_interruptible(map->active.inflight_conn_req, +pvcalls_front_read_todo(map)); + } + ret = __read_ring(map->active.ring, &map->active.data, + &msg->msg_iter, len, flags); + + if (ret > 0) + notify_remote_via_irq(map->active.irq); + if (ret == 0) + ret = -EAGAIN; + if (ret == -ENOTCONN) + ret = 0; + + mutex_unlock(&map->active.in_mutex); + return ret; +} + int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len) { struct pvcalls_bedata *bedata; diff --git
[Xen-devel] [PATCH v1 11/13] xen/pvcalls: implement release command
Send PVCALLS_RELEASE to the backend and wait for a reply. Take both in_mutex and out_mutex to avoid concurrent accesses. Then, free the socket. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 86 + drivers/xen/pvcalls-front.h | 1 + 2 files changed, 87 insertions(+) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index b6cfb7d..bd3dfac 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -174,6 +174,24 @@ static irqreturn_t pvcalls_front_conn_handler(int irq, void *sock_map) return IRQ_HANDLED; } +static void pvcalls_front_free_map(struct pvcalls_bedata *bedata, + struct sock_mapping *map) +{ + int i; + + spin_lock(>pvcallss_lock); + if (!list_empty(>list)) + list_del_init(>list); + spin_unlock(>pvcallss_lock); + + /* what if the thread waiting still need access? */ + for (i = 0; i < (1 << map->active.ring->ring_order); i++) + gnttab_end_foreign_access(map->active.ring->ref[i], 0, 0); + gnttab_end_foreign_access(map->active.ref, 0, 0); + free_page((unsigned long)map->active.ring); + unbind_from_irqhandler(map->active.irq, map); +} + int pvcalls_front_socket(struct socket *sock) { struct pvcalls_bedata *bedata; @@ -805,6 +823,74 @@ unsigned int pvcalls_front_poll(struct file *file, struct socket *sock, return pvcalls_front_poll_passive(file, bedata, map, wait); } +int pvcalls_front_release(struct socket *sock) +{ + struct pvcalls_bedata *bedata; + struct sock_mapping *map; + int req_id, notify; + struct xen_pvcalls_request *req; + + if (!pvcalls_front_dev) + return -EIO; + bedata = dev_get_drvdata(_front_dev->dev); + if (!bedata) + return -EIO; + + if (sock->sk == NULL) + return 0; + + map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head); + if (map == NULL) + return 0; + WRITE_ONCE(sock->sk->sk_send_head, NULL); + + spin_lock(>pvcallss_lock); + req_id = bedata->ring.req_prod_pvt & (RING_SIZE(>ring) - 1); + BUG_ON(req_id >= 
PVCALLS_NR_REQ_PER_RING); + if (RING_FULL(&bedata->ring) || + READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) { + spin_unlock(&bedata->pvcallss_lock); + return -EAGAIN; + } + req = RING_GET_REQUEST(&bedata->ring, req_id); + req->req_id = req_id; + req->cmd = PVCALLS_RELEASE; + req->u.release.id = (uint64_t)sock; + + bedata->ring.req_prod_pvt++; + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify); + spin_unlock(&bedata->pvcallss_lock); + if (notify) + notify_remote_via_irq(bedata->irq); + + wait_event(bedata->inflight_req, + READ_ONCE(bedata->rsp[req_id].req_id) == req_id); + + if (map->active_socket) { + /* +* Set in_error and wake up inflight_conn_req to force +* recvmsg waiters to exit. +*/ + map->active.ring->in_error = -EBADF; + wake_up_interruptible(&map->active.inflight_conn_req); + + mutex_lock(&map->active.in_mutex); + mutex_lock(&map->active.out_mutex); + pvcalls_front_free_map(bedata, map); + mutex_unlock(&map->active.out_mutex); + mutex_unlock(&map->active.in_mutex); + kfree(map); + } else { + spin_lock(&bedata->pvcallss_lock); + list_del_init(&map->list); + kfree(map); + spin_unlock(&bedata->pvcallss_lock); + } + WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID); + + return 0; +} + static const struct xenbus_device_id pvcalls_front_ids[] = { { "pvcalls" }, { "" } diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h index 25e05b8..3332978 100644 --- a/drivers/xen/pvcalls-front.h +++ b/drivers/xen/pvcalls-front.h @@ -23,5 +23,6 @@ int pvcalls_front_recvmsg(struct socket *sock, unsigned int pvcalls_front_poll(struct file *file, struct socket *sock, poll_table *wait); +int pvcalls_front_release(struct socket *sock); #endif -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v1 06/13] xen/pvcalls: implement listen command
Send PVCALLS_LISTEN to the backend. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 49 + drivers/xen/pvcalls-front.h | 1 + 2 files changed, 50 insertions(+) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 71619bc..80fd5fb 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -361,6 +361,55 @@ int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len) return 0; } +int pvcalls_front_listen(struct socket *sock, int backlog) +{ + struct pvcalls_bedata *bedata; + struct sock_mapping *map; + struct xen_pvcalls_request *req; + int notify, req_id, ret; + + if (!pvcalls_front_dev) + return -ENOTCONN; + bedata = dev_get_drvdata(_front_dev->dev); + + map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head); + if (!map) + return -ENOTSOCK; + + if (map->passive.status != PVCALLS_STATUS_BIND) + return -EOPNOTSUPP; + + spin_lock(>pvcallss_lock); + req_id = bedata->ring.req_prod_pvt & (RING_SIZE(>ring) - 1); + BUG_ON(req_id >= PVCALLS_NR_REQ_PER_RING); + if (RING_FULL(>ring) || + bedata->rsp[req_id].req_id != PVCALLS_INVALID_ID) { + spin_unlock(>pvcallss_lock); + return -EAGAIN; + } + req = RING_GET_REQUEST(>ring, req_id); + req->req_id = req_id; + req->cmd = PVCALLS_LISTEN; + req->u.listen.id = (uint64_t) sock; + req->u.listen.backlog = backlog; + + bedata->ring.req_prod_pvt++; + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(>ring, notify); + spin_unlock(>pvcallss_lock); + if (notify) + notify_remote_via_irq(bedata->irq); + + wait_event(bedata->inflight_req, + READ_ONCE(bedata->rsp[req_id].req_id) == req_id); + + map->passive.status = PVCALLS_STATUS_LISTEN; + ret = bedata->rsp[req_id].ret; + /* read ret, then set this rsp slot to be reused */ + smp_mb(); + WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID); + return ret; +} + static const struct xenbus_device_id pvcalls_front_ids[] = { { "pvcalls" }, { "" } diff --git 
a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h index 8b0a274..aa8fe10 100644 --- a/drivers/xen/pvcalls-front.h +++ b/drivers/xen/pvcalls-front.h @@ -9,5 +9,6 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr, int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len); +int pvcalls_front_listen(struct socket *sock, int backlog); #endif -- 1.9.1
[Xen-devel] [PATCH v1 08/13] xen/pvcalls: implement sendmsg
Send data to an active socket by copying data to the "out" ring. Take the active socket out_mutex so that only one function can access the ring at any given time. If not enough room is available on the ring, rather than returning immediately or sleep-waiting, spin for up to 5000 cycles. This small optimization turns out to improve performance significantly. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 109 drivers/xen/pvcalls-front.h | 3 ++ 2 files changed, 112 insertions(+) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index f3a04a2..bf29f40 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -27,6 +27,7 @@ #define PVCALLS_INVALID_ID (UINT_MAX) #define RING_ORDER XENBUS_MAX_RING_GRANT_ORDER #define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE) +#define PVCALLS_FRON_MAX_SPIN 5000 struct pvcalls_bedata { struct xen_pvcalls_front_ring ring; @@ -77,6 +78,22 @@ struct sock_mapping { }; }; +static int pvcalls_front_write_todo(struct sock_mapping *map) +{ + struct pvcalls_data_intf *intf = map->active.ring; + RING_IDX cons, prod, size = XEN_FLEX_RING_SIZE(intf->ring_order); + int32_t error; + + cons = intf->out_cons; + prod = intf->out_prod; + error = intf->out_error; + if (error == -ENOTCONN) + return 0; + if (error != 0) + return error; + return size - pvcalls_queued(prod, cons, size); +} + static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id) { struct xenbus_device *dev = dev_id; @@ -304,6 +321,98 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr, return ret; } +static int __write_ring(struct pvcalls_data_intf *intf, + struct pvcalls_data *data, + struct iov_iter *msg_iter, + size_t len) +{ + RING_IDX cons, prod, size, masked_prod, masked_cons; + RING_IDX array_size = XEN_FLEX_RING_SIZE(intf->ring_order); + int32_t error; + + cons = intf->out_cons; + prod = intf->out_prod; + error = 
intf->out_error; + /* read indexes before continuing */ + virt_mb(); + + if (error < 0) + return error; + + size = pvcalls_queued(prod, cons, array_size); + if (size >= array_size) + return 0; + if (len > array_size - size) + len = array_size - size; + + masked_prod = pvcalls_mask(prod, array_size); + masked_cons = pvcalls_mask(cons, array_size); + + if (masked_prod < masked_cons) { + copy_from_iter(data->out + masked_prod, len, msg_iter); + } else { + if (len > array_size - masked_prod) { + copy_from_iter(data->out + masked_prod, + array_size - masked_prod, msg_iter); + copy_from_iter(data->out, + len - (array_size - masked_prod), + msg_iter); + } else { + copy_from_iter(data->out + masked_prod, len, msg_iter); + } + } + /* write to ring before updating pointer */ + virt_wmb(); + intf->out_prod += len; + + return len; +} + +int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg, + size_t len) +{ + struct pvcalls_bedata *bedata; + struct sock_mapping *map; + int sent = 0, tot_sent = 0; + int count = 0, flags; + + if (!pvcalls_front_dev) + return -ENOTCONN; + bedata = dev_get_drvdata(&pvcalls_front_dev->dev); + + map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head); + if (!map) + return -ENOTSOCK; + + flags = msg->msg_flags; + if (flags & (MSG_CONFIRM|MSG_DONTROUTE|MSG_EOR|MSG_OOB)) + return -EOPNOTSUPP; + + mutex_lock(&map->active.out_mutex); + if ((flags & MSG_DONTWAIT) && !pvcalls_front_write_todo(map)) { + mutex_unlock(&map->active.out_mutex); + return -EAGAIN; + } + +again: + count++; + sent = __write_ring(map->active.ring, + &map->active.data, &msg->msg_iter, + len); + if (sent > 0) { + len -= sent; + tot_sent += sent; + notify_remote_via_irq(map->active.irq); + } + if (sent >= 0 && len > 0 && count < PVCALLS_FRON_MAX_SPIN) + goto again; + if (sent < 0) + tot_sent = sent; + + mutex_unlock(&map->active.out_mutex); + return tot_sent; +} + int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len) { struct pvcalls_bedata *bedata;
[Xen-devel] [PATCH v1 13/13] xen: introduce a Kconfig option to enable the pvcalls frontend
Also add pvcalls-front to the Makefile. Signed-off-by: Stefano Stabellini CC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/Kconfig | 9 + drivers/xen/Makefile | 1 + 2 files changed, 10 insertions(+) diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig index 4545561..ea5e99f 100644 --- a/drivers/xen/Kconfig +++ b/drivers/xen/Kconfig @@ -196,6 +196,15 @@ config XEN_PCIDEV_BACKEND If in doubt, say m. +config XEN_PVCALLS_FRONTEND + bool "XEN PV Calls frontend driver" + depends on INET && XEN + help + Experimental frontend for the Xen PV Calls protocol + (https://xenbits.xen.org/docs/unstable/misc/pvcalls.html). It + sends a small set of POSIX calls to the backend, which + implements them. + config XEN_PVCALLS_BACKEND bool "XEN PV Calls backend driver" depends on INET && XEN && XEN_BACKEND diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile index 480b928..afb9e03 100644 --- a/drivers/xen/Makefile +++ b/drivers/xen/Makefile @@ -39,6 +39,7 @@ obj-$(CONFIG_XEN_EFI) += efi.o obj-$(CONFIG_XEN_SCSI_BACKEND) += xen-scsiback.o obj-$(CONFIG_XEN_AUTO_XLATE) += xlate_mmu.o obj-$(CONFIG_XEN_PVCALLS_BACKEND) += pvcalls-back.o +obj-$(CONFIG_XEN_PVCALLS_FRONTEND) += pvcalls-front.o xen-evtchn-y := evtchn.o xen-gntdev-y := gntdev.o xen-gntalloc-y := gntalloc.o -- 1.9.1
[Xen-devel] [PATCH v1 00/13] introduce the Xen PV Calls frontend
Hi all, this series introduces the frontend for the newly introduced PV Calls protocol. PV Calls is a paravirtualized protocol that allows the implementation of a set of POSIX functions in a different domain. The PV Calls frontend sends POSIX function calls to the backend, which implements them, acts on the function call, and returns a value to the frontend. For more information about PV Calls, please read: https://xenbits.xen.org/docs/unstable/misc/pvcalls.html This patch series only implements the frontend driver. It doesn't attempt to redirect POSIX calls to it. The functions exported in pvcalls-front.h are meant to be used for that. A separate patch series will be sent to use them and hook them into the system. Stefano Stabellini (13): xen/pvcalls: introduce the pvcalls xenbus frontend xen/pvcalls: connect to the backend xen/pvcalls: implement socket command and handle events xen/pvcalls: implement connect command xen/pvcalls: implement bind command xen/pvcalls: implement listen command xen/pvcalls: implement accept command xen/pvcalls: implement sendmsg xen/pvcalls: implement recvmsg xen/pvcalls: implement poll command xen/pvcalls: implement release command xen/pvcalls: implement frontend disconnect xen: introduce a Kconfig option to enable the pvcalls frontend drivers/xen/Kconfig |9 + drivers/xen/Makefile|1 + drivers/xen/pvcalls-front.c | 1097 +++ drivers/xen/pvcalls-front.h | 28 ++ 4 files changed, 1135 insertions(+) create mode 100644 drivers/xen/pvcalls-front.c create mode 100644 drivers/xen/pvcalls-front.h
[Xen-devel] [PATCH v1 05/13] xen/pvcalls: implement bind command
Send PVCALLS_BIND to the backend. Introduce a new structure, part of struct sock_mapping, to store information specific to passive sockets. Introduce a status field to keep track of the status of the passive socket. Introduce a waitqueue for the "accept" command (see the accept command implementation): it is used to allow only one outstanding accept command at any given time and to implement polling on the passive socket. Introduce a flags field to keep track of in-flight accept and poll commands. sock->sk->sk_send_head is not used for ip sockets: reuse the field to store a pointer to the struct sock_mapping corresponding to the socket. Convert the struct socket pointer into an uint64_t and use it as id for the socket to pass to the backend. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 74 + drivers/xen/pvcalls-front.h | 3 ++ 2 files changed, 77 insertions(+) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 0d305e0..71619bc 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -57,6 +57,23 @@ struct sock_mapping { wait_queue_head_t inflight_conn_req; } active; + struct { + /* Socket status */ +#define PVCALLS_STATUS_UNINITALIZED 0 +#define PVCALLS_STATUS_BIND 1 +#define PVCALLS_STATUS_LISTEN2 + uint8_t status; + /* +* Internal state-machine flags. +* Only one accept operation can be inflight for a socket. +* Only one poll operation can be inflight for a given socket. 
+*/ +#define PVCALLS_FLAG_ACCEPT_INFLIGHT 0 +#define PVCALLS_FLAG_POLL_INFLIGHT 1 +#define PVCALLS_FLAG_POLL_RET 2 + uint8_t flags; + wait_queue_head_t inflight_accept_req; + } passive; }; }; @@ -287,6 +304,63 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr, return ret; } +int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len) +{ + struct pvcalls_bedata *bedata; + struct sock_mapping *map = NULL; + struct xen_pvcalls_request *req; + int notify, req_id, ret; + + if (!pvcalls_front_dev) + return -ENOTCONN; + if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM) + return -ENOTSUPP; + bedata = dev_get_drvdata(&pvcalls_front_dev->dev); + + map = kzalloc(sizeof(*map), GFP_KERNEL); + if (map == NULL) + return -ENOMEM; + + spin_lock(&bedata->pvcallss_lock); + req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1); + BUG_ON(req_id >= PVCALLS_NR_REQ_PER_RING); + if (RING_FULL(&bedata->ring) || + READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) { + kfree(map); + spin_unlock(&bedata->pvcallss_lock); + return -EAGAIN; + } + req = RING_GET_REQUEST(&bedata->ring, req_id); + req->req_id = req_id; + map->sock = sock; + req->cmd = PVCALLS_BIND; + req->u.bind.id = (uint64_t) sock; + memcpy(req->u.bind.addr, addr, sizeof(*addr)); + req->u.bind.len = addr_len; + + init_waitqueue_head(&map->passive.inflight_accept_req); + + list_add_tail(&map->list, &bedata->socketpass_mappings); + WRITE_ONCE(sock->sk->sk_send_head, (void *)map); + map->active_socket = false; + + bedata->ring.req_prod_pvt++; + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify); + spin_unlock(&bedata->pvcallss_lock); + if (notify) + notify_remote_via_irq(bedata->irq); + + wait_event(bedata->inflight_req, + READ_ONCE(bedata->rsp[req_id].req_id) == req_id); + + map->passive.status = PVCALLS_STATUS_BIND; + ret = bedata->rsp[req_id].ret; + /* read ret, then set this rsp slot to be reused */ + smp_mb(); + WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID); + return 0; +} + static const struct xenbus_device_id
pvcalls_front_ids[] = { { "pvcalls" }, { "" } diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h index 63b0417..8b0a274 100644 --- a/drivers/xen/pvcalls-front.h +++ b/drivers/xen/pvcalls-front.h @@ -6,5 +6,8 @@ int pvcalls_front_socket(struct socket *sock); int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr, int addr_len, int flags); +int pvcalls_front_bind(struct socket *sock, + struct sockaddr *addr, + int addr_len); #endif -- 1.9.1
[Xen-devel] [PATCH v1 01/13] xen/pvcalls: introduce the pvcalls xenbus frontend
Introduce a xenbus frontend for the pvcalls protocol, as defined by https://xenbits.xen.org/docs/unstable/misc/pvcalls.html. This patch only adds the stubs, the code will be added by the following patches. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 68 + 1 file changed, 68 insertions(+) create mode 100644 drivers/xen/pvcalls-front.c diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c new file mode 100644 index 000..173e204 --- /dev/null +++ b/drivers/xen/pvcalls-front.c @@ -0,0 +1,68 @@ +/* + * (c) 2017 Stefano Stabellini + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#include + +#include +#include +#include +#include +#include + +static const struct xenbus_device_id pvcalls_front_ids[] = { + { "pvcalls" }, + { "" } +}; + +static int pvcalls_front_remove(struct xenbus_device *dev) +{ + return 0; +} + +static int pvcalls_front_probe(struct xenbus_device *dev, + const struct xenbus_device_id *id) +{ + return 0; +} + +static int pvcalls_front_resume(struct xenbus_device *dev) +{ + dev_warn(&dev->dev, "suspend/resume unsupported\n"); + return 0; +} + +static void pvcalls_front_changed(struct xenbus_device *dev, + enum xenbus_state backend_state) +{ +} + +static struct xenbus_driver pvcalls_front_driver = { + .ids = pvcalls_front_ids, + .probe = pvcalls_front_probe, + .remove = pvcalls_front_remove, + .resume = pvcalls_front_resume, + .otherend_changed = pvcalls_front_changed, +}; + +static int __init pvcalls_frontend_init(void) +{ + if (!xen_domain()) + return -ENODEV; + + pr_info("Initialising Xen pvcalls frontend driver\n"); + + return xenbus_register_frontend(&pvcalls_front_driver); +} + +module_init(pvcalls_frontend_init); -- 1.9.1
[Xen-devel] [PATCH v1 10/13] xen/pvcalls: implement poll command
For active sockets, check the indexes and use the inflight_conn_req waitqueue to wait. For passive sockets, send PVCALLS_POLL to the backend. Use the inflight_accept_req waitqueue if an accept is outstanding. Otherwise use the inflight_req waitqueue: inflight_req is awaken when a new response is received; on wakeup we check whether the POLL response is arrived by looking at the PVCALLS_FLAG_POLL_RET flag. We set the flag from pvcalls_front_event_handler, if the response was for a POLL command. In pvcalls_front_event_handler, get the struct socket pointer from the poll id (we previously converted struct socket* to uint64_t and used it as id). Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 123 drivers/xen/pvcalls-front.h | 3 ++ 2 files changed, 115 insertions(+), 11 deletions(-) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 3d1041a..b6cfb7d 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -128,17 +128,29 @@ static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id) rsp = RING_GET_RESPONSE(>ring, bedata->ring.rsp_cons); req_id = rsp->req_id; - src = (uint8_t *)>rsp[req_id]; - src += sizeof(rsp->req_id); - dst = (uint8_t *)rsp; - dst += sizeof(rsp->req_id); - memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id)); - /* -* First copy the rest of the data, then req_id. It is -* paired with the barrier when accessing bedata->rsp. 
-*/ - smp_wmb(); - WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id); + if (rsp->cmd == PVCALLS_POLL) { + struct socket *sock = (struct socket *) rsp->u.poll.id; + struct sock_mapping *map = + (struct sock_mapping *) + READ_ONCE(sock->sk->sk_send_head); + + set_bit(PVCALLS_FLAG_POLL_RET, + (void *)>passive.flags); + clear_bit(PVCALLS_FLAG_POLL_INFLIGHT, + (void *)>passive.flags); + } else { + src = (uint8_t *)>rsp[req_id]; + src += sizeof(rsp->req_id); + dst = (uint8_t *)rsp; + dst += sizeof(rsp->req_id); + memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id)); + /* +* First copy the rest of the data, then req_id. It is +* paired with the barrier when accessing bedata->rsp. +*/ + smp_wmb(); + WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id); + } bedata->ring.rsp_cons++; wake_up(>inflight_req); @@ -704,6 +716,95 @@ int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags) return ret; } +static unsigned int pvcalls_front_poll_passive(struct file *file, + struct pvcalls_bedata *bedata, + struct sock_mapping *map, + poll_table *wait) +{ + int notify, req_id; + struct xen_pvcalls_request *req; + + if (test_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT, +(void *)>passive.flags)) { + poll_wait(file, >passive.inflight_accept_req, wait); + return 0; + } + + if (test_and_clear_bit(PVCALLS_FLAG_POLL_RET, + (void *)>passive.flags)) + return POLLIN; + + if (test_and_set_bit(PVCALLS_FLAG_POLL_INFLIGHT, +(void *)>passive.flags)) { + poll_wait(file, >inflight_req, wait); + return 0; + } + + spin_lock(>pvcallss_lock); + req_id = bedata->ring.req_prod_pvt & (RING_SIZE(>ring) - 1); + BUG_ON(req_id >= PVCALLS_NR_REQ_PER_RING); + if (RING_FULL(>ring) || + READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) { + spin_unlock(>pvcallss_lock); + return -EAGAIN; + } + req = RING_GET_REQUEST(>ring, req_id); + req->req_id = req_id; + req->cmd = PVCALLS_POLL; + req->u.poll.id = (uint64_t) map->sock; + + bedata->ring.req_prod_pvt++; + 
RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify); + spin_unlock(&bedata->pvcallss_lock); + if (notify) + notify_remote_via_irq(bedata->irq); + + poll_wait(file, &bedata->inflight_req, wait); + return 0; +} + +static unsigned int pvcalls_front_poll_active(struct file *file, +
[Xen-devel] [PATCH v1 03/13] xen/pvcalls: implement socket command and handle events
Send a PVCALLS_SOCKET command to the backend, use the masked req_prod_pvt as req_id. This way, req_id is guaranteed to be between 0 and PVCALLS_NR_REQ_PER_RING. We already have a slot in the rsp array ready for the response, and there cannot be two outstanding responses with the same req_id. Wait for the response by waiting on the inflight_req waitqueue and check for the req_id field in rsp[req_id]. Use atomic accesses to read the field. Once a response is received, clear the corresponding rsp slot by setting req_id to PVCALLS_INVALID_ID. Note that PVCALLS_INVALID_ID is invalid only from the frontend point of view. It is not part of the PVCalls protocol. pvcalls_front_event_handler is in charge of copying responses from the ring to the appropriate rsp slot. It is done by copying the body of the response first, then by copying req_id atomically. After the copies, wake up anybody waiting on waitqueue. pvcallss_lock protects accesses to the ring. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 91 + drivers/xen/pvcalls-front.h | 8 2 files changed, 99 insertions(+) create mode 100644 drivers/xen/pvcalls-front.h diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index fb08ebf..7933c73 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -40,9 +40,100 @@ struct pvcalls_bedata { static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id) { + struct xenbus_device *dev = dev_id; + struct pvcalls_bedata *bedata; + struct xen_pvcalls_response *rsp; + uint8_t *src, *dst; + int req_id = 0, more = 0; + + if (dev == NULL) + return IRQ_HANDLED; + + bedata = dev_get_drvdata(>dev); + if (bedata == NULL) + return IRQ_HANDLED; + +again: + while (RING_HAS_UNCONSUMED_RESPONSES(>ring)) { + rsp = RING_GET_RESPONSE(>ring, bedata->ring.rsp_cons); + + req_id = rsp->req_id; + src = (uint8_t *)>rsp[req_id]; + src += sizeof(rsp->req_id); + dst = (uint8_t *)rsp; + dst += 
sizeof(rsp->req_id); + memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id)); + /* +* First copy the rest of the data, then req_id. It is +* paired with the barrier when accessing bedata->rsp. +*/ + smp_wmb(); + WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id); + + bedata->ring.rsp_cons++; + wake_up(>inflight_req); + } + + RING_FINAL_CHECK_FOR_RESPONSES(>ring, more); + if (more) + goto again; return IRQ_HANDLED; } +int pvcalls_front_socket(struct socket *sock) +{ + struct pvcalls_bedata *bedata; + struct xen_pvcalls_request *req; + int notify, req_id, ret; + + if (!pvcalls_front_dev) + return -EACCES; + /* +* PVCalls only supports domain AF_INET, +* type SOCK_STREAM and protocol 0 sockets for now. +* +* Check socket type here, AF_INET and protocol checks are done +* by the caller. +*/ + if (sock->type != SOCK_STREAM) + return -ENOTSUPP; + + bedata = dev_get_drvdata(_front_dev->dev); + + spin_lock(>pvcallss_lock); + req_id = bedata->ring.req_prod_pvt & (RING_SIZE(>ring) - 1); + BUG_ON(req_id >= PVCALLS_NR_REQ_PER_RING); + if (RING_FULL(>ring) || + READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) { + spin_unlock(>pvcallss_lock); + return -EAGAIN; + } + req = RING_GET_REQUEST(>ring, req_id); + req->req_id = req_id; + req->cmd = PVCALLS_SOCKET; + req->u.socket.id = (uint64_t) sock; + req->u.socket.domain = AF_INET; + req->u.socket.type = SOCK_STREAM; + req->u.socket.protocol = 0; + + bedata->ring.req_prod_pvt++; + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(>ring, notify); + spin_unlock(>pvcallss_lock); + if (notify) + notify_remote_via_irq(bedata->irq); + + if (wait_event_interruptible(bedata->inflight_req, + READ_ONCE(bedata->rsp[req_id].req_id) == req_id) != 0) + return -EINTR; + + ret = bedata->rsp[req_id].ret; + /* read ret, then set this rsp slot to be reused */ + smp_mb(); + WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID); + + return ret; +} + static const struct xenbus_device_id pvcalls_front_ids[] = { { "pvcalls" }, { "" } diff --git 
a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h new file mode 100644 index 000..b7dabed --- /dev/null +++ b/drivers/xen/pvcalls-front.h @@ -0,0 +1,8 @@ +#ifndef __PVCALLS_FRONT_H__ +#define __PVCALLS_FRONT_H__ + +#include +
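The request-slot protocol described above (the masked ring producer index used as req_id, with one rsp slot per possible req_id, recycled via PVCALLS_INVALID_ID) can be sketched in self-contained userspace C. This models the bookkeeping only; the names `claim_req_id`/`consume_rsp` and the fixed ring size are illustrative, not the kernel code:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define PVCALLS_NR_REQ_PER_RING 128u   /* assumed power-of-two ring size */
#define PVCALLS_INVALID_ID UINT_MAX    /* frontend-only marker, not in the protocol */

struct rsp_slot {
    uint32_t req_id;                   /* PVCALLS_INVALID_ID while the slot is free */
    int ret;
};

static struct rsp_slot rsp[PVCALLS_NR_REQ_PER_RING];
static uint32_t req_prod_pvt;          /* private producer index, only ever increases */

static void ring_init(void)
{
    for (uint32_t i = 0; i < PVCALLS_NR_REQ_PER_RING; i++)
        rsp[i].req_id = PVCALLS_INVALID_ID;
}

/*
 * Claim the next request id. Masking the producer index guarantees
 * 0 <= req_id < PVCALLS_NR_REQ_PER_RING, so rsp[req_id] is always a
 * valid slot; a slot still holding a req_id means an unconsumed
 * response, so the caller must back off (the driver returns -EAGAIN).
 */
static int claim_req_id(void)
{
    uint32_t req_id = req_prod_pvt & (PVCALLS_NR_REQ_PER_RING - 1);

    if (rsp[req_id].req_id != PVCALLS_INVALID_ID)
        return -1;
    req_prod_pvt++;
    return (int)req_id;
}

/* Consume a response and release the slot for reuse. */
static int consume_rsp(uint32_t req_id)
{
    int ret = rsp[req_id].ret;

    rsp[req_id].req_id = PVCALLS_INVALID_ID;
    return ret;
}
```

The real driver additionally orders the response-body copy before the req_id store with smp_wmb(), and pairs it with a barrier before reading `ret`; that memory-ordering aspect is not modeled here.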
[Xen-devel] [PATCH v1 02/13] xen/pvcalls: connect to the backend
Implement the probe function for the pvcalls frontend. Read the supported versions, max-page-order and function-calls nodes from xenstore. Introduce a data structure named pvcalls_bedata. It contains pointers to the command ring, the event channel, a list of active sockets and a list of passive sockets. Lists accesses are protected by a spin_lock. Introduce a waitqueue to allow waiting for a response on commands sent to the backend. Introduce an array of struct xen_pvcalls_response to store commands responses. Only one frontend<->backend connection is supported at any given time for a guest. Store the active frontend device to a static pointer. Introduce a stub functions for the event handler. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 153 1 file changed, 153 insertions(+) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 173e204..fb08ebf 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -20,6 +20,29 @@ #include #include +#define PVCALLS_INVALID_ID (UINT_MAX) +#define RING_ORDER XENBUS_MAX_RING_GRANT_ORDER +#define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE) + +struct pvcalls_bedata { + struct xen_pvcalls_front_ring ring; + grant_ref_t ref; + int irq; + + struct list_head socket_mappings; + struct list_head socketpass_mappings; + spinlock_t pvcallss_lock; + + wait_queue_head_t inflight_req; + struct xen_pvcalls_response rsp[PVCALLS_NR_REQ_PER_RING]; +}; +struct xenbus_device *pvcalls_front_dev; + +static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id) +{ + return IRQ_HANDLED; +} + static const struct xenbus_device_id pvcalls_front_ids[] = { { "pvcalls" }, { "" } @@ -33,7 +56,114 @@ static int pvcalls_front_remove(struct xenbus_device *dev) static int pvcalls_front_probe(struct xenbus_device *dev, const struct xenbus_device_id *id) { + int ret = -EFAULT, evtchn, ref = -1, i; + unsigned int 
max_page_order, function_calls, len; + char *versions; + grant_ref_t gref_head = 0; + struct xenbus_transaction xbt; + struct pvcalls_bedata *bedata = NULL; + struct xen_pvcalls_sring *sring; + + if (pvcalls_front_dev != NULL) { + dev_err(>dev, "only one PV Calls connection supported\n"); + return -EINVAL; + } + + versions = xenbus_read(XBT_NIL, dev->otherend, "versions", ); + if (!len) + return -EINVAL; + if (strcmp(versions, "1")) { + kfree(versions); + return -EINVAL; + } + kfree(versions); + ret = xenbus_scanf(XBT_NIL, dev->otherend, + "max-page-order", "%u", _page_order); + if (ret <= 0) + return -ENODEV; + if (max_page_order < RING_ORDER) + return -ENODEV; + ret = xenbus_scanf(XBT_NIL, dev->otherend, + "function-calls", "%u", _calls); + if (ret <= 0 || function_calls != 1) + return -ENODEV; + pr_info("%s max-page-order is %u\n", __func__, max_page_order); + + bedata = kzalloc(sizeof(struct pvcalls_bedata), GFP_KERNEL); + if (!bedata) + return -ENOMEM; + + init_waitqueue_head(>inflight_req); + for (i = 0; i < PVCALLS_NR_REQ_PER_RING; i++) + bedata->rsp[i].req_id = PVCALLS_INVALID_ID; + + sring = (struct xen_pvcalls_sring *) __get_free_page(GFP_KERNEL | +__GFP_ZERO); + if (!sring) + goto error; + SHARED_RING_INIT(sring); + FRONT_RING_INIT(>ring, sring, XEN_PAGE_SIZE); + + ret = xenbus_alloc_evtchn(dev, ); + if (ret) + goto error; + + bedata->irq = bind_evtchn_to_irqhandler(evtchn, + pvcalls_front_event_handler, + 0, "pvcalls-frontend", dev); + if (bedata->irq < 0) { + ret = bedata->irq; + goto error; + } + + ret = gnttab_alloc_grant_references(1, _head); + if (ret < 0) + goto error; + bedata->ref = ref = gnttab_claim_grant_reference(_head); + if (ref < 0) + goto error; + gnttab_grant_foreign_access_ref(ref, dev->otherend_id, + virt_to_gfn((void *)sring), 0); + + again: + ret = xenbus_transaction_start(); + if (ret) { + xenbus_dev_fatal(dev, ret, "starting transaction"); + goto error; + } + ret = xenbus_printf(xbt, dev->nodename, "version", "%u", 1); + if (ret) 
+ goto error_xenbus; + ret = xenbus_printf(xbt,
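The xenstore negotiation in the probe path above boils down to three checks: the backend must advertise exactly version "1", a max-page-order at least as large as the ring the frontend wants, and function-calls == 1. A hedged, self-contained sketch (the function name `pvcalls_negotiate`, the single error code, and the RING_ORDER value are illustrative):

```c
#include <assert.h>
#include <string.h>

#define RING_ORDER 4                   /* assumed frontend ring order */

/*
 * Mirror the probe-time checks on the backend's xenstore nodes.
 * Returns 0 on success, -1 (standing in for -EINVAL/-ENODEV) on
 * any mismatch, in the same order the probe function tests them.
 */
static int pvcalls_negotiate(const char *versions,
                             unsigned int max_page_order,
                             unsigned int function_calls)
{
    if (versions == NULL || strcmp(versions, "1") != 0)
        return -1;                     /* unsupported protocol version */
    if (max_page_order < RING_ORDER)
        return -1;                     /* backend ring too small for us */
    if (function_calls != 1)
        return -1;                     /* backend lacks function-call support */
    return 0;
}
```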
[Xen-devel] [PATCH v1 07/13] xen/pvcalls: implement accept command
Send PVCALLS_ACCEPT to the backend. Allocate a new active socket. Make sure that only one accept command is executed at any given time by setting PVCALLS_FLAG_ACCEPT_INFLIGHT and waiting on the inflight_accept_req waitqueue. sock->sk->sk_send_head is not used for ip sockets: reuse the field to store a pointer to the struct sock_mapping corresponding to the socket. Convert the new struct socket pointer into an uint64_t and use it as id for the new socket to pass to the backend. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 79 + drivers/xen/pvcalls-front.h | 3 ++ 2 files changed, 82 insertions(+) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 80fd5fb..f3a04a2 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -410,6 +410,85 @@ int pvcalls_front_listen(struct socket *sock, int backlog) return ret; } +int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags) +{ + struct pvcalls_bedata *bedata; + struct sock_mapping *map; + struct sock_mapping *map2 = NULL; + struct xen_pvcalls_request *req; + int notify, req_id, ret, evtchn; + + if (!pvcalls_front_dev) + return -ENOTCONN; + bedata = dev_get_drvdata(_front_dev->dev); + + map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head); + if (!map) + return -ENOTSOCK; + + if (map->passive.status != PVCALLS_STATUS_LISTEN) + return -EINVAL; + + /* +* Backend only supports 1 inflight accept request, will return +* errors for the others +*/ + if (test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT, +(void *)>passive.flags)) { + if (wait_event_interruptible(map->passive.inflight_accept_req, + !test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT, + (void *)>passive.flags)) + != 0) + return -EINTR; + } + + + newsock->sk = kzalloc(sizeof(*newsock->sk), GFP_KERNEL); + if (newsock->sk == NULL) + return -ENOMEM; + + spin_lock(>pvcallss_lock); + req_id = bedata->ring.req_prod_pvt & 
(RING_SIZE(>ring) - 1); + BUG_ON(req_id >= PVCALLS_NR_REQ_PER_RING); + if (RING_FULL(>ring) || + READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) { + spin_unlock(>pvcallss_lock); + return -EAGAIN; + } + + map2 = create_active(); + + req = RING_GET_REQUEST(>ring, req_id); + req->req_id = req_id; + req->cmd = PVCALLS_ACCEPT; + req->u.accept.id = (uint64_t) sock; + req->u.accept.ref = map2->active.ref; + req->u.accept.id_new = (uint64_t) newsock; + req->u.accept.evtchn = evtchn; + + list_add_tail(>list, >socket_mappings); + WRITE_ONCE(newsock->sk->sk_send_head, (void *)map2); + map2->sock = newsock; + + bedata->ring.req_prod_pvt++; + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(>ring, notify); + spin_unlock(>pvcallss_lock); + if (notify) + notify_remote_via_irq(bedata->irq); + + wait_event(bedata->inflight_req, + READ_ONCE(bedata->rsp[req_id].req_id) == req_id); + + clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT, (void *)>passive.flags); + wake_up(>passive.inflight_accept_req); + + ret = bedata->rsp[req_id].ret; + /* read ret, then set this rsp slot to be reused */ + smp_mb(); + WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID); + return ret; +} + static const struct xenbus_device_id pvcalls_front_ids[] = { { "pvcalls" }, { "" } diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h index aa8fe10..ab4f1da 100644 --- a/drivers/xen/pvcalls-front.h +++ b/drivers/xen/pvcalls-front.h @@ -10,5 +10,8 @@ int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len); int pvcalls_front_listen(struct socket *sock, int backlog); +int pvcalls_front_accept(struct socket *sock, +struct socket *newsock, +int flags); #endif -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
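The "only one accept in flight" rule enforced above with test_and_set_bit plus the inflight_accept_req waitqueue can be modeled with a C11 atomic flag. Only the mutual exclusion is shown; the sleep/wakeup half is elided, and the function names are invented for the sketch:

```c
#include <assert.h>
#include <stdatomic.h>

/* Stands in for PVCALLS_FLAG_ACCEPT_INFLIGHT inside passive.flags. */
static atomic_flag accept_inflight = ATOMIC_FLAG_INIT;

/*
 * Returns 1 if we won the right to issue the accept, 0 if another
 * accept is already in flight (the driver would then sleep on
 * inflight_accept_req and retry the test-and-set when woken).
 */
static int try_start_accept(void)
{
    return !atomic_flag_test_and_set(&accept_inflight);
}

/* clear_bit + wake_up in the driver code. */
static void finish_accept(void)
{
    atomic_flag_clear(&accept_inflight);
}
```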
[Xen-devel] [PATCH v1 12/13] xen/pvcalls: implement frontend disconnect
Implement pvcalls frontend removal function. Go through the list of active and passive sockets and free them all, one at a time.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index bd3dfac..fcc15fb 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -898,6 +898,34 @@ int pvcalls_front_release(struct socket *sock)
 
 static int pvcalls_front_remove(struct xenbus_device *dev)
 {
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map = NULL, *n;
+
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	list_for_each_entry_safe(map, n, &bedata->socket_mappings, list) {
+		mutex_lock(&map->active.in_mutex);
+		mutex_lock(&map->active.out_mutex);
+		pvcalls_front_free_map(bedata, map);
+		mutex_unlock(&map->active.out_mutex);
+		mutex_unlock(&map->active.in_mutex);
+		kfree(map);
+	}
+	list_for_each_entry_safe(map, n, &bedata->socketpass_mappings, list) {
+		spin_lock(&bedata->pvcallss_lock);
+		list_del_init(&map->list);
+		spin_unlock(&bedata->pvcallss_lock);
+		kfree(map);
+	}
+	if (bedata->irq > 0)
+		unbind_from_irqhandler(bedata->irq, dev);
+	if (bedata->ref >= 0)
+		gnttab_end_foreign_access(bedata->ref, 0, 0);
+	kfree(bedata->ring.sring);
+	kfree(bedata);
+	dev_set_drvdata(&dev->dev, NULL);
+	xenbus_switch_state(dev, XenbusStateClosed);
+	pvcalls_front_dev = NULL;
 	return 0;
 }
-- 
1.9.1
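The removal path walks each socket list with list_for_each_entry_safe() precisely because nodes are freed during traversal: the successor pointer must be captured before the current node goes away. The idiom can be modeled with a plain singly-linked list (names and the `freed` counter are for the sketch only):

```c
#include <assert.h>
#include <stdlib.h>

struct sock_mapping {
    struct sock_mapping *next;
};

static int freed;                      /* counts released mappings, for checking */

/*
 * The "safe" iteration idiom: remember the next pointer before
 * freeing the current node, which is exactly why the driver uses
 * list_for_each_entry_safe() in pvcalls_front_remove().
 */
static void free_all(struct sock_mapping **head)
{
    struct sock_mapping *map = *head, *n;

    while (map != NULL) {
        n = map->next;                 /* grab successor before freeing */
        free(map);
        freed++;
        map = n;
    }
    *head = NULL;
}
```

The same shape applies to any list whose nodes are released while being walked.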
Re: [Xen-devel] Question about hvm_monitor_interrupt
On 07/22/2017 12:33 AM, Tamas K Lengyel wrote:
> Hey Razvan,

Hello,

> the vm_event that is being generated by doing
> VM_EVENT_FLAG_GET_NEXT_INTERRUPT sends almost all required information
> about the interrupt to the listener to allow it to get reinjected,
> except the instruction length. If the listener wants to reinject the
> interrupt to the guest via xc_hvm_inject_trap, the instruction length
> is something that needs to be specified. So shouldn't that information
> be included in the vm_event?

We only care about requesting guest page faults (TRAP_page_fault), so
that we may be able to inspect things like swapped-out pages, and for
that purpose the instruction length is not necessary. Having said that,
there's nothing against adding the instruction length to the vm_event if
you need it.

Thanks,
Razvan
[Xen-devel] Question about hvm_monitor_interrupt
Hey Razvan,

the vm_event that is being generated by doing
VM_EVENT_FLAG_GET_NEXT_INTERRUPT sends almost all required information
about the interrupt to the listener to allow it to get reinjected,
except the instruction length. If the listener wants to reinject the
interrupt to the guest via xc_hvm_inject_trap, the instruction length
is something that needs to be specified. So shouldn't that information
be included in the vm_event?

Thanks,
Tamas
[Xen-devel] [libvirt test] 112081: tolerable all pass - PUSHED
flight 112081 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/112081/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 112036 test-armhf-armhf-libvirt-xsm 14 saverestore-support-checkfail like 112036 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 112036 test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-arm64-arm64-libvirt-qcow2 12 migrate-support-checkfail never pass test-arm64-arm64-libvirt-qcow2 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass version targeted for testing: libvirt e04d1074f801a211e2767545e2816cc98d820dd3 baseline version: libvirt 9af764e86aef7dfb0191a9561bf1d1abf941da05 Last test of basis 112036 2017-07-20 04:21:29 Z1 days Testing same since 112081 2017-07-21 04:21:50 Z0 days1 attempts People who touched revisions under test: Antoine MilletChen Hanxiao Cole Robinson Erik Skultety Hao Peng John Ferlan Michal Privoznik Pavel Hrdina Peng Hao Peter Krempa jobs: build-amd64-xsm pass 
build-arm64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-arm64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-arm64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass test-amd64-amd64-libvirt-xsm pass test-arm64-arm64-libvirt-xsm pass test-armhf-armhf-libvirt-xsm pass test-amd64-i386-libvirt-xsm pass test-amd64-amd64-libvirt pass test-arm64-arm64-libvirt pass test-armhf-armhf-libvirt pass test-amd64-i386-libvirt pass test-amd64-amd64-libvirt-pairpass test-amd64-i386-libvirt-pair pass test-arm64-arm64-libvirt-qcow2 pass test-armhf-armhf-libvirt-raw pass test-amd64-amd64-libvirt-vhd pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of
Re: [Xen-devel] [GIT PULL] xen: features and fixes for 4.13-rc2
On Fri, Jul 21, 2017 at 3:17 AM, Juergen Gross wrote:
>  drivers/xen/pvcalls-back.c | 1236

This really doesn't look like a fix. The merge window is over. So I'm
not pulling this without way more explanations of why I should.

Linus
Re: [Xen-devel] [PATCH] xen: selfballoon: remove unnecessary static in frontswap_selfshrink()
Hi Juergen,

On 07/21/2017 02:36 AM, Juergen Gross wrote:
> On 04/07/17 20:34, Gustavo A. R. Silva wrote:
>> Remove unnecessary static on local variables last_frontswap_pages and
>> tgt_frontswap_pages. Such variables are initialized before being used,
>> on every execution path throughout the function. The statics have no
>> benefit and removing them reduces the code size.
>>
>> This issue was detected using Coccinelle and the following semantic patch:
>>
>> @bad exists@
>> position p;
>> identifier x;
>> type T;
>> @@
>> static T x@p;
>> ... x = <+...x...+>
>>
>> @@
>> identifier x;
>> expression e;
>> type T;
>> position p != bad.p;
>> @@
>> -static T x@p;
>> ... when != x
>>     when strict
>> ?x = e;
>>
>> You can see a significant difference in the code size after executing
>> the size command, before and after the code change:
>>
>> before:
>>    text    data     bss     dec     hex filename
>>    5633    3452     384    9469    24fd drivers/xen/xen-selfballoon.o
>>
>> after:
>>    text    data     bss     dec     hex filename
>>    5576    3308     256    9140    23b4 drivers/xen/xen-selfballoon.o
>>
>> Signed-off-by: Gustavo A. R. Silva
>
> Reviewed-by: Juergen Gross

Thank you!

--
Gustavo A. R. Silva
Re: [Xen-devel] [PATCH 6/6] xen: sched: optimize exclusive pinning case (Credit1 & 2)
On Fri, Jul 21, 2017 at 8:55 PM, Dario Faggioli wrote:
> On Fri, 2017-07-21 at 18:19 +0100, George Dunlap wrote:
>> On 06/23/2017 11:55 AM, Dario Faggioli wrote:
>>> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
>>> index 4f6330e..85e014d 100644
>>> --- a/xen/common/sched_credit.c
>>> +++ b/xen/common/sched_credit.c
>>> @@ -429,6 +429,24 @@ static inline void __runq_tickle(struct csched_vcpu *new)
>>>      idlers_empty = cpumask_empty(_mask);
>>>
>>>      /*
>>> +     * Exclusive pinning is when a vcpu has hard-affinity with only one
>>> +     * cpu, and there is no other vcpu that has hard-affinity with that
>>> +     * same cpu. This is infrequent, but if it happens, is for achieving
>>> +     * the most possible determinism, and least possible overhead for
>>> +     * the vcpus in question.
>>> +     *
>>> +     * Try to identify the vast majority of these situations, and deal
>>> +     * with them quickly.
>>> +     */
>>> +    if ( unlikely(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) == cpu &&
>>
>> Won't this check entail a full "loop" of the cpumask? It's cheap enough
>> if nr_cpu_ids is small; but don't we support (theoretically) 4096
>> logical cpus?
>>
>> It seems like having a vcpu flag that identifies a vcpu as being pinned
>> would be a more efficient way to do this. That way we could run this
>> check once whenever the hard affinity changed, rather than every time we
>> want to think about where to run this vcpu.
>>
>> What do you think?
>>
> Right. We actually should get some help from the hardware (ffs &
> friends)... but I think you're right. Implementing this with a flag, as
> you're suggesting, is most likely better, and easy enough.
>
> I'll go for that!

Cool. BTW I checked the first 5 in.

 -George
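The check George questions, whether a vCPU is exclusively pinned, reduces to "does the hard-affinity mask contain exactly one bit?", and the flag-based alternative caches that answer when affinity changes instead of walking the mask on every tickle. A sketch with masks shrunk to 64 bits (struct and function names invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

struct vcpu_sketch {
    uint64_t hard_affinity;   /* one bit per pCPU, up to 64 in this model */
    int is_pinned;            /* cached flag, recomputed on affinity change */
};

/* A non-zero mask with exactly one bit set satisfies (m & (m - 1)) == 0. */
static int mask_is_singleton(uint64_t m)
{
    return m != 0 && (m & (m - 1)) == 0;
}

/*
 * Recompute the cached flag once, when hard affinity changes,
 * rather than cycling the cpumask in the hot scheduling path.
 */
static void set_hard_affinity(struct vcpu_sketch *v, uint64_t mask)
{
    v->hard_affinity = mask;
    v->is_pinned = mask_is_singleton(mask);
}
```

The hot path then tests `v->is_pinned` in O(1), regardless of how many logical CPUs the mask covers.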
[Xen-devel] [RFC PATCH v2 17/22] ARM: vGIC: introduce vgic_lock_vcpu_irq()
Since a VCPU can own multiple IRQs, the natural locking order is to take a VCPU lock first, then the individual per-IRQ locks. However there are situations where the target VCPU is not known without looking into the struct pending_irq first, which usually means we need to take the IRQ lock first. To solve this problem, we provide a function called vgic_lock_vcpu_irq(), which takes a locked struct pending_irq() and returns with *both* the VCPU and the IRQ lock held. This is done by looking up the target VCPU, then briefly dropping the IRQ lock, taking the VCPU lock, then grabbing the per-IRQ lock again. Before returning there is a check whether something has changed in the brief period where we didn't hold the IRQ lock, retrying in this (very rare) case. Signed-off-by: Andre Przywara--- xen/arch/arm/vgic.c | 42 ++ 1 file changed, 42 insertions(+) diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index 1ba0010..0e6dfe5 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -224,6 +224,48 @@ int vcpu_vgic_free(struct vcpu *v) return 0; } +/** + * vgic_lock_vcpu_irq(): lock both the pending_irq and the corresponding VCPU + * + * @v: the VCPU (for private IRQs) + * @p: pointer to the locked struct pending_irq + * @flags: pointer to the IRQ flags used when locking the VCPU + * + * The function takes a locked IRQ and returns with both the IRQ and the + * corresponding VCPU locked. This is non-trivial due to the locking order + * being actually the other way round (VCPU first, then IRQ). + * + * Returns: pointer to the VCPU this IRQ is targeting. 
+ */ +struct vcpu *vgic_lock_vcpu_irq(struct vcpu *v, struct pending_irq *p, +unsigned long *flags) +{ +struct vcpu *target_vcpu; + +ASSERT(spin_is_locked(>lock)); + +target_vcpu = vgic_get_target_vcpu(v, p); +spin_unlock(>lock); + +do +{ +struct vcpu *current_vcpu; + +spin_lock_irqsave(_vcpu->arch.vgic.lock, *flags); +spin_lock(>lock); + +current_vcpu = vgic_get_target_vcpu(v, p); + +if ( target_vcpu->vcpu_id == current_vcpu->vcpu_id ) +return target_vcpu; + +spin_unlock(>lock); +spin_unlock_irqrestore(_vcpu->arch.vgic.lock, *flags); + +target_vcpu = current_vcpu; +} while (1); +} + struct vcpu *vgic_get_target_vcpu(struct vcpu *v, struct pending_irq *p) { struct vgic_irq_rank *rank = vgic_rank_irq(v, p->irq); -- 2.9.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
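The drop-retake-recheck dance in vgic_lock_vcpu_irq() is a standard way to resolve a lock-order inversion: the natural order is VCPU lock first, then IRQ lock, but the target VCPU is only known from the locked IRQ. The control flow can be modeled single-threaded with toy locks (the `struct lock` type and function names are invented for the sketch):

```c
#include <assert.h>

struct lock { int held; };
static void lock_take(struct lock *l) { assert(!l->held); l->held = 1; }
static void lock_drop(struct lock *l) { assert(l->held);  l->held = 0; }

struct irq_sketch {
    struct lock lock;
    int target_vcpu;          /* index into vcpu_lock[] */
};

static struct lock vcpu_lock[2];

/*
 * Enter with p->lock held; return with both the target VCPU lock and
 * p->lock held, respecting the VCPU-then-IRQ order: read the target,
 * drop the IRQ lock, take the VCPU lock, retake the IRQ lock, and
 * retry if the target moved in the unlocked window.
 */
static int lock_vcpu_irq(struct irq_sketch *p)
{
    int target = p->target_vcpu;

    lock_drop(&p->lock);
    for (;;) {
        int cur;

        lock_take(&vcpu_lock[target]);   /* VCPU lock first... */
        lock_take(&p->lock);             /* ...then the IRQ lock */
        cur = p->target_vcpu;            /* re-read under the lock */
        if (cur == target)
            return target;               /* stable: both locks held */
        lock_drop(&p->lock);             /* it migrated meanwhile: retry */
        lock_drop(&vcpu_lock[target]);
        target = cur;
    }
}
```

In the single-threaded model the target never changes, so the loop exits on the first pass; the retry arm only fires under a concurrent migration, which the patch notes is very rare.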
[Xen-devel] [RFC PATCH v2 18/22] ARM: vGIC: move virtual IRQ target VCPU from rank to pending_irq
The VCPU a shared virtual IRQ is targeting is currently stored in the irq_rank structure. For LPIs we already store the target VCPU in struct pending_irq, so move SPIs over as well. The ITS code, which was using this field already, was so far using the VCPU lock to protect the pending_irq, so move this over to the new lock. Signed-off-by: Andre Przywara--- xen/arch/arm/vgic-v2.c | 56 +++ xen/arch/arm/vgic-v3-its.c | 9 +++--- xen/arch/arm/vgic-v3.c | 69 --- xen/arch/arm/vgic.c| 73 +- xen/include/asm-arm/vgic.h | 13 +++-- 5 files changed, 96 insertions(+), 124 deletions(-) diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c index 0c8a598..c7ed3ce 100644 --- a/xen/arch/arm/vgic-v2.c +++ b/xen/arch/arm/vgic-v2.c @@ -66,19 +66,22 @@ void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize, * * Note the byte offset will be aligned to an ITARGETSR boundary. */ -static uint32_t vgic_fetch_itargetsr(struct vgic_irq_rank *rank, - unsigned int offset) +static uint32_t vgic_fetch_itargetsr(struct vcpu *v, unsigned int offset) { uint32_t reg = 0; unsigned int i; +unsigned long flags; -ASSERT(spin_is_locked(>lock)); - -offset &= INTERRUPT_RANK_MASK; offset &= ~(NR_TARGETS_PER_ITARGETSR - 1); for ( i = 0; i < NR_TARGETS_PER_ITARGETSR; i++, offset++ ) -reg |= (1 << read_atomic(>vcpu[offset])) << (i * NR_BITS_PER_TARGET); +{ +struct pending_irq *p = irq_to_pending(v, offset); + +vgic_irq_lock(p, flags); +reg |= (1 << p->vcpu_id) << (i * NR_BITS_PER_TARGET); +vgic_irq_unlock(p, flags); +} return reg; } @@ -89,32 +92,29 @@ static uint32_t vgic_fetch_itargetsr(struct vgic_irq_rank *rank, * * Note the byte offset will be aligned to an ITARGETSR boundary. 
*/ -static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank, +static void vgic_store_itargetsr(struct domain *d, unsigned int offset, uint32_t itargetsr) { unsigned int i; unsigned int virq; -ASSERT(spin_is_locked(>lock)); - /* * The ITARGETSR0-7, used for SGIs/PPIs, are implemented RO in the * emulation and should never call this function. * - * They all live in the first rank. + * They all live in the first four bytes of ITARGETSR. */ -BUILD_BUG_ON(NR_INTERRUPT_PER_RANK != 32); -ASSERT(rank->index >= 1); +ASSERT(offset >= 4); -offset &= INTERRUPT_RANK_MASK; +virq = offset; offset &= ~(NR_TARGETS_PER_ITARGETSR - 1); -virq = rank->index * NR_INTERRUPT_PER_RANK + offset; - for ( i = 0; i < NR_TARGETS_PER_ITARGETSR; i++, offset++, virq++ ) { unsigned int new_target, old_target; +unsigned long flags; uint8_t new_mask; +struct pending_irq *p = spi_to_pending(d, virq); /* * Don't need to mask as we rely on new_mask to fit for only one @@ -151,16 +151,14 @@ static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank, /* The vCPU ID always starts from 0 */ new_target--; -old_target = read_atomic(>vcpu[offset]); +vgic_irq_lock(p, flags); +old_target = p->vcpu_id; /* Only migrate the vIRQ if the target vCPU has changed */ if ( new_target != old_target ) -{ -if ( vgic_migrate_irq(d->vcpu[old_target], - d->vcpu[new_target], - virq) ) -write_atomic(>vcpu[offset], new_target); -} +vgic_migrate_irq(p, , d->vcpu[new_target]); +else +vgic_irq_unlock(p, flags); } } @@ -264,11 +262,7 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info, uint32_t itargetsr; if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width; -rank = vgic_rank_offset(v, 8, gicd_reg - GICD_ITARGETSR, DABT_WORD); -if ( rank == NULL) goto read_as_zero; -vgic_lock_rank(v, rank, flags); -itargetsr = vgic_fetch_itargetsr(rank, gicd_reg - GICD_ITARGETSR); -vgic_unlock_rank(v, rank, flags); +itargetsr = vgic_fetch_itargetsr(v, gicd_reg - 
GICD_ITARGETSR); *r = vreg_reg32_extract(itargetsr, info); return 1; @@ -498,14 +492,10 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info, uint32_t itargetsr; if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width; -rank = vgic_rank_offset(v, 8, gicd_reg - GICD_ITARGETSR, DABT_WORD); -if ( rank == NULL) goto write_ignore; -vgic_lock_rank(v, rank, flags); -itargetsr = vgic_fetch_itargetsr(rank, gicd_reg - GICD_ITARGETSR); +
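With the target stored per-IRQ, an ITARGETSR read assembles one byte per IRQ, each byte being a CPU bitmask derived from that IRQ's target vCPU (4 targets per 32-bit register, 8 bits per target, matching the patch's NR_TARGETS_PER_ITARGETSR and NR_BITS_PER_TARGET). A self-contained sketch of the assembly, with the per-IRQ locking elided:

```c
#include <assert.h>
#include <stdint.h>

#define NR_TARGETS_PER_ITARGETSR 4
#define NR_BITS_PER_TARGET 8

/*
 * vcpu_ids holds the per-IRQ target VCPU for 4 consecutive IRQs,
 * standing in for pending_irq.vcpu_id. Each IRQ contributes the
 * byte (1 << vcpu_id) at its position in the register.
 */
static uint32_t fetch_itargetsr(const uint8_t *vcpu_ids)
{
    uint32_t reg = 0;

    for (unsigned int i = 0; i < NR_TARGETS_PER_ITARGETSR; i++)
        reg |= (uint32_t)(1u << vcpu_ids[i]) << (i * NR_BITS_PER_TARGET);
    return reg;
}
```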
[Xen-devel] [RFC PATCH v2 21/22] ARM: vITS: injecting LPIs: use pending_irq lock
Instead of using an atomic access and hoping for the best, let's use the new pending_irq lock now to make sure we read a sane version of the target VCPU. That still doesn't solve the problem mentioned in the comment, but paves the way for future improvements.

Signed-off-by: Andre Przywara
---
 xen/arch/arm/gic-v3-lpi.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index 2306b58..9db26ed 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -140,20 +140,22 @@ void vgic_vcpu_inject_lpi(struct domain *d, unsigned int virq)
 {
     /*
      * TODO: this assumes that the struct pending_irq stays valid all of
-     * the time. We cannot properly protect this with the current locking
-     * scheme, but the future per-IRQ lock will solve this problem.
+     * the time. We cannot properly protect this with the current code,
+     * but a future refcounting will solve this problem.
      */
     struct pending_irq *p = irq_to_pending(d->vcpu[0], virq);
+    unsigned long flags;
     unsigned int vcpu_id;
 
     if ( !p )
         return;
 
-    vcpu_id = ACCESS_ONCE(p->vcpu_id);
-    if ( vcpu_id >= d->max_vcpus )
-        return;
+    vgic_irq_lock(p, flags);
+    vcpu_id = p->vcpu_id;
+    vgic_irq_unlock(p, flags);
 
-    vgic_vcpu_inject_irq(d->vcpu[vcpu_id], virq);
+    if ( vcpu_id < d->max_vcpus )
+        vgic_vcpu_inject_irq(d->vcpu[vcpu_id], virq);
 }
 
 /*
-- 
2.9.0
[Xen-devel] [RFC PATCH v2 09/22] ARM: vITS: protect LPI priority update with pending_irq lock
As the priority value is now officially a member of struct pending_irq, we need to take its lock when manipulating it via ITS commands. Make sure we take the IRQ lock after the VCPU lock when we need both. Signed-off-by: Andre Przywara--- xen/arch/arm/vgic-v3-its.c | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c index 66095d4..705708a 100644 --- a/xen/arch/arm/vgic-v3-its.c +++ b/xen/arch/arm/vgic-v3-its.c @@ -402,6 +402,7 @@ static int update_lpi_property(struct domain *d, struct pending_irq *p) uint8_t property; int ret; +ASSERT(spin_is_locked(>lock)); /* * If no redistributor has its LPIs enabled yet, we can't access the * property table. In this case we just can't update the properties, @@ -419,7 +420,7 @@ static int update_lpi_property(struct domain *d, struct pending_irq *p) if ( ret ) return ret; -write_atomic(>priority, property & LPI_PROP_PRIO_MASK); +p->priority = property & LPI_PROP_PRIO_MASK; if ( property & LPI_PROP_ENABLED ) set_bit(GIC_IRQ_GUEST_ENABLED, >status); @@ -457,7 +458,7 @@ static int its_handle_inv(struct virt_its *its, uint64_t *cmdptr) uint32_t devid = its_cmd_get_deviceid(cmdptr); uint32_t eventid = its_cmd_get_id(cmdptr); struct pending_irq *p; -unsigned long flags; +unsigned long flags, vcpu_flags; struct vcpu *vcpu; uint32_t vlpi; int ret = -1; @@ -485,7 +486,8 @@ static int its_handle_inv(struct virt_its *its, uint64_t *cmdptr) if ( unlikely(!p) ) goto out_unlock_its; -spin_lock_irqsave(>arch.vgic.lock, flags); +spin_lock_irqsave(>arch.vgic.lock, vcpu_flags); +vgic_irq_lock(p, flags); /* Read the property table and update our cached status. 
*/ if ( update_lpi_property(d, p) ) @@ -497,7 +499,8 @@ static int its_handle_inv(struct virt_its *its, uint64_t *cmdptr) ret = 0; out_unlock: -spin_unlock_irqrestore(>arch.vgic.lock, flags); +vgic_irq_unlock(p, flags); +spin_unlock_irqrestore(>arch.vgic.lock, vcpu_flags); out_unlock_its: spin_unlock(>its_lock); @@ -517,7 +520,7 @@ static int its_handle_invall(struct virt_its *its, uint64_t *cmdptr) struct pending_irq *pirqs[16]; uint64_t vlpi = 0; /* 64-bit to catch overflows */ unsigned int nr_lpis, i; -unsigned long flags; +unsigned long flags, vcpu_flags; int ret = 0; /* @@ -542,7 +545,7 @@ static int its_handle_invall(struct virt_its *its, uint64_t *cmdptr) vcpu = get_vcpu_from_collection(its, collid); spin_unlock(>its_lock); -spin_lock_irqsave(>arch.vgic.lock, flags); +spin_lock_irqsave(>arch.vgic.lock, vcpu_flags); read_lock(>d->arch.vgic.pend_lpi_tree_lock); do @@ -555,9 +558,13 @@ static int its_handle_invall(struct virt_its *its, uint64_t *cmdptr) for ( i = 0; i < nr_lpis; i++ ) { +vgic_irq_lock(pirqs[i], flags); /* We only care about LPIs on our VCPU. */ if ( pirqs[i]->lpi_vcpu_id != vcpu->vcpu_id ) +{ +vgic_irq_unlock(pirqs[i], flags); continue; +} vlpi = pirqs[i]->irq; /* If that fails for a single LPI, carry on to handle the rest. 
*/ @@ -566,6 +573,8 @@ static int its_handle_invall(struct virt_its *its, uint64_t *cmdptr) update_lpi_vgic_status(vcpu, pirqs[i]); else ret = err; + +vgic_irq_unlock(pirqs[i], flags); } /* * Loop over the next gang of pending_irqs until we reached the end of @@ -576,7 +585,7 @@ static int its_handle_invall(struct virt_its *its, uint64_t *cmdptr) (nr_lpis == ARRAY_SIZE(pirqs)) ); read_unlock(>d->arch.vgic.pend_lpi_tree_lock); -spin_unlock_irqrestore(>arch.vgic.lock, flags); +spin_unlock_irqrestore(>arch.vgic.lock, vcpu_flags); return ret; } @@ -712,6 +721,7 @@ static int its_handle_mapti(struct virt_its *its, uint64_t *cmdptr) uint32_t intid = its_cmd_get_physical_id(cmdptr), _intid; uint16_t collid = its_cmd_get_collection(cmdptr); struct pending_irq *pirq; +unsigned long flags; struct vcpu *vcpu = NULL; int ret = -1; @@ -765,7 +775,9 @@ static int its_handle_mapti(struct virt_its *its, uint64_t *cmdptr) * We don't need the VGIC VCPU lock here, because the pending_irq isn't * in the radix tree yet. */ +vgic_irq_lock(pirq, flags); ret = update_lpi_property(its->d, pirq); +vgic_irq_unlock(pirq, flags); if ( ret ) goto out_remove_host_entry; -- 2.9.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
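The INVALL handler above processes pending LPIs in gangs of 16 (the `pirqs[16]` array), repeating the gang lookup until a batch comes back short. That pagination pattern can be sketched over a plain array standing in for the radix tree (the `gang_lookup`/`visit_all` names and the array backing are assumptions of the model):

```c
#include <assert.h>

#define BATCH 16

/*
 * Stand-in for radix_tree_gang_lookup(): copy up to BATCH items
 * starting at *pos out of a backing array of length len, advancing
 * *pos past the items returned.
 */
static unsigned int gang_lookup(const int *tree, unsigned int len,
                                unsigned int *pos, int *out)
{
    unsigned int n = 0;

    while (n < BATCH && *pos < len)
        out[n++] = tree[(*pos)++];
    return n;
}

/*
 * Visit every element batch by batch, stopping when a batch comes
 * back short: the same loop shape as its_handle_invall(). Returns
 * the number of elements visited.
 */
static int visit_all(const int *tree, unsigned int len)
{
    int batchbuf[BATCH];
    unsigned int pos = 0, nr, visited = 0;

    do {
        nr = gang_lookup(tree, len, &pos, batchbuf);
        for (unsigned int i = 0; i < nr; i++)
            visited++;                 /* update_lpi_property() etc. here */
    } while (nr == BATCH);
    return (int)visited;
}
```

Note that when the element count is an exact multiple of BATCH, one extra lookup returning zero items is needed to terminate, just as in the handler.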
[Xen-devel] [RFC PATCH v2 14/22] ARM: vGIC: move virtual IRQ configuration from rank to pending_irq
The IRQ configuration (level or edge triggered) for a group of IRQs are still stored in the irq_rank structure. Introduce a new bit called GIC_IRQ_GUEST_LEVEL in the "status" field, which holds that information. Remove the storage from the irq_rank and use the existing wrappers to store and retrieve the configuration bit for multiple IRQs. Signed-off-by: Andre Przywara--- xen/arch/arm/vgic-v2.c | 21 +++- xen/arch/arm/vgic-v3.c | 25 -- xen/arch/arm/vgic.c| 81 +- xen/include/asm-arm/vgic.h | 5 ++- 4 files changed, 73 insertions(+), 59 deletions(-) diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c index a3fd500..0c8a598 100644 --- a/xen/arch/arm/vgic-v2.c +++ b/xen/arch/arm/vgic-v2.c @@ -278,20 +278,12 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info, goto read_reserved; case VRANGE32(GICD_ICFGR, GICD_ICFGRN): -{ -uint32_t icfgr; - if ( dabt.size != DABT_WORD ) goto bad_width; -rank = vgic_rank_offset(v, 2, gicd_reg - GICD_ICFGR, DABT_WORD); -if ( rank == NULL) goto read_as_zero; -vgic_lock_rank(v, rank, flags); -icfgr = rank->icfg[REG_RANK_INDEX(2, gicd_reg - GICD_ICFGR, DABT_WORD)]; -vgic_unlock_rank(v, rank, flags); -*r = vreg_reg32_extract(icfgr, info); +irq = (gicd_reg - GICD_ICFGR) * 4; +*r = vgic_fetch_irq_config(v, irq); return 1; -} case VRANGE32(0xD00, 0xDFC): goto read_impl_defined; @@ -529,13 +521,8 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info, case VRANGE32(GICD_ICFGR2, GICD_ICFGRN): /* SPIs */ if ( dabt.size != DABT_WORD ) goto bad_width; -rank = vgic_rank_offset(v, 2, gicd_reg - GICD_ICFGR, DABT_WORD); -if ( rank == NULL) goto write_ignore; -vgic_lock_rank(v, rank, flags); -vreg_reg32_update(>icfg[REG_RANK_INDEX(2, gicd_reg - GICD_ICFGR, - DABT_WORD)], - r, info); -vgic_unlock_rank(v, rank, flags); +irq = (gicd_reg - GICD_ICFGR) * 4; /* 2 bit per IRQ */ +vgic_store_irq_config(v, irq, r); return 1; case VRANGE32(0xD00, 0xDFC): diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c index 
d3356ae..e9e36eb 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -722,20 +722,11 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v,
         return 1;

     case VRANGE32(GICD_ICFGR, GICD_ICFGRN):
-    {
-        uint32_t icfgr;
-
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        rank = vgic_rank_offset(v, 2, reg - GICD_ICFGR, DABT_WORD);
-        if ( rank == NULL ) goto read_as_zero;
-        vgic_lock_rank(v, rank, flags);
-        icfgr = rank->icfg[REG_RANK_INDEX(2, reg - GICD_ICFGR, DABT_WORD)];
-        vgic_unlock_rank(v, rank, flags);
-
-        *r = vreg_reg32_extract(icfgr, info);
-
+        irq = (reg - GICD_ICFGR) * 4;
+        if ( irq >= v->domain->arch.vgic.nr_spis + 32 ) goto read_as_zero;
+        *r = vgic_fetch_irq_config(v, irq);
         return 1;
-    }

     default:
         printk(XENLOG_G_ERR

@@ -834,13 +825,9 @@ static int __vgic_v3_distr_common_mmio_write(const char *name, struct vcpu *v,
         /* ICFGR1 for PPI's, which is implementation defined
            if ICFGR1 is programmable or not. We chose to program */
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        rank = vgic_rank_offset(v, 2, reg - GICD_ICFGR, DABT_WORD);
-        if ( rank == NULL ) goto write_ignore;
-        vgic_lock_rank(v, rank, flags);
-        vreg_reg32_update(&rank->icfg[REG_RANK_INDEX(2, reg - GICD_ICFGR,
-                                                     DABT_WORD)],
-                          r, info);
-        vgic_unlock_rank(v, rank, flags);
+        irq = (reg - GICD_ICFGR) * 4;
+        if ( irq >= v->domain->arch.vgic.nr_spis + 32 ) goto write_ignore;
+        vgic_store_irq_config(v, irq, r);
         return 1;

     default:

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index ddcd99b..e5a4765 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -268,6 +268,55 @@ void vgic_store_irq_priority(struct vcpu *v, unsigned int nrirqs,
     local_irq_restore(flags);
 }

+#define IRQS_PER_CFGR 16
+/**
+ * vgic_fetch_irq_config: assemble the configuration bits for a group of 16 IRQs
+ * @v: the VCPU for private IRQs, any VCPU of a domain for SPIs
+ * @first_irq: the first IRQ to be queried, must be aligned to 16
+ */
+uint32_t vgic_fetch_irq_config(struct vcpu *v, unsigned int first_irq)
+{
+    struct pending_irq *pirqs[IRQS_PER_CFGR];
+    unsigned long flags;
+    uint32_t ret = 0, i;
+
+    local_irq_save(flags);
+    vgic_lock_irqs(v, IRQS_PER_CFGR, first_irq, pirqs);
+
+    for ( i = 0; i < IRQS_PER_CFGR; i++ )
+        if
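The new vgic_fetch_irq_config() assembles one GICD_ICFGR-style word from 16 per-IRQ state bits: each IRQ gets 2 configuration bits, so a byte offset into the register bank maps to offset * 4 IRQs. A minimal stand-alone sketch of that packing, with a hypothetical is_edge[] flag array standing in for the per-IRQ GIC_IRQ_GUEST_LEVEL status bit (not Xen code):

```c
#include <stdint.h>

#define IRQS_PER_CFGR 16

/* Pack the trigger configuration of 16 IRQs into one ICFGR-style word.
 * Each IRQ uses 2 bits; bit[2n+1] set means IRQ n is edge-triggered
 * (bit[2n] is reserved on the GIC distributor). */
static uint32_t fetch_irq_config(const int is_edge[IRQS_PER_CFGR])
{
    uint32_t ret = 0;

    for (int i = 0; i < IRQS_PER_CFGR; i++)
        if (is_edge[i])
            ret |= 0x2U << (i * 2);

    return ret;
}
```

Note that the real function reads GIC_IRQ_GUEST_LEVEL under the per-IRQ lock, where a set bit means level-triggered, so the polarity is inverted relative to this sketch.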
[Xen-devel] [RFC PATCH v2 03/22] ARM: vGIC: move gic_raise_inflight_irq() into vgic_vcpu_inject_irq()
Currently there is a gic_raise_inflight_irq(), which serves the very special purpose of handling a newly injected interrupt while an older one is still handled. This has only one user, in vgic_vcpu_inject_irq(). Now with the introduction of the pending_irq lock this will later on result in a nasty deadlock, which can only be solved properly by actually embedding the function into the caller (and dropping the lock later in-between). This has the admittedly hideous consequence of needing to export gic_update_one_lr(), but this will go away in a later stage of a rework. In this respect this patch is more a temporary kludge. Signed-off-by: Andre Przywara--- xen/arch/arm/gic.c| 30 +- xen/arch/arm/vgic.c | 11 ++- xen/include/asm-arm/gic.h | 2 +- 3 files changed, 12 insertions(+), 31 deletions(-) diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 2c99d71..5bd66a2 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -44,8 +44,6 @@ static DEFINE_PER_CPU(uint64_t, lr_mask); #undef GIC_DEBUG -static void gic_update_one_lr(struct vcpu *v, int i); - static const struct gic_hw_operations *gic_hw_ops; void register_gic_ops(const struct gic_hw_operations *ops) @@ -416,32 +414,6 @@ void gic_remove_irq_from_queues(struct vcpu *v, struct pending_irq *p) gic_remove_from_lr_pending(v, p); } -void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq) -{ -struct pending_irq *n = irq_to_pending(v, virtual_irq); - -/* If an LPI has been removed meanwhile, there is nothing left to raise. 
*/ -if ( unlikely(!n) ) -return; - -ASSERT(spin_is_locked(>arch.vgic.lock)); - -/* Don't try to update the LR if the interrupt is disabled */ -if ( !test_bit(GIC_IRQ_GUEST_ENABLED, >status) ) -return; - -if ( list_empty(>lr_queue) ) -{ -if ( v == current ) -gic_update_one_lr(v, n->lr); -} -#ifdef GIC_DEBUG -else -gdprintk(XENLOG_DEBUG, "trying to inject irq=%u into d%dv%d, when it is still lr_pending\n", - virtual_irq, v->domain->domain_id, v->vcpu_id); -#endif -} - /* * Find an unused LR to insert an IRQ into, starting with the LR given * by @lr. If this new interrupt is a PRISTINE LPI, scan the other LRs to @@ -503,7 +475,7 @@ void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq, gic_add_to_lr_pending(v, p); } -static void gic_update_one_lr(struct vcpu *v, int i) +void gic_update_one_lr(struct vcpu *v, int i) { struct pending_irq *p; int irq; diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index 38dacd3..7b122cd 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -536,7 +536,16 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq) if ( !list_empty(>inflight) ) { -gic_raise_inflight_irq(v, virq); +bool update = test_bit(GIC_IRQ_GUEST_ENABLED, >status) && + list_empty(>lr_queue) && (v == current); + +if ( update ) +gic_update_one_lr(v, n->lr); +#ifdef GIC_DEBUG +else +gdprintk(XENLOG_DEBUG, "trying to inject irq=%u into d%dv%d, when it is still lr_pending\n", + n->irq, v->domain->domain_id, v->vcpu_id); +#endif goto out; } diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h index 6203dc5..cf8b8fb 100644 --- a/xen/include/asm-arm/gic.h +++ b/xen/include/asm-arm/gic.h @@ -237,12 +237,12 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq, extern void gic_inject(void); extern void gic_clear_pending_irqs(struct vcpu *v); +extern void gic_update_one_lr(struct vcpu *v, int lr); extern int gic_events_need_delivery(void); extern void init_maintenance_interrupt(void); extern void 
gic_raise_guest_irq(struct vcpu *v, unsigned int irq, unsigned int priority); -extern void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq); extern void gic_remove_from_lr_pending(struct vcpu *v, struct pending_irq *p); extern void gic_remove_irq_from_queues(struct vcpu *v, struct pending_irq *p); -- 2.9.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
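The condition that replaces gic_raise_inflight_irq() in its caller boils down to three checks. A tiny hypothetical helper capturing just that decision (a sketch, not Xen code):

```c
#include <stdbool.h>

/* Mirrors the inlined test in vgic_vcpu_inject_irq(): only poke the LR
 * directly when the IRQ is enabled, not already queued on the lr_pending
 * list, and targeting the VCPU that is running right now. */
static bool should_update_lr(bool enabled, bool on_lr_queue,
                             bool target_is_current)
{
    return enabled && !on_lr_queue && target_is_current;
}
```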
[Xen-devel] [RFC PATCH v2 04/22] ARM: vGIC: rename pending_irq->priority to cur_priority
In preparation for storing the virtual interrupt priority in the struct pending_irq, rename the existing "priority" member to "cur_priority". This is to signify that this is the current priority of an interrupt which has been injected to a VCPU. Once this has happened, its priority must stay fixed at this value; subsequent MMIO accesses that change the priority can only affect newly triggered interrupts. Also, since the priority is a sorting criterion for the inflight list, it must not change while the IRQ is on a VCPU's list.

Signed-off-by: Andre Przywara
---
 xen/arch/arm/gic-v2.c      |  2 +-
 xen/arch/arm/gic-v3.c      |  2 +-
 xen/arch/arm/gic.c         | 10 +-
 xen/arch/arm/vgic.c        |  6 +++---
 xen/include/asm-arm/vgic.h |  2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index cbe71a9..735e23d 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -437,7 +437,7 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
     BUG_ON(lr < 0);

     lr_reg = (((state & GICH_V2_LR_STATE_MASK) << GICH_V2_LR_STATE_SHIFT)  |
-              ((GIC_PRI_TO_GUEST(p->priority) & GICH_V2_LR_PRIORITY_MASK)
+              ((GIC_PRI_TO_GUEST(p->cur_priority) & GICH_V2_LR_PRIORITY_MASK)
                                              << GICH_V2_LR_PRIORITY_SHIFT) |
               ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));

diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index f990eae..449bd55 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -961,7 +961,7 @@ static void gicv3_update_lr(int lr, const struct pending_irq *p,
     if ( current->domain->arch.vgic.version == GIC_V3 )
         val |= GICH_LR_GRP1;

-    val |= ((uint64_t)p->priority & 0xff) << GICH_LR_PRIORITY_SHIFT;
+    val |= ((uint64_t)p->cur_priority & 0xff) << GICH_LR_PRIORITY_SHIFT;
     val |= ((uint64_t)p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT;

     if ( p->desc != NULL )

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 5bd66a2..8dec736 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -389,7 +389,7 @@ static inline
void gic_add_to_lr_pending(struct vcpu *v, struct pending_irq *n) list_for_each_entry ( iter, >arch.vgic.lr_pending, lr_queue ) { -if ( iter->priority > n->priority ) +if ( iter->cur_priority > n->cur_priority ) { list_add_tail(>lr_queue, >lr_queue); return; @@ -542,7 +542,7 @@ void gic_update_one_lr(struct vcpu *v, int i) if ( test_bit(GIC_IRQ_GUEST_ENABLED, >status) && test_bit(GIC_IRQ_GUEST_QUEUED, >status) && !test_bit(GIC_IRQ_GUEST_MIGRATING, >status) ) -gic_raise_guest_irq(v, irq, p->priority); +gic_raise_guest_irq(v, irq, p->cur_priority); else { list_del_init(>inflight); /* @@ -610,7 +610,7 @@ static void gic_restore_pending_irqs(struct vcpu *v) /* No more free LRs: find a lower priority irq to evict */ list_for_each_entry_reverse( p_r, inflight_r, inflight ) { -if ( p_r->priority == p->priority ) +if ( p_r->cur_priority == p->cur_priority ) goto out; if ( test_bit(GIC_IRQ_GUEST_VISIBLE, _r->status) && !test_bit(GIC_IRQ_GUEST_ACTIVE, _r->status) ) @@ -676,9 +676,9 @@ int gic_events_need_delivery(void) * ordered by priority */ list_for_each_entry( p, >arch.vgic.inflight_irqs, inflight ) { -if ( GIC_PRI_TO_GUEST(p->priority) >= mask_priority ) +if ( GIC_PRI_TO_GUEST(p->cur_priority) >= mask_priority ) goto out; -if ( GIC_PRI_TO_GUEST(p->priority) >= active_priority ) +if ( GIC_PRI_TO_GUEST(p->cur_priority) >= active_priority ) goto out; if ( test_bit(GIC_IRQ_GUEST_ENABLED, >status) ) { diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index 7b122cd..21b545e 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -395,7 +395,7 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n) p = irq_to_pending(v_target, irq); set_bit(GIC_IRQ_GUEST_ENABLED, >status); if ( !list_empty(>inflight) && !test_bit(GIC_IRQ_GUEST_VISIBLE, >status) ) -gic_raise_guest_irq(v_target, irq, p->priority); +gic_raise_guest_irq(v_target, irq, p->cur_priority); spin_unlock_irqrestore(_target->arch.vgic.lock, flags); if ( p->desc != NULL ) { @@ -550,7 +550,7 @@ void 
vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq) } priority = vgic_get_virq_priority(v, virq); -n->priority = priority; +n->cur_priority = priority; /* the irq is enabled */ if ( test_bit(GIC_IRQ_GUEST_ENABLED, >status) ) @@ -558,7 +558,7 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned
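The reason cur_priority must stay fixed is that gic_add_to_lr_pending() keeps the lr_pending list sorted by it. A stand-alone sketch of that ordering rule, using a hypothetical minimal list type instead of Xen's list API:

```c
#include <stddef.h>
#include <stdint.h>

struct pirq {
    uint8_t cur_priority;   /* lower value = higher priority on the GIC */
    struct pirq *next;
};

/* Insert n in ascending cur_priority order, mirroring the
 * "iter->cur_priority > n->cur_priority" test in gic_add_to_lr_pending().
 * If the key could change while n sits on the list, the invariant breaks. */
static void insert_sorted(struct pirq **head, struct pirq *n)
{
    struct pirq **pp = head;

    while (*pp != NULL && (*pp)->cur_priority <= n->cur_priority)
        pp = &(*pp)->next;

    n->next = *pp;
    *pp = n;
}
```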
[Xen-devel] [RFC PATCH v2 12/22] ARM: vGIC: protect gic_update_one_lr() with pending_irq lock
When we return from a domain with the active bit set in an LR, we update our pending_irq accordingly. This touches multiple status bits, so it requires the pending_irq lock.

Signed-off-by: Andre Przywara
---
 xen/arch/arm/gic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 9637682..84b282b 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -508,6 +508,7 @@ void gic_update_one_lr(struct vcpu *v, int i)

     if ( lr_val.state & GICH_LR_ACTIVE )
     {
+        vgic_irq_lock(p, flags);
         set_bit(GIC_IRQ_GUEST_ACTIVE, &p->status);
         if ( test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
              test_and_clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status) )
@@ -521,6 +522,7 @@ void gic_update_one_lr(struct vcpu *v, int i)
                 gdprintk(XENLOG_WARNING, "unable to inject hw irq=%d into d%dv%d: already active in LR%d\n",
                          irq, v->domain->domain_id, v->vcpu_id, i);
         }
+        vgic_irq_unlock(p, flags);
     }
     else if ( lr_val.state & GICH_LR_PENDING )
     {
-- 
2.9.0
[Xen-devel] [RFC PATCH v2 11/22] ARM: vGIC: protect gic_events_need_delivery() with pending_irq lock
gic_events_need_delivery() reads the cur_priority field twice and also relies on the consistency of the status bits, so it should take the pending_irq lock.

Signed-off-by: Andre Przywara
---
 xen/arch/arm/gic.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index df89530..9637682 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -666,7 +666,7 @@ int gic_events_need_delivery(void)
 {
     struct vcpu *v = current;
     struct pending_irq *p;
-    unsigned long flags;
+    unsigned long flags, vcpu_flags;
     const unsigned long apr = gic_hw_ops->read_apr(0);
     int mask_priority;
     int active_priority;
@@ -675,7 +675,7 @@ int gic_events_need_delivery(void)
     mask_priority = gic_hw_ops->read_vmcr_priority();
     active_priority = find_next_bit(&apr, 32, 0);

-    spin_lock_irqsave(&v->arch.vgic.lock, flags);
+    spin_lock_irqsave(&v->arch.vgic.lock, vcpu_flags);

     /* TODO: We order the guest irqs by priority, but we don't change
      * the priority of host irqs. */
@@ -684,19 +684,21 @@ int gic_events_need_delivery(void)
      * ordered by priority */
     list_for_each_entry( p, &v->arch.vgic.inflight_irqs, inflight )
     {
-        if ( GIC_PRI_TO_GUEST(p->cur_priority) >= mask_priority )
-            goto out;
-        if ( GIC_PRI_TO_GUEST(p->cur_priority) >= active_priority )
-            goto out;
-        if ( test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+        vgic_irq_lock(p, flags);
+        if ( GIC_PRI_TO_GUEST(p->cur_priority) < mask_priority &&
+             GIC_PRI_TO_GUEST(p->cur_priority) < active_priority &&
+             !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
         {
-            rc = 1;
-            goto out;
+            vgic_irq_unlock(p, flags);
+            continue;
         }
+
+        rc = test_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
+        vgic_irq_unlock(p, flags);
+        break;
     }

-out:
-    spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
+    spin_unlock_irqrestore(&v->arch.vgic.lock, vcpu_flags);

     return rc;
 }
-- 
2.9.0
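The rewrite folds three early exits into one skip condition. A scalar sketch of the resulting logic (hypothetical helper; prio stands for GIC_PRI_TO_GUEST(cur_priority)). One subtlety worth reviewing: when the first non-skipped entry fails the priority checks, this version still returns its enabled bit, whereas the pre-patch code always returned 0 there.

```c
/* Scan a priority-ordered inflight list: entries in deliverable priority
 * range but disabled are skipped; the first decisive entry ends the scan,
 * mirroring the locked loop in gic_events_need_delivery(). */
static int need_delivery(const int prio[], const int enabled[], int n,
                         int mask_prio, int active_prio)
{
    for (int i = 0; i < n; i++) {
        if (prio[i] < mask_prio && prio[i] < active_prio && !enabled[i])
            continue;
        return enabled[i];
    }
    return 0;
}
```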
[Xen-devel] [RFC PATCH v2 20/22] ARM: vGIC: move virtual IRQ enable bit from rank to pending_irq
The enabled bits for a group of IRQs are still stored in the irq_rank structure, although we already have the same information in pending_irq, in the GIC_IRQ_GUEST_ENABLED bit of the "status" field. Remove the storage from the irq_rank and just utilize the existing wrappers to cover enabling/disabling of multiple IRQs. This also marks the removal of the last member of struct vgic_irq_rank. Signed-off-by: Andre Przywara--- xen/arch/arm/vgic-v2.c | 41 +++-- xen/arch/arm/vgic-v3.c | 41 +++-- xen/arch/arm/vgic.c| 201 +++-- xen/include/asm-arm/vgic.h | 10 +-- 4 files changed, 152 insertions(+), 141 deletions(-) diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c index c7ed3ce..3320642 100644 --- a/xen/arch/arm/vgic-v2.c +++ b/xen/arch/arm/vgic-v2.c @@ -166,9 +166,7 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info, register_t *r, void *priv) { struct hsr_dabt dabt = info->dabt; -struct vgic_irq_rank *rank; int gicd_reg = (int)(info->gpa - v->domain->arch.vgic.dbase); -unsigned long flags; unsigned int irq; perfc_incr(vgicd_reads); @@ -222,20 +220,16 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info, case VRANGE32(GICD_ISENABLER, GICD_ISENABLERN): if ( dabt.size != DABT_WORD ) goto bad_width; -rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ISENABLER, DABT_WORD); -if ( rank == NULL) goto read_as_zero; -vgic_lock_rank(v, rank, flags); -*r = vreg_reg32_extract(rank->ienable, info); -vgic_unlock_rank(v, rank, flags); +irq = (gicd_reg - GICD_ISENABLER) * 8; +if ( irq >= v->domain->arch.vgic.nr_spis + 32 ) goto read_as_zero; +*r = vgic_fetch_irq_enabled(v, irq); return 1; case VRANGE32(GICD_ICENABLER, GICD_ICENABLERN): if ( dabt.size != DABT_WORD ) goto bad_width; -rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ICENABLER, DABT_WORD); -if ( rank == NULL) goto read_as_zero; -vgic_lock_rank(v, rank, flags); -*r = vreg_reg32_extract(rank->ienable, info); -vgic_unlock_rank(v, rank, flags); +irq = (gicd_reg - GICD_ICENABLER) * 8; 
+if ( irq >= v->domain->arch.vgic.nr_spis + 32 ) goto read_as_zero; +*r = vgic_fetch_irq_enabled(v, irq); return 1; /* Read the pending status of an IRQ via GICD is not supported */ @@ -386,10 +380,7 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info, register_t r, void *priv) { struct hsr_dabt dabt = info->dabt; -struct vgic_irq_rank *rank; int gicd_reg = (int)(info->gpa - v->domain->arch.vgic.dbase); -uint32_t tr; -unsigned long flags; unsigned int irq; perfc_incr(vgicd_writes); @@ -426,24 +417,16 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info, case VRANGE32(GICD_ISENABLER, GICD_ISENABLERN): if ( dabt.size != DABT_WORD ) goto bad_width; -rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ISENABLER, DABT_WORD); -if ( rank == NULL) goto write_ignore; -vgic_lock_rank(v, rank, flags); -tr = rank->ienable; -vreg_reg32_setbits(>ienable, r, info); -vgic_enable_irqs(v, (rank->ienable) & (~tr), rank->index); -vgic_unlock_rank(v, rank, flags); +irq = (gicd_reg - GICD_ISENABLER) * 8; +if ( irq >= v->domain->arch.vgic.nr_spis + 32 ) goto write_ignore; +vgic_store_irq_enable(v, irq, r); return 1; case VRANGE32(GICD_ICENABLER, GICD_ICENABLERN): if ( dabt.size != DABT_WORD ) goto bad_width; -rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ICENABLER, DABT_WORD); -if ( rank == NULL) goto write_ignore; -vgic_lock_rank(v, rank, flags); -tr = rank->ienable; -vreg_reg32_clearbits(>ienable, r, info); -vgic_disable_irqs(v, (~rank->ienable) & tr, rank->index); -vgic_unlock_rank(v, rank, flags); +irq = (gicd_reg - GICD_ICENABLER) * 8; +if ( irq >= v->domain->arch.vgic.nr_spis + 32 ) goto write_ignore; +vgic_store_irq_disable(v, irq, r); return 1; case VRANGE32(GICD_ISPENDR, GICD_ISPENDRN): diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c index e9d46af..00cc1e5 100644 --- a/xen/arch/arm/vgic-v3.c +++ b/xen/arch/arm/vgic-v3.c @@ -676,8 +676,6 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v, register_t *r) 
{ struct hsr_dabt dabt = info->dabt; -struct vgic_irq_rank *rank; -unsigned long flags; unsigned int irq; switch ( reg ) @@ -689,20 +687,16 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v, case VRANGE32(GICD_ISENABLER, GICD_ISENABLERN):
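GICD_ISENABLER has one bit per IRQ, so a byte offset into the register bank maps to offset * 8 IRQs, and writing a 1 enables the corresponding IRQ while 0 bits are ignored. A stand-alone sketch of that store path, with a hypothetical flat enabled[] array standing in for the per-IRQ GIC_IRQ_GUEST_ENABLED status bits:

```c
#include <stdint.h>

/* Apply an ISENABLER-style write: each set bit in mask enables one IRQ,
 * starting at first_irq; clear bits have no effect (disabling goes
 * through the separate ICENABLER register instead). */
static void store_irq_enable(uint8_t enabled[], unsigned int first_irq,
                             uint32_t mask)
{
    for (unsigned int i = 0; i < 32; i++)
        if (mask & (1U << i))
            enabled[first_irq + i] = 1;
}
```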
[Xen-devel] [RFC PATCH v2 19/22] ARM: vGIC: rework vgic_get_target_vcpu to take a domain instead of vcpu
For "historical" reasons we used to pass a vCPU pointer to vgic_get_target_vcpu(), which was only considered to distinguish private IRQs. Now since we have the unique pending_irq pointer already, we don't need the vCPU anymore, but just the domain. So change this function to avoid a rather hackish "d->vcpu[0]" parameter when looking up SPIs, also allows our new vgic_lock_vcpu_irq() function to eventually take a domain parameter (which makes more sense). Signed-off-by: Andre Przywara--- xen/arch/arm/gic.c | 2 +- xen/arch/arm/vgic.c| 22 +++--- xen/include/asm-arm/vgic.h | 3 ++- 3 files changed, 14 insertions(+), 13 deletions(-) diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 38e998a..300ce6c 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -559,7 +559,7 @@ void gic_update_one_lr(struct vcpu *v, int i) smp_wmb(); if ( test_bit(GIC_IRQ_GUEST_MIGRATING, >status) ) { -struct vcpu *v_target = vgic_get_target_vcpu(v, p); +struct vcpu *v_target = vgic_get_target_vcpu(v->domain, p); irq_set_affinity(p->desc, cpumask_of(v_target->processor)); clear_bit(GIC_IRQ_GUEST_MIGRATING, >status); } diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index f6532ee..a49fcde 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -217,7 +217,7 @@ int vcpu_vgic_free(struct vcpu *v) /** * vgic_lock_vcpu_irq(): lock both the pending_irq and the corresponding VCPU * - * @v: the VCPU (for private IRQs) + * @d: the domain the IRQ belongs to * @p: pointer to the locked struct pending_irq * @flags: pointer to the IRQ flags used when locking the VCPU * @@ -227,14 +227,14 @@ int vcpu_vgic_free(struct vcpu *v) * * Returns: pointer to the VCPU this IRQ is targeting. 
*/ -struct vcpu *vgic_lock_vcpu_irq(struct vcpu *v, struct pending_irq *p, +struct vcpu *vgic_lock_vcpu_irq(struct domain *d, struct pending_irq *p, unsigned long *flags) { struct vcpu *target_vcpu; ASSERT(spin_is_locked(>lock)); -target_vcpu = vgic_get_target_vcpu(v, p); +target_vcpu = vgic_get_target_vcpu(d, p); spin_unlock(>lock); do @@ -244,7 +244,7 @@ struct vcpu *vgic_lock_vcpu_irq(struct vcpu *v, struct pending_irq *p, spin_lock_irqsave(_vcpu->arch.vgic.lock, *flags); spin_lock(>lock); -current_vcpu = vgic_get_target_vcpu(v, p); +current_vcpu = vgic_get_target_vcpu(d, p); if ( target_vcpu->vcpu_id == current_vcpu->vcpu_id ) return target_vcpu; @@ -256,9 +256,9 @@ struct vcpu *vgic_lock_vcpu_irq(struct vcpu *v, struct pending_irq *p, } while (1); } -struct vcpu *vgic_get_target_vcpu(struct vcpu *v, struct pending_irq *p) +struct vcpu *vgic_get_target_vcpu(struct domain *d, struct pending_irq *p) { -return v->domain->vcpu[p->vcpu_id]; +return d->vcpu[p->vcpu_id]; } #define MAX_IRQS_PER_IPRIORITYR 4 @@ -386,7 +386,7 @@ bool vgic_migrate_irq(struct pending_irq *p, unsigned long *flags, /* If the IRQ is still lr_pending, re-inject it to the new vcpu */ if ( !list_empty(>lr_queue) ) { -old = vgic_lock_vcpu_irq(new, p, _flags); +old = vgic_lock_vcpu_irq(new->domain, p, _flags); gic_remove_irq_from_queues(old, p); irq_set_affinity(p->desc, cpumask_of(new->processor)); @@ -430,7 +430,7 @@ void arch_move_irqs(struct vcpu *v) for ( i = 32; i < vgic_num_irqs(d); i++ ) { p = irq_to_pending(v, i); -v_target = vgic_get_target_vcpu(v, p); +v_target = vgic_get_target_vcpu(d, p); if ( v_target == v && !test_bit(GIC_IRQ_GUEST_MIGRATING, >status) ) irq_set_affinity(p->desc, cpu_mask); @@ -453,7 +453,7 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n) while ( (i = find_next_bit(, 32, i)) < 32 ) { irq = i + (32 * n); p = irq_to_pending(v, irq); -v_target = vgic_get_target_vcpu(v, p); +v_target = vgic_get_target_vcpu(v->domain, p); 
spin_lock_irqsave(_target->arch.vgic.lock, flags); clear_bit(GIC_IRQ_GUEST_ENABLED, >status); @@ -507,7 +507,7 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n) while ( (i = find_next_bit(, 32, i)) < 32 ) { irq = i + (32 * n); p = irq_to_pending(v, irq); -v_target = vgic_get_target_vcpu(v, p); +v_target = vgic_get_target_vcpu(v->domain, p); spin_lock_irqsave(_target->arch.vgic.lock, vcpu_flags); vgic_irq_lock(p, flags); set_bit(GIC_IRQ_GUEST_ENABLED, >status); @@ -710,7 +710,7 @@ void vgic_vcpu_inject_spi(struct domain *d, unsigned int virq) /* the IRQ needs to be an SPI */ ASSERT(virq >= 32 && virq <= vgic_num_irqs(d)); -v = vgic_get_target_vcpu(d->vcpu[0], p); +v = vgic_get_target_vcpu(d, p);
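vgic_lock_vcpu_irq() must take the VCPU lock before the per-IRQ lock, yet it is entered with only the IRQ lock held, so it drops that lock, takes the VCPU lock, retakes the IRQ lock, and re-checks that the target did not move in the unlocked window. A pthread-based sketch of this retry idiom (hypothetical types; assumes the caller holds irq->lock on entry):

```c
#include <pthread.h>

struct virq {
    pthread_mutex_t lock;
    int target;                /* index of the VCPU this IRQ targets */
};

/* Take vcpu_locks[target] and irq->lock in the required order.  On
 * return both locks are held and the returned target is stable. */
static int lock_vcpu_then_irq(struct virq *irq, pthread_mutex_t vcpu_locks[])
{
    int target = irq->target;  /* read under irq->lock, held by caller */

    pthread_mutex_unlock(&irq->lock);
    for (;;) {
        pthread_mutex_lock(&vcpu_locks[target]);
        pthread_mutex_lock(&irq->lock);
        if (irq->target == target)
            return target;

        /* The IRQ was retargeted while we slept: drop both and retry. */
        int new_target = irq->target;
        pthread_mutex_unlock(&irq->lock);
        pthread_mutex_unlock(&vcpu_locks[target]);
        target = new_target;
    }
}
```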
[Xen-devel] [RFC PATCH v2 08/22] ARM: vGIC: move virtual IRQ priority from rank to pending_irq
So far a virtual interrupt's priority is stored in the irq_rank structure, which covers multiple IRQs and has a single lock for this group. Generalize the already existing priority variable in struct pending_irq to not only cover LPIs, but every IRQ. Access to this value is protected by the per-IRQ lock. Signed-off-by: Andre Przywara--- xen/arch/arm/vgic-v2.c | 34 ++ xen/arch/arm/vgic-v3.c | 36 xen/arch/arm/vgic.c| 41 + xen/include/asm-arm/vgic.h | 10 -- 4 files changed, 31 insertions(+), 90 deletions(-) diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c index cf4ab89..ed7ff3b 100644 --- a/xen/arch/arm/vgic-v2.c +++ b/xen/arch/arm/vgic-v2.c @@ -171,6 +171,7 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info, struct vgic_irq_rank *rank; int gicd_reg = (int)(info->gpa - v->domain->arch.vgic.dbase); unsigned long flags; +unsigned int irq; perfc_incr(vgicd_reads); @@ -250,22 +251,10 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info, goto read_as_zero; case VRANGE32(GICD_IPRIORITYR, GICD_IPRIORITYRN): -{ -uint32_t ipriorityr; -uint8_t rank_index; - if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width; -rank = vgic_rank_offset(v, 8, gicd_reg - GICD_IPRIORITYR, DABT_WORD); -if ( rank == NULL ) goto read_as_zero; -rank_index = REG_RANK_INDEX(8, gicd_reg - GICD_IPRIORITYR, DABT_WORD); - -vgic_lock_rank(v, rank, flags); -ipriorityr = ACCESS_ONCE(rank->ipriorityr[rank_index]); -vgic_unlock_rank(v, rank, flags); -*r = vreg_reg32_extract(ipriorityr, info); - +irq = gicd_reg - GICD_IPRIORITYR; /* 8 bit per IRQ, so IRQ = offset */ +*r = vgic_fetch_irq_priority(v, irq, (dabt.size == DABT_BYTE) ? 
1 : 4); return 1; -} case VREG32(0x7FC): goto read_reserved; @@ -415,6 +404,7 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info, int gicd_reg = (int)(info->gpa - v->domain->arch.vgic.dbase); uint32_t tr; unsigned long flags; +unsigned int irq; perfc_incr(vgicd_writes); @@ -498,23 +488,11 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info, goto write_ignore_32; case VRANGE32(GICD_IPRIORITYR, GICD_IPRIORITYRN): -{ -uint32_t *ipriorityr, priority; - if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width; -rank = vgic_rank_offset(v, 8, gicd_reg - GICD_IPRIORITYR, DABT_WORD); -if ( rank == NULL) goto write_ignore; -vgic_lock_rank(v, rank, flags); -ipriorityr = >ipriorityr[REG_RANK_INDEX(8, - gicd_reg - GICD_IPRIORITYR, - DABT_WORD)]; -priority = ACCESS_ONCE(*ipriorityr); -vreg_reg32_update(, r, info); -ACCESS_ONCE(*ipriorityr) = priority; -vgic_unlock_rank(v, rank, flags); +irq = gicd_reg - GICD_IPRIORITYR; /* 8 bit per IRQ, so IRQ = offset */ +vgic_store_irq_priority(v, (dabt.size == DABT_BYTE) ? 
1 : 4, irq, r); return 1; -} case VREG32(0x7FC): goto write_reserved; diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c index ad9019e..e58e77e 100644 --- a/xen/arch/arm/vgic-v3.c +++ b/xen/arch/arm/vgic-v3.c @@ -677,6 +677,7 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v, struct hsr_dabt dabt = info->dabt; struct vgic_irq_rank *rank; unsigned long flags; +unsigned int irq; switch ( reg ) { @@ -714,23 +715,11 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v, goto read_as_zero; case VRANGE32(GICD_IPRIORITYR, GICD_IPRIORITYRN): -{ -uint32_t ipriorityr; -uint8_t rank_index; - if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width; -rank = vgic_rank_offset(v, 8, reg - GICD_IPRIORITYR, DABT_WORD); -if ( rank == NULL ) goto read_as_zero; -rank_index = REG_RANK_INDEX(8, reg - GICD_IPRIORITYR, DABT_WORD); - -vgic_lock_rank(v, rank, flags); -ipriorityr = ACCESS_ONCE(rank->ipriorityr[rank_index]); -vgic_unlock_rank(v, rank, flags); - -*r = vreg_reg32_extract(ipriorityr, info); - +irq = reg - GICD_IPRIORITYR; /* 8 bit per IRQ, so IRQ = offset */ +if ( irq >= v->domain->arch.vgic.nr_spis + 32 ) goto read_as_zero; +*r = vgic_fetch_irq_priority(v, irq, (dabt.size == DABT_BYTE) ? 1 : 4); return 1; -} case VRANGE32(GICD_ICFGR, GICD_ICFGRN): { @@ -774,6 +763,7 @@ static int __vgic_v3_distr_common_mmio_write(const char *name, struct vcpu *v,
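GICD_IPRIORITYR uses 8 bits per IRQ, so the register offset equals the first IRQ number, and a byte or word access covers 1 or 4 priorities respectively. A stand-alone sketch of the fetch side, with a hypothetical flat prio[] array standing in for the per-pending_irq priority fields:

```c
#include <stdint.h>

/* Assemble an IPRIORITYR-style value: 8 bits per IRQ, least significant
 * byte first, covering nrirqs entries (1 for a byte access, 4 for a
 * word access). */
static uint32_t fetch_irq_priority(const uint8_t prio[],
                                   unsigned int first_irq,
                                   unsigned int nrirqs)
{
    uint32_t ret = 0;

    for (unsigned int i = 0; i < nrirqs; i++)
        ret |= (uint32_t)prio[first_irq + i] << (i * 8);

    return ret;
}
```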
[Xen-devel] [RFC PATCH v2 10/22] ARM: vGIC: protect gic_set_lr() with pending_irq lock
When putting a (pending) IRQ into an LR, we should better make sure that no-one changes it behind our back. So make sure we take the pending_irq lock. This bubbles up to all users of gic_add_to_lr_pending() and gic_raise_guest_irq(). Signed-off-by: Andre Przywara--- xen/arch/arm/gic.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 8dec736..df89530 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -383,6 +383,7 @@ static inline void gic_add_to_lr_pending(struct vcpu *v, struct pending_irq *n) struct pending_irq *iter; ASSERT(spin_is_locked(>arch.vgic.lock)); +ASSERT(spin_is_locked(>lock)); if ( !list_empty(>lr_queue) ) return; @@ -480,6 +481,7 @@ void gic_update_one_lr(struct vcpu *v, int i) struct pending_irq *p; int irq; struct gic_lr lr_val; +unsigned long flags; ASSERT(spin_is_locked(>arch.vgic.lock)); ASSERT(!local_irq_is_enabled()); @@ -534,6 +536,7 @@ void gic_update_one_lr(struct vcpu *v, int i) gic_hw_ops->clear_lr(i); clear_bit(i, _cpu(lr_mask)); +vgic_irq_lock(p, flags); if ( p->desc != NULL ) clear_bit(_IRQ_INPROGRESS, >desc->status); clear_bit(GIC_IRQ_GUEST_VISIBLE, >status); @@ -559,6 +562,7 @@ void gic_update_one_lr(struct vcpu *v, int i) clear_bit(GIC_IRQ_GUEST_MIGRATING, >status); } } +vgic_irq_unlock(p, flags); } } @@ -592,11 +596,11 @@ static void gic_restore_pending_irqs(struct vcpu *v) int lr = 0; struct pending_irq *p, *t, *p_r; struct list_head *inflight_r; -unsigned long flags; +unsigned long flags, vcpu_flags; unsigned int nr_lrs = gic_hw_ops->info->nr_lrs; int lrs = nr_lrs; -spin_lock_irqsave(>arch.vgic.lock, flags); +spin_lock_irqsave(>arch.vgic.lock, vcpu_flags); if ( list_empty(>arch.vgic.lr_pending) ) goto out; @@ -621,16 +625,20 @@ static void gic_restore_pending_irqs(struct vcpu *v) goto out; found: +vgic_irq_lock(p_r, flags); lr = p_r->lr; p_r->lr = GIC_INVALID_LR; set_bit(GIC_IRQ_GUEST_QUEUED, _r->status); clear_bit(GIC_IRQ_GUEST_VISIBLE, _r->status); 
gic_add_to_lr_pending(v, p_r); inflight_r = _r->inflight; +vgic_irq_unlock(p_r, flags); } +vgic_irq_lock(p, flags); gic_set_lr(lr, p, GICH_LR_PENDING); list_del_init(>lr_queue); +vgic_irq_unlock(p, flags); set_bit(lr, _cpu(lr_mask)); /* We can only evict nr_lrs entries */ @@ -640,7 +648,7 @@ found: } out: -spin_unlock_irqrestore(>arch.vgic.lock, flags); +spin_unlock_irqrestore(>arch.vgic.lock, vcpu_flags); } void gic_clear_pending_irqs(struct vcpu *v) -- 2.9.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC PATCH v2 16/22] ARM: vITS: rename lpi_vcpu_id to vcpu_id
Since we will soon store a virtual IRQ's target VCPU in struct pending_irq, generalise the existing storage for an LPI's target to cover all IRQs. This just renames "lpi_vcpu_id" to "vcpu_id", but doesn't change anything else yet. Signed-off-by: Andre Przywara--- xen/arch/arm/gic-v3-lpi.c | 2 +- xen/arch/arm/vgic-v3-its.c | 7 +++ xen/arch/arm/vgic.c| 6 +++--- xen/include/asm-arm/vgic.h | 2 +- 4 files changed, 8 insertions(+), 9 deletions(-) diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c index c3474f5..2306b58 100644 --- a/xen/arch/arm/gic-v3-lpi.c +++ b/xen/arch/arm/gic-v3-lpi.c @@ -149,7 +149,7 @@ void vgic_vcpu_inject_lpi(struct domain *d, unsigned int virq) if ( !p ) return; -vcpu_id = ACCESS_ONCE(p->lpi_vcpu_id); +vcpu_id = ACCESS_ONCE(p->vcpu_id); if ( vcpu_id >= d->max_vcpus ) return; diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c index 705708a..682ce10 100644 --- a/xen/arch/arm/vgic-v3-its.c +++ b/xen/arch/arm/vgic-v3-its.c @@ -560,7 +560,7 @@ static int its_handle_invall(struct virt_its *its, uint64_t *cmdptr) { vgic_irq_lock(pirqs[i], flags); /* We only care about LPIs on our VCPU. */ -if ( pirqs[i]->lpi_vcpu_id != vcpu->vcpu_id ) +if ( pirqs[i]->vcpu_id != vcpu->vcpu_id ) { vgic_irq_unlock(pirqs[i], flags); continue; @@ -781,7 +781,7 @@ static int its_handle_mapti(struct virt_its *its, uint64_t *cmdptr) if ( ret ) goto out_remove_host_entry; -pirq->lpi_vcpu_id = vcpu->vcpu_id; +pirq->vcpu_id = vcpu->vcpu_id; /* * Mark this LPI as new, so any older (now unmapped) LPI in any LR * can be easily recognised as such. @@ -852,8 +852,7 @@ static int its_handle_movi(struct virt_its *its, uint64_t *cmdptr) */ spin_lock_irqsave(>arch.vgic.lock, flags); -/* Update our cached vcpu_id in the pending_irq. 
*/ -p->lpi_vcpu_id = nvcpu->vcpu_id; +p->vcpu_id = nvcpu->vcpu_id; spin_unlock_irqrestore(>arch.vgic.lock, flags); diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index 6722924..1ba0010 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -63,15 +63,15 @@ struct vgic_irq_rank *vgic_rank_irq(struct vcpu *v, unsigned int irq) void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq) { -/* The lpi_vcpu_id field must be big enough to hold a VCPU ID. */ -BUILD_BUG_ON(BIT(sizeof(p->lpi_vcpu_id) * 8) < MAX_VIRT_CPUS); +/* The vcpu_id field must be big enough to hold a VCPU ID. */ +BUILD_BUG_ON(BIT(sizeof(p->vcpu_id) * 8) < MAX_VIRT_CPUS); memset(p, 0, sizeof(*p)); INIT_LIST_HEAD(>inflight); INIT_LIST_HEAD(>lr_queue); spin_lock_init(>lock); p->irq = virq; -p->lpi_vcpu_id = INVALID_VCPU_ID; +p->vcpu_id = INVALID_VCPU_ID; } static void vgic_rank_init(struct vgic_irq_rank *rank, uint8_t index, diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h index 7c6067d..ffd9a95 100644 --- a/xen/include/asm-arm/vgic.h +++ b/xen/include/asm-arm/vgic.h @@ -81,7 +81,7 @@ struct pending_irq uint8_t lr; uint8_t cur_priority; /* Holds the priority of an injected IRQ. */ uint8_t priority; /* Holds the priority for any new IRQ. */ -uint8_t lpi_vcpu_id;/* The VCPU for an LPI. */ +uint8_t vcpu_id;/* The VCPU target for any new IRQ. */ /* inflight is used to append instances of pending_irq to * vgic.inflight_irqs */ struct list_head inflight; -- 2.9.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
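The BUILD_BUG_ON in vgic_init_pending_irq() checks at compile time that the renamed vcpu_id field is wide enough for every VCPU ID. A stand-alone C11 equivalent, with MAX_VIRT_CPUS as an assumed placeholder value rather than Xen's real configuration:

```c
#include <stdint.h>

#define MAX_VIRT_CPUS 128   /* assumption: stand-in for Xen's actual limit */

struct pending_irq_sketch {
    uint8_t vcpu_id;        /* the VCPU target for any new IRQ */
};

/* Fails to compile if a uint8_t cannot represent every possible VCPU ID,
 * mirroring BUILD_BUG_ON(BIT(sizeof(p->vcpu_id) * 8) < MAX_VIRT_CPUS). */
_Static_assert((1ULL << (sizeof(uint8_t) * 8)) >= MAX_VIRT_CPUS,
               "vcpu_id field is too narrow for MAX_VIRT_CPUS");
```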
[Xen-devel] [RFC PATCH v2 05/22] ARM: vITS: rename pending_irq->lpi_priority to priority
Since we will soon store a virtual IRQ's priority in struct pending_irq, generalise the existing storage for an LPI's priority to cover all IRQs. This just renames "lpi_priority" to "priority", but doesn't change anything else yet. Signed-off-by: Andre Przywara--- xen/arch/arm/vgic-v3-its.c | 4 ++-- xen/arch/arm/vgic-v3.c | 2 +- xen/include/asm-arm/vgic.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c index 9ef792f..66095d4 100644 --- a/xen/arch/arm/vgic-v3-its.c +++ b/xen/arch/arm/vgic-v3-its.c @@ -419,7 +419,7 @@ static int update_lpi_property(struct domain *d, struct pending_irq *p) if ( ret ) return ret; -write_atomic(>lpi_priority, property & LPI_PROP_PRIO_MASK); +write_atomic(>priority, property & LPI_PROP_PRIO_MASK); if ( property & LPI_PROP_ENABLED ) set_bit(GIC_IRQ_GUEST_ENABLED, >status); @@ -445,7 +445,7 @@ static void update_lpi_vgic_status(struct vcpu *v, struct pending_irq *p) { if ( !list_empty(>inflight) && !test_bit(GIC_IRQ_GUEST_VISIBLE, >status) ) -gic_raise_guest_irq(v, p->irq, p->lpi_priority); +gic_raise_guest_irq(v, p->irq, p->priority); } else gic_remove_from_lr_pending(v, p); diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c index 48c7682..ad9019e 100644 --- a/xen/arch/arm/vgic-v3.c +++ b/xen/arch/arm/vgic-v3.c @@ -1784,7 +1784,7 @@ static int vgic_v3_lpi_get_priority(struct domain *d, uint32_t vlpi) ASSERT(p); -return p->lpi_priority; +return p->priority; } static const struct vgic_ops v3_ops = { diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h index 0df4ac7..27b5e37 100644 --- a/xen/include/asm-arm/vgic.h +++ b/xen/include/asm-arm/vgic.h @@ -79,7 +79,7 @@ struct pending_irq #define GIC_INVALID_LR (uint8_t)~0 uint8_t lr; uint8_t cur_priority; /* Holds the priority of an injected IRQ. */ -uint8_t lpi_priority; /* Caches the priority if this is an LPI. */ +uint8_t priority; /* Holds the priority for any new IRQ. 
 */
 uint8_t lpi_vcpu_id;    /* The VCPU for an LPI. */
 /* inflight is used to append instances of pending_irq to
  * vgic.inflight_irqs */
-- 
2.9.0
[Xen-devel] [RFC PATCH v2 01/22] ARM: vGIC: introduce and initialize pending_irq lock
Currently we protect the pending_irq structure with the corresponding
VGIC VCPU lock. There are problems in certain corner cases (for instance
if an IRQ is migrating), so let's introduce a per-IRQ lock, which will
protect the consistency of this structure independently of any VCPU.
For now this just introduces and initializes the lock, and also adds
wrapper macros to simplify its usage (and help debugging).

Signed-off-by: Andre Przywara
---
 xen/arch/arm/vgic.c        |  1 +
 xen/include/asm-arm/vgic.h | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 1e5107b..38dacd3 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -69,6 +69,7 @@ void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq)
     memset(p, 0, sizeof(*p));
     INIT_LIST_HEAD(&p->inflight);
     INIT_LIST_HEAD(&p->lr_queue);
+    spin_lock_init(&p->lock);
     p->irq = virq;
     p->lpi_vcpu_id = INVALID_VCPU_ID;
 }
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index d4ed23d..1c38b9a 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -90,6 +90,14 @@ struct pending_irq
      * TODO: when implementing irq migration, taking only the current
      * vgic lock is not going to be enough. */
     struct list_head lr_queue;
+    /* The lock protects the consistency of this structure. A single status
+     * bit can be read and/or set without holding the lock using the atomic
+     * set_bit/clear_bit/test_bit functions, however accessing multiple bits
+     * or relating them to other members in this struct requires the lock.
+     * The list_head members are protected by their corresponding VCPU lock;
+     * it is not sufficient to hold this pending_irq lock to query or
+     * change list order or affiliation. */
+    spinlock_t lock;
 };

 #define NR_INTERRUPT_PER_RANK  32
@@ -156,6 +164,9 @@ struct vgic_ops {
 #define vgic_lock(v)   spin_lock_irq(&(v)->domain->arch.vgic.lock)
 #define vgic_unlock(v) spin_unlock_irq(&(v)->domain->arch.vgic.lock)

+#define vgic_irq_lock(p, flags) spin_lock_irqsave(&(p)->lock, flags)
+#define vgic_irq_unlock(p, flags) spin_unlock_irqrestore(&(p)->lock, flags)
+
 #define vgic_lock_rank(v, r, flags)   spin_lock_irqsave(&(r)->lock, flags)
 #define vgic_unlock_rank(v, r, flags) spin_unlock_irqrestore(&(r)->lock, flags)
-- 
2.9.0
[Xen-devel] [RFC PATCH v2 22/22] ARM: vGIC: remove remaining irq_rank code
Now that we no longer need the struct vgic_irq_rank, we can remove the definition and all the helper functions. Signed-off-by: Andre Przywara--- xen/arch/arm/vgic.c | 54 xen/include/asm-arm/domain.h | 6 + xen/include/asm-arm/vgic.h | 48 --- 3 files changed, 1 insertion(+), 107 deletions(-) diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index dd969e2..8ce3ce5 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -32,35 +32,6 @@ #include #include -static inline struct vgic_irq_rank *vgic_get_rank(struct vcpu *v, int rank) -{ -if ( rank == 0 ) -return v->arch.vgic.private_irqs; -else if ( rank <= DOMAIN_NR_RANKS(v->domain) ) -return >domain->arch.vgic.shared_irqs[rank - 1]; -else -return NULL; -} - -/* - * Returns rank corresponding to a GICD_ register for - * GICD_ with -bits-per-interrupt. - */ -struct vgic_irq_rank *vgic_rank_offset(struct vcpu *v, int b, int n, - int s) -{ -int rank = REG_RANK_NR(b, (n >> s)); - -return vgic_get_rank(v, rank); -} - -struct vgic_irq_rank *vgic_rank_irq(struct vcpu *v, unsigned int irq) -{ -int rank = irq/32; - -return vgic_get_rank(v, rank); -} - void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq, unsigned int vcpu_id) { @@ -75,14 +46,6 @@ void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq, p->vcpu_id = vcpu_id; } -static void vgic_rank_init(struct vgic_irq_rank *rank, uint8_t index, - unsigned int vcpu) -{ -spin_lock_init(>lock); - -rank->index = index; -} - int domain_vgic_register(struct domain *d, int *mmio_count) { switch ( d->arch.vgic.version ) @@ -121,11 +84,6 @@ int domain_vgic_init(struct domain *d, unsigned int nr_spis) spin_lock_init(>arch.vgic.lock); -d->arch.vgic.shared_irqs = -xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d)); -if ( d->arch.vgic.shared_irqs == NULL ) -return -ENOMEM; - d->arch.vgic.pending_irqs = xzalloc_array(struct pending_irq, d->arch.vgic.nr_spis); if ( d->arch.vgic.pending_irqs == NULL ) @@ -134,9 +92,6 @@ int domain_vgic_init(struct domain 
*d, unsigned int nr_spis) /* SPIs are routed to VCPU0 by default */ for (i=0; iarch.vgic.nr_spis; i++) vgic_init_pending_irq(>arch.vgic.pending_irqs[i], i + 32, 0); -/* SPIs are routed to VCPU0 by default */ -for ( i = 0; i < DOMAIN_NR_RANKS(d); i++ ) -vgic_rank_init(>arch.vgic.shared_irqs[i], i + 1, 0); ret = d->arch.vgic.handler->domain_init(d); if ( ret ) @@ -178,7 +133,6 @@ void domain_vgic_free(struct domain *d) } d->arch.vgic.handler->domain_free(d); -xfree(d->arch.vgic.shared_irqs); xfree(d->arch.vgic.pending_irqs); xfree(d->arch.vgic.allocated_irqs); } @@ -187,13 +141,6 @@ int vcpu_vgic_init(struct vcpu *v) { int i; -v->arch.vgic.private_irqs = xzalloc(struct vgic_irq_rank); -if ( v->arch.vgic.private_irqs == NULL ) - return -ENOMEM; - -/* SGIs/PPIs are always routed to this VCPU */ -vgic_rank_init(v->arch.vgic.private_irqs, 0, v->vcpu_id); - v->domain->arch.vgic.handler->vcpu_init(v); memset(>arch.vgic.pending_irqs, 0, sizeof(v->arch.vgic.pending_irqs)); @@ -210,7 +157,6 @@ int vcpu_vgic_init(struct vcpu *v) int vcpu_vgic_free(struct vcpu *v) { -xfree(v->arch.vgic.private_irqs); return 0; } diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 8dfc1d1..418400f 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -83,15 +83,12 @@ struct arch_domain * shared_irqs where each member contains its own locking. * * If both class of lock is required then this lock must be - * taken first. If multiple rank locks are required (including - * the per-vcpu private_irqs rank) then they must be taken in - * rank order. + * taken first. */ spinlock_t lock; uint32_t ctlr; int nr_spis; /* Number of SPIs */ unsigned long *allocated_irqs; /* bitmap of IRQs allocated */ -struct vgic_irq_rank *shared_irqs; /* * SPIs are domain global, SGIs and PPIs are per-VCPU and stored in * struct arch_vcpu. @@ -248,7 +245,6 @@ struct arch_vcpu * struct arch_domain. 
*/ struct pending_irq pending_irqs[32]; -struct vgic_irq_rank *private_irqs; /* This list is ordered by IRQ priority and it is used to keep * track of the IRQs that the VGIC injected into the guest. diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h index 233ff1f..9c79c5e 100644 --- a/xen/include/asm-arm/vgic.h +++ b/xen/include/asm-arm/vgic.h @@ -101,16
[Xen-devel] [RFC PATCH v2 15/22] ARM: vGIC: rework vgic_get_target_vcpu to take a pending_irq
For now vgic_get_target_vcpu takes a VCPU and an IRQ number, because this is what we need for finding the proper rank and the VCPU in there. In the future the VCPU will be looked up in the struct pending_irq. To avoid locking issues, let's pass the pointer to the pending_irq instead. We can read the IRQ number from there, and all but one caller know that pointer already anyway. This simplifies future code changes. Signed-off-by: Andre Przywara--- xen/arch/arm/gic.c | 2 +- xen/arch/arm/vgic.c| 22 -- xen/include/asm-arm/vgic.h | 2 +- 3 files changed, 14 insertions(+), 12 deletions(-) diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 84b282b..38e998a 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -559,7 +559,7 @@ void gic_update_one_lr(struct vcpu *v, int i) smp_wmb(); if ( test_bit(GIC_IRQ_GUEST_MIGRATING, >status) ) { -struct vcpu *v_target = vgic_get_target_vcpu(v, irq); +struct vcpu *v_target = vgic_get_target_vcpu(v, p); irq_set_affinity(p->desc, cpumask_of(v_target->processor)); clear_bit(GIC_IRQ_GUEST_MIGRATING, >status); } diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index e5a4765..6722924 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -224,10 +224,11 @@ int vcpu_vgic_free(struct vcpu *v) return 0; } -struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq) +struct vcpu *vgic_get_target_vcpu(struct vcpu *v, struct pending_irq *p) { -struct vgic_irq_rank *rank = vgic_rank_irq(v, virq); -int target = read_atomic(>vcpu[virq & INTERRUPT_RANK_MASK]); +struct vgic_irq_rank *rank = vgic_rank_irq(v, p->irq); +int target = read_atomic(>vcpu[p->irq & INTERRUPT_RANK_MASK]); + return v->domain->vcpu[target]; } @@ -391,8 +392,8 @@ void arch_move_irqs(struct vcpu *v) for ( i = 32; i < vgic_num_irqs(d); i++ ) { -v_target = vgic_get_target_vcpu(v, i); -p = irq_to_pending(v_target, i); +p = irq_to_pending(v, i); +v_target = vgic_get_target_vcpu(v, p); if ( v_target == v && !test_bit(GIC_IRQ_GUEST_MIGRATING, >status) ) 
irq_set_affinity(p->desc, cpu_mask); @@ -414,10 +415,10 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n) while ( (i = find_next_bit(, 32, i)) < 32 ) { irq = i + (32 * n); -v_target = vgic_get_target_vcpu(v, irq); +p = irq_to_pending(v, irq); +v_target = vgic_get_target_vcpu(v, p); spin_lock_irqsave(_target->arch.vgic.lock, flags); -p = irq_to_pending(v_target, irq); clear_bit(GIC_IRQ_GUEST_ENABLED, >status); gic_remove_from_lr_pending(v_target, p); desc = p->desc; @@ -468,9 +469,9 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n) while ( (i = find_next_bit(, 32, i)) < 32 ) { irq = i + (32 * n); -v_target = vgic_get_target_vcpu(v, irq); +p = irq_to_pending(v, irq); +v_target = vgic_get_target_vcpu(v, p); spin_lock_irqsave(_target->arch.vgic.lock, vcpu_flags); -p = irq_to_pending(v_target, irq); vgic_irq_lock(p, flags); set_bit(GIC_IRQ_GUEST_ENABLED, >status); int_type = test_bit(GIC_IRQ_GUEST_LEVEL, >status) ? @@ -666,12 +667,13 @@ out: void vgic_vcpu_inject_spi(struct domain *d, unsigned int virq) { +struct pending_irq *p = irq_to_pending(d->vcpu[0], virq); struct vcpu *v; /* the IRQ needs to be an SPI */ ASSERT(virq >= 32 && virq <= vgic_num_irqs(d)); -v = vgic_get_target_vcpu(d->vcpu[0], virq); +v = vgic_get_target_vcpu(d->vcpu[0], p); vgic_vcpu_inject_irq(v, virq); } diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h index 14c22b2..7c6067d 100644 --- a/xen/include/asm-arm/vgic.h +++ b/xen/include/asm-arm/vgic.h @@ -213,7 +213,7 @@ enum gic_sgi_mode; extern int domain_vgic_init(struct domain *d, unsigned int nr_spis); extern void domain_vgic_free(struct domain *d); extern int vcpu_vgic_init(struct vcpu *v); -extern struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq); +extern struct vcpu *vgic_get_target_vcpu(struct vcpu *v, struct pending_irq *p); extern void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq); extern void vgic_vcpu_inject_spi(struct domain *d, unsigned int virq); extern void 
vgic_clear_pending_irqs(struct vcpu *v);
-- 
2.9.0
[Xen-devel] [RFC PATCH v2 00/22] ARM: vGIC rework (attempt)
Hi, this is the first part of the attempt to rewrite the VGIC to solve the issues we discovered when adding the ITS emulation. The problems we identified resulted in the following list of things that need fixing: 1) introduce a per-IRQ lock 2) remove the IRQ rank scheme (of storing IRQ properties) 3) simplify the VCPU IRQ lists (getting rid of lr_queue) 4) introduce reference counting for struct pending_irq's 5) properly handle level triggered IRQs This series addresses the first two points. I tried to move point 3) up and fix that first, but that turned out to somehow depend on both points 1) and 2), so we have this order now. Still having the two lists makes things somewhat more complicated, though, but I think this is as best as it can get. After addressing point 3) (in a later post) the end result will look much better. I have some code for 3) and 5), mostly, but we need to agree on the first steps first. This is a bit of an open-heart surgery, as we try to change a locking scheme while staying bisectable (both in terms of compilability *and* runnability) and still having reviewable chunks. To help reviewing I tried to split the patches up as much as possible. Changes which are independent or introduce new functions are separate, the motivation for some of them becomes apparent only later. The rough idea of this series is to introduce the VGIC IRQ lock itself first, then move each of the rank members into struct pending_irq, adjusting the locking for that at the same time. To make the changes a bit smaller, I fixed some read locks in separate patches after the "move" patch. Also patch 09 adjusts the locking for setting the priority in the ITS, which is technially needed in patch 08 already, but moved out for the sake of reviewability. It might be squashed into patch 08 upon merging. As hinted above still having to cope with two lists leads to some atrocities, namely patch 03. 
This hideousness will vanish when the whole requirement of queueing an IRQ in that early state will go away. This is still somewhat work-in-progress, but I wanted to share the code anyway, since I spent way too much time on it (rewriting it several times on the way) and I am interested in some fresh pair of eyes to have a look. Currently the target VCPU move (patch 18) leads to a deadlock and I just ran out of time (before going on holidays) to debug this. So if someone could have a look to see if this approach in general looks good, I'd be grateful. I know that there is optimization potential (some functions can surely be refactored), but I'd rather do one step after the other. Cheers, Andre. Andre Przywara (22): ARM: vGIC: introduce and initialize pending_irq lock ARM: vGIC: route/remove_irq: replace rank lock with IRQ lock ARM: vGIC: move gic_raise_inflight_irq() into vgic_vcpu_inject_irq() ARM: vGIC: rename pending_irq->priority to cur_priority ARM: vITS: rename pending_irq->lpi_priority to priority ARM: vGIC: introduce locking routines for multiple IRQs ARM: vGIC: introduce priority setter/getter ARM: vGIC: move virtual IRQ priority from rank to pending_irq ARM: vITS: protect LPI priority update with pending_irq lock ARM: vGIC: protect gic_set_lr() with pending_irq lock ARM: vGIC: protect gic_events_need_delivery() with pending_irq lock ARM: vGIC: protect gic_update_one_lr() with pending_irq lock ARM: vITS: remove no longer needed lpi_priority wrapper ARM: vGIC: move virtual IRQ configuration from rank to pending_irq ARM: vGIC: rework vgic_get_target_vcpu to take a pending_irq ARM: vITS: rename lpi_vcpu_id to vcpu_id ARM: vGIC: introduce vgic_lock_vcpu_irq() ARM: vGIC: move virtual IRQ target VCPU from rank to pending_irq ARM: vGIC: rework vgic_get_target_vcpu to take a domain instead of vcpu ARM: vGIC: move virtual IRQ enable bit from rank to pending_irq ARM: vITS: injecting LPIs: use pending_irq lock ARM: vGIC: remove remaining irq_rank code 
 xen/arch/arm/gic-v2.c        |   2 +-
 xen/arch/arm/gic-v3-lpi.c    |  14 +-
 xen/arch/arm/gic-v3.c        |   2 +-
 xen/arch/arm/gic.c           |  96 ++++----
 xen/arch/arm/vgic-v2.c       | 161 -
 xen/arch/arm/vgic-v3-its.c   |  42 ++--
 xen/arch/arm/vgic-v3.c       | 182 +--
 xen/arch/arm/vgic.c          | 521 +++
 xen/include/asm-arm/domain.h |   6 +-
 xen/include/asm-arm/gic.h    |   2 +-
 xen/include/asm-arm/vgic.h   | 114 +++---
 11 files changed, 540 insertions(+), 602 deletions(-)
-- 
2.9.0
[Xen-devel] [RFC PATCH v2 13/22] ARM: vITS: remove no longer needed lpi_priority wrapper
For LPIs we stored the priority value in struct pending_irq, but all other type of IRQs were using the irq_rank structure for that. Now that every IRQ using pending_irq, we can remove the special handling we had in place for LPIs and just use the now unified access wrappers. Signed-off-by: Andre Przywara--- xen/arch/arm/vgic-v2.c | 7 --- xen/arch/arm/vgic-v3.c | 11 --- xen/include/asm-arm/vgic.h | 1 - 3 files changed, 19 deletions(-) diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c index ed7ff3b..a3fd500 100644 --- a/xen/arch/arm/vgic-v2.c +++ b/xen/arch/arm/vgic-v2.c @@ -690,18 +690,11 @@ static struct pending_irq *vgic_v2_lpi_to_pending(struct domain *d, BUG(); } -static int vgic_v2_lpi_get_priority(struct domain *d, unsigned int vlpi) -{ -/* Dummy function, no LPIs on a VGICv2. */ -BUG(); -} - static const struct vgic_ops vgic_v2_ops = { .vcpu_init = vgic_v2_vcpu_init, .domain_init = vgic_v2_domain_init, .domain_free = vgic_v2_domain_free, .lpi_to_pending = vgic_v2_lpi_to_pending, -.lpi_get_priority = vgic_v2_lpi_get_priority, .max_vcpus = 8, }; diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c index e58e77e..d3356ae 100644 --- a/xen/arch/arm/vgic-v3.c +++ b/xen/arch/arm/vgic-v3.c @@ -1757,23 +1757,12 @@ static struct pending_irq *vgic_v3_lpi_to_pending(struct domain *d, return pirq; } -/* Retrieve the priority of an LPI from its struct pending_irq. */ -static int vgic_v3_lpi_get_priority(struct domain *d, uint32_t vlpi) -{ -struct pending_irq *p = vgic_v3_lpi_to_pending(d, vlpi); - -ASSERT(p); - -return p->priority; -} - static const struct vgic_ops v3_ops = { .vcpu_init = vgic_v3_vcpu_init, .domain_init = vgic_v3_domain_init, .domain_free = vgic_v3_domain_free, .emulate_reg = vgic_v3_emulate_reg, .lpi_to_pending = vgic_v3_lpi_to_pending, -.lpi_get_priority = vgic_v3_lpi_get_priority, /* * We use both AFF1 and AFF0 in (v)MPIDR. Thus, the max number of CPU * that can be supported is up to 4096(==256*16) in theory. 
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h index 59d52c6..6343c95 100644 --- a/xen/include/asm-arm/vgic.h +++ b/xen/include/asm-arm/vgic.h @@ -143,7 +143,6 @@ struct vgic_ops { bool (*emulate_reg)(struct cpu_user_regs *regs, union hsr hsr); /* lookup the struct pending_irq for a given LPI interrupt */ struct pending_irq *(*lpi_to_pending)(struct domain *d, unsigned int vlpi); -int (*lpi_get_priority)(struct domain *d, uint32_t vlpi); /* Maximum number of vCPU supported */ const unsigned int max_vcpus; }; -- 2.9.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC PATCH v2 02/22] ARM: vGIC: route/remove_irq: replace rank lock with IRQ lock
So far the rank lock is protecting the physical IRQ routing for a particular virtual IRQ (though this doesn't seem to be documented anywhere). So although these functions don't really touch the rank structure, the lock prevents them from running concurrently. This seems a bit like a kludge, so as we now have our newly introduced per-IRQ lock, we can use that instead to get a more natural protection (and remove the first rank user). Signed-off-by: Andre Przywara--- xen/arch/arm/gic.c | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 6c803bf..2c99d71 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -139,9 +139,7 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq, unsigned long flags; /* Use vcpu0 to retrieve the pending_irq struct. Given that we only * route SPIs to guests, it doesn't make any difference. */ -struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq); -struct vgic_irq_rank *rank = vgic_rank_irq(v_target, virq); -struct pending_irq *p = irq_to_pending(v_target, virq); +struct pending_irq *p = irq_to_pending(d->vcpu[0], virq); int res = -EBUSY; ASSERT(spin_is_locked(>lock)); @@ -150,7 +148,7 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq, ASSERT(virq < vgic_num_irqs(d)); ASSERT(!is_lpi(virq)); -vgic_lock_rank(v_target, rank, flags); +vgic_irq_lock(p, flags); if ( p->desc || /* The VIRQ should not be already enabled by the guest */ @@ -168,7 +166,7 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq, res = 0; out: -vgic_unlock_rank(v_target, rank, flags); +vgic_irq_unlock(p, flags); return res; } @@ -177,9 +175,7 @@ out: int gic_remove_irq_from_guest(struct domain *d, unsigned int virq, struct irq_desc *desc) { -struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq); -struct vgic_irq_rank *rank = vgic_rank_irq(v_target, virq); -struct pending_irq *p = irq_to_pending(v_target, virq); +struct pending_irq *p = 
irq_to_pending(d->vcpu[0], virq); unsigned long flags; ASSERT(spin_is_locked(>lock)); @@ -187,7 +183,7 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq, ASSERT(p->desc == desc); ASSERT(!is_lpi(virq)); -vgic_lock_rank(v_target, rank, flags); +vgic_irq_lock(p, flags); if ( d->is_dying ) { @@ -207,7 +203,7 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq, if ( test_bit(_IRQ_INPROGRESS, >status) || !test_bit(_IRQ_DISABLED, >status) ) { -vgic_unlock_rank(v_target, rank, flags); +vgic_irq_unlock(p, flags); return -EBUSY; } } @@ -217,7 +213,7 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq, p->desc = NULL; -vgic_unlock_rank(v_target, rank, flags); +vgic_irq_unlock(p, flags); return 0; } -- 2.9.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC PATCH v2 07/22] ARM: vGIC: introduce priority setter/getter
Since the GICs MMIO access always covers a number of IRQs at once, introduce wrapper functions which loop over those IRQs, take their locks and read or update the priority values. This will be used in a later patch. Signed-off-by: Andre Przywara--- xen/arch/arm/vgic.c| 37 + xen/include/asm-arm/vgic.h | 5 + 2 files changed, 42 insertions(+) diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index 434b7e2..b2c9632 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -243,6 +243,43 @@ static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq) return ACCESS_ONCE(rank->priority[virq & INTERRUPT_RANK_MASK]); } +#define MAX_IRQS_PER_IPRIORITYR 4 +uint32_t vgic_fetch_irq_priority(struct vcpu *v, unsigned int nrirqs, + unsigned int first_irq) +{ +struct pending_irq *pirqs[MAX_IRQS_PER_IPRIORITYR]; +unsigned long flags; +uint32_t ret = 0, i; + +local_irq_save(flags); +vgic_lock_irqs(v, nrirqs, first_irq, pirqs); + +for ( i = 0; i < nrirqs; i++ ) +ret |= pirqs[i]->priority << (i * 8); + +vgic_unlock_irqs(pirqs, nrirqs); +local_irq_restore(flags); + +return ret; +} + +void vgic_store_irq_priority(struct vcpu *v, unsigned int nrirqs, + unsigned int first_irq, uint32_t value) +{ +struct pending_irq *pirqs[MAX_IRQS_PER_IPRIORITYR]; +unsigned long flags; +unsigned int i; + +local_irq_save(flags); +vgic_lock_irqs(v, nrirqs, first_irq, pirqs); + +for ( i = 0; i < nrirqs; i++, value >>= 8 ) +pirqs[i]->priority = value & 0xff; + +vgic_unlock_irqs(pirqs, nrirqs); +local_irq_restore(flags); +} + bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq) { unsigned long flags; diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h index ecf4969..f3791c8 100644 --- a/xen/include/asm-arm/vgic.h +++ b/xen/include/asm-arm/vgic.h @@ -198,6 +198,11 @@ void vgic_lock_irqs(struct vcpu *v, unsigned int nrirqs, unsigned int first_irq, struct pending_irq **pirqs); void vgic_unlock_irqs(struct pending_irq **pirqs, unsigned int nrirqs); +uint32_t 
vgic_fetch_irq_priority(struct vcpu *v, unsigned int nrirqs,
+                                 unsigned int first_irq);
+void vgic_store_irq_priority(struct vcpu *v, unsigned int nrirqs,
+                             unsigned int first_irq, uint32_t reg);
+
 enum gic_sgi_mode;

 /*
-- 
2.9.0
[Xen-devel] [RFC PATCH v2 06/22] ARM: vGIC: introduce locking routines for multiple IRQs
When replacing the rank lock with individual per-IRQs lock soon, we will still need the ability to lock multiple IRQs. Provide two helper routines which lock and unlock a number of consecutive IRQs in the right order. Forward-looking the locking function fills an array of pending_irq pointers, so the lookup has only to be done once. These routines expect that local_irq_save() has been called before the lock routine and the respective local_irq_restore() after the unlock function. Signed-off-by: Andre Przywara--- xen/arch/arm/vgic.c| 20 xen/include/asm-arm/vgic.h | 4 2 files changed, 24 insertions(+) diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index 21b545e..434b7e2 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -375,6 +375,26 @@ static inline unsigned int vgic_get_virq_type(struct vcpu *v, int n, int index) return IRQ_TYPE_LEVEL_HIGH; } +void vgic_lock_irqs(struct vcpu *v, unsigned int nrirqs, +unsigned int first_irq, struct pending_irq **pirqs) +{ +unsigned int i; + +for ( i = 0; i < nrirqs; i++ ) +{ +pirqs[i] = irq_to_pending(v, first_irq + i); +spin_lock([i]->lock); +} +} + +void vgic_unlock_irqs(struct pending_irq **pirqs, unsigned int nrirqs) +{ +int i; + +for ( i = nrirqs - 1; i >= 0; i-- ) +spin_unlock([i]->lock); +} + void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n) { const unsigned long mask = r; diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h index 27b5e37..ecf4969 100644 --- a/xen/include/asm-arm/vgic.h +++ b/xen/include/asm-arm/vgic.h @@ -194,6 +194,10 @@ static inline int REG_RANK_NR(int b, uint32_t n) } } +void vgic_lock_irqs(struct vcpu *v, unsigned int nrirqs, unsigned int first_irq, +struct pending_irq **pirqs); +void vgic_unlock_irqs(struct pending_irq **pirqs, unsigned int nrirqs); + enum gic_sgi_mode; /* -- 2.9.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 6/6] xen: sched: optimize exclusive pinning case (Credit1 & 2)
On Fri, 2017-07-21 at 18:19 +0100, George Dunlap wrote: > On 06/23/2017 11:55 AM, Dario Faggioli wrote: > > diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c > > index 4f6330e..85e014d 100644 > > --- a/xen/common/sched_credit.c > > +++ b/xen/common/sched_credit.c > > @@ -429,6 +429,24 @@ static inline void __runq_tickle(struct > > csched_vcpu *new) > > idlers_empty = cpumask_empty(_mask); > > > > /* > > + * Exclusive pinning is when a vcpu has hard-affinity with > > only one > > + * cpu, and there is no other vcpu that has hard-affinity with > > that > > + * same cpu. This is infrequent, but if it happens, is for > > achieving > > + * the most possible determinism, and least possible overhead > > for > > + * the vcpus in question. > > + * > > + * Try to identify the vast majority of these situations, and > > deal > > + * with them quickly. > > + */ > > +if ( unlikely(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) > > == cpu && > > Won't this check entail a full "loop" of the cpumask? It's cheap > enough > if nr_cpu_ids is small; but don't we support (theoretically) 4096 > logical cpus? > > It seems like having a vcpu flag that identifies a vcpu as being > pinned > would be a more efficient way to do this. That way we could run this > check once whenever the hard affinity changed, rather than every time > we > want to think about where to run this vcpu. > > What do you think? > Right. We actually should get some help from the hardware (ffs & firends)... but I think you're right. Implementing this with a flag, as you're suggesting, is most likely better, and easy enough. I'll go for that! Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 4/6] xen: credit2: rearrange members of control structures
On Fri, 2017-07-21 at 18:05 +0100, George Dunlap wrote: > On 06/23/2017 11:55 AM, Dario Faggioli wrote: > > > > While there, improve the wording, style and alignment > > of comments too. > > > > Signed-off-by: Dario Faggioli> > I haven't taken a careful look at these; the idea sounds good and > I'll > trust that you've taken a careful look at them: > Hehe... thanks! :-) I've even done the whole thing twice. In fact, I was about to submit the series, when I discovered that I did optimize the cache layout of a debug build, and hence had to redo everything from the beginning! :-P Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 5/6] xen: RTDS: rearrange members of control structures
On Fri, 2017-07-21 at 13:51 -0400, Meng Xu wrote: > On Fri, Jun 23, 2017 at 6:55 AM, Dario Faggioli >wrote: > > > > Nothing changed in `pahole` output, in terms of holes > > and padding, but some fields have been moved, to put > > related members in same cache line. > > > > Signed-off-by: Dario Faggioli > > --- > > Cc: Meng Xu > > Cc: George Dunlap > > --- > > xen/common/sched_rt.c | 13 - > > 1 file changed, 8 insertions(+), 5 deletions(-) > > > > diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c > > index 1b30014..39f6bee 100644 > > --- a/xen/common/sched_rt.c > > +++ b/xen/common/sched_rt.c > > @@ -171,11 +171,14 @@ static void repl_timer_handler(void *data); > > struct rt_private { > > spinlock_t lock;/* the global coarse-grained lock > > */ > > struct list_head sdom; /* list of availalbe domains, used > > for dump */ > > + > > struct list_head runq; /* ordered list of runnable vcpus > > */ > > struct list_head depletedq; /* unordered list of depleted > > vcpus */ > > + > > +struct timer *repl_timer; /* replenishment timer */ > > struct list_head replq; /* ordered list of vcpus that need > > replenishment */ > > + > > cpumask_t tickled; /* cpus been tickled */ > > -struct timer *repl_timer; /* replenishment timer */ > > }; > > > > /* > > @@ -185,10 +188,6 @@ struct rt_vcpu { > > struct list_head q_elem; /* on the runq/depletedq list */ > > struct list_head replq_elem; /* on the replenishment events > > list */ > > > > -/* Up-pointers */ > > -struct rt_dom *sdom; > > -struct vcpu *vcpu; > > - > > /* VCPU parameters, in nanoseconds */ > > s_time_t period; > > s_time_t budget; > > @@ -198,6 +197,10 @@ struct rt_vcpu { > > s_time_t last_start; /* last start time */ > > s_time_t cur_deadline; /* current deadline for EDF */ > > > > +/* Up-pointers */ > > +struct rt_dom *sdom; > > +struct vcpu *vcpu; > > + > > unsigned flags; /* mark __RTDS_scheduled, etc.. 
> > */ > > }; > > > > Reviewed-by: Meng Xu > > BTW, Dario, I'm wondering if you used any tool to give hints about > how > to arrange the fields in a structure or you just did it manually? > I used pahole for figuring out the cache layout. But just that. So, basically, I --manually-- tried to move the fields around, and check the result with pahole (and then did it again, and again. :-D). TBH, the improvement for RTDS is probably not even noticeable, as we access almost all the fields anyway. But it still makes sense, IMO. Thanks for the review, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
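[Editorial note] The workflow Dario describes — move fields by hand, re-check with pahole — boils down to keeping members that are used together inside one cache line. A minimal sketch of that check, assuming a 64-byte line and using hypothetical fixed-width stand-ins for the real spinlock_t/list_head/timer types (so offsets are deterministic):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed x86 line size */

/*
 * Hypothetical stand-in for the scheduler's private struct: members
 * used together on the hot path (the replenishment timer and the
 * queue it serves) are placed side by side.
 */
struct rt_private_sketch {
    uint64_t lock;            /* global coarse-grained lock */
    uint64_t sdom_head;       /* dump-only list of domains */

    uint64_t runq_head;       /* hot: scanned on every scheduling decision */
    uint64_t depletedq_head;

    uint64_t repl_timer;      /* hot: always used together with replq */
    uint64_t replq_head;

    uint64_t tickled[3];      /* stand-in for cpumask_t */
    uint64_t dump_count;      /* cold: lands in the second cache line */
};

/* Do two member offsets fall in the same cache line? */
static int same_line(size_t a, size_t b)
{
    return (a / CACHE_LINE) == (b / CACHE_LINE);
}
```

pahole reports the same information (holes, padding, cacheline boundaries) directly from the compiled object, which is why it was the tool of choice here.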
[Xen-devel] [xen-unstable-smoke test] 112104: tolerable trouble: broken/pass - PUSHED
flight 112104 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/112104/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-arm64-arm64-xl-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 647de517b08e77b9b5f76d6853dddc759b8df0b4 baseline version: xen 73771b89fd9d89a23d5c7b760056fdaf94946be9 Last test of basis 112062 2017-07-20 18:14:31 Z1 days Testing same since 112104 2017-07-21 18:18:21 Z0 days1 attempts People who touched revisions under test: Dario FaggioliGeorge Dunlap jobs: build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm broken test-amd64-amd64-xl-qemuu-debianhvm-i386 pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : + branch=xen-unstable-smoke + revision=647de517b08e77b9b5f76d6853dddc759b8df0b4 + . ./cri-lock-repos ++ . ./cri-common +++ . ./cri-getconfig +++ umask 002 +++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{"Repos"} or die $!; ' +++ local repos=/home/osstest/repos +++ '[' -z /home/osstest/repos ']' +++ '[' '!' 
-d /home/osstest/repos ']' +++ echo /home/osstest/repos ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x '!=' x/home/osstest/repos/lock ']' ++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock ++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 647de517b08e77b9b5f76d6853dddc759b8df0b4 + branch=xen-unstable-smoke + revision=647de517b08e77b9b5f76d6853dddc759b8df0b4 + . ./cri-lock-repos ++ . ./cri-common +++ . ./cri-getconfig +++ umask 002 +++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{"Repos"} or die $!; ' +++ local repos=/home/osstest/repos +++ '[' -z /home/osstest/repos ']' +++ '[' '!' -d /home/osstest/repos ']' +++ echo /home/osstest/repos ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']' + . ./cri-common ++ . ./cri-getconfig ++ umask 002 + select_xenbranch + case "$branch" in + tree=xen + xenbranch=xen-unstable-smoke + qemuubranch=qemu-upstream-unstable + '[' xxen = xlinux ']' + linuxbranch= + '[' xqemu-upstream-unstable = x ']' + select_prevxenbranch ++ ./cri-getprevxenbranch xen-unstable-smoke + prevxenbranch=xen-4.9-testing + '[' x647de517b08e77b9b5f76d6853dddc759b8df0b4 = x ']' + : tested/2.6.39.x + . 
./ap-common ++ : osst...@xenbits.xen.org +++ getconfig OsstestUpstream +++ perl -e ' use Osstest; readglobalconfig(); print $c{"OsstestUpstream"} or die $!; ' ++ : ++ : git://xenbits.xen.org/xen.git ++ : osst...@xenbits.xen.org:/home/xen/git/xen.git ++ : git://xenbits.xen.org/qemu-xen-traditional.git ++ : git://git.kernel.org ++ : git://git.kernel.org/pub/scm/linux/kernel/git ++ : git ++ : git://xenbits.xen.org/xtf.git ++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git ++ : git://xenbits.xen.org/xtf.git ++ : git://xenbits.xen.org/libvirt.git ++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git ++ : git://xenbits.xen.org/libvirt.git ++ : git://xenbits.xen.org/osstest/rumprun.git ++ : git ++ : git://xenbits.xen.org/osstest/rumprun.git ++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git ++ : git://git.seabios.org/seabios.git ++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git ++ : git://xenbits.xen.org/osstest/seabios.git ++ : https://github.com/tianocore/edk2.git ++ : osst...@xenbits.xen.org:/home/xen/git/osstest/ovmf.git ++ :
Re: [Xen-devel] [PATCH 22/25 v6] xen/arm: vpl011: Add support for vuart console in xenconsole
On Fri, 21 Jul 2017, Julien Grall wrote: > Hi, > > On 18/07/17 21:07, Stefano Stabellini wrote: > > On Mon, 17 Jul 2017, Bhupinder Thakur wrote: > > > This patch finally adds the support for vuart console. It adds > > > two new fields in the console initialization: > > > > > > - optional > > > - prefer_gnttab > > > > > > optional flag tells whether the console is optional. > > > > > > prefer_gnttab tells whether the ring buffer should be allocated using > > > grant table. > > > > > > Signed-off-by: Bhupinder Thakur> > > --- > > > CC: Ian Jackson > > > CC: Wei Liu > > > CC: Stefano Stabellini > > > CC: Julien Grall > > > > > > Changes since v4: > > > - Renamed VUART_CFLAGS- to CFLAGS_vuart- in the Makefile as per the > > > convention. > > > > > > config/arm32.mk | 1 + > > > config/arm64.mk | 1 + > > > tools/console/Makefile| 3 ++- > > > tools/console/daemon/io.c | 29 - > > > 4 files changed, 32 insertions(+), 2 deletions(-) > > > > > > diff --git a/config/arm32.mk b/config/arm32.mk > > > index f95228e..b9f23fe 100644 > > > --- a/config/arm32.mk > > > +++ b/config/arm32.mk > > > @@ -1,5 +1,6 @@ > > > CONFIG_ARM := y > > > CONFIG_ARM_32 := y > > > +CONFIG_VUART_CONSOLE := y > > > CONFIG_ARM_$(XEN_OS) := y > > > > > > CONFIG_XEN_INSTALL_SUFFIX := > > > > What about leaving this off for ARM32 by default? > > Why? This will only disable xenconsole changes and not the hypervisor. The > changes are quite tiny, so I would even be in favor of enabling for all > architectures. > > Or are you suggesting to disable the VPL011 emulation in the hypervisor? But I > don't see the emulation AArch64 specific, and a user could disable it if he > doesn't want it... I was thinking that the virtual pl011 is mostly useful for SBSA compliance, which doesn't really apply to ARM32 (there are no ARM32 SBSA compliant platforms as far as I am aware). Given that we don't need vpl011 on ARM32, I thought we might as well disable it. Less code the better. 
I wouldn't go as far as introducing more #ifdefs to disable it, but I would make use of the existing config options to turn it off by default on ARM32. Does it make sense? That said, you are right that there is no point in disabling only CONFIG_VUART_CONSOLE, which affects the tools only. We should really disable SBSA_VUART_CONSOLE by default on ARM32. In fact, ideally CONFIG_VUART_CONSOLE would be set depending on the value of SBSA_VUART_CONSOLE. What do you think? ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v3]Proposal to allow setting up shared memory areas between VMs from xl config file
On Fri, 21 Jul 2017, Julien Grall wrote: > > > @x86_cacheattr can be 'uc', 'wc', 'wt', 'wp', 'wb' or 'suc'. > > > Default > > > is 'wb'. > > > > Also here, I would write: > > > > @x86_cacheattr Only 'wb' (write-back) is supported today. > > > > Like you wrote later, begin and end addresses need to be a multiple of 4K. > > This is not true. The addresses should be a multiple of the hypervisor page > granularity. > > It will not be possible to map a 4K chunk in stage-2 when the hypervisor is > using 16K or 64K page granularity. Yes, but there are no 16K or 64K hypervisor pages now. So far, we have not really attempted to say "granularity" for hypervisor pages rather than 4K, given that 4K has always been a solid assumption. But this doc could be the right time to start doing that :-) ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
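[Editorial note] The alignment rule being debated can be sketched as a bounds check parameterised by the hypervisor's stage-2 page granularity — 4K today, possibly 16K or 64K later. The helper name and the granularity values are illustrative, not taken from the proposal:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shared-region bounds must be multiples of the hypervisor's page
 * granularity; a region that is 4K-aligned is not necessarily valid
 * on a hypothetical 64K-granularity hypervisor.
 */
static int region_ok(uint64_t begin, uint64_t end, uint64_t granularity)
{
    return begin < end &&
           (begin % granularity) == 0 &&
           (end % granularity) == 0;
}
```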
Re: [Xen-devel] [PATCH] xen/pvcalls: use WARN_ON(1) instead of __WARN()
On Fri, 21 Jul 2017, Arnd Bergmann wrote: > __WARN() is an internal helper that is only available on > some architectures, but causes a build error e.g. on ARM64 > in some configurations: > > drivers/xen/pvcalls-back.c: In function 'set_backend_state': > drivers/xen/pvcalls-back.c:1097:5: error: implicit declaration of function > '__WARN' [-Werror=implicit-function-declaration] > > Unfortunately, there is no equivalent of BUG() that takes no > arguments, but WARN_ON(1) is commonly used in other drivers > and works on all configurations. > > Fixes: 7160378206b2 ("xen/pvcalls: xenbus state handling") > Signed-off-by: Arnd BergmannReviewed-by: Stefano Stabellini > --- > drivers/xen/pvcalls-back.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c > index d6c4c4aecb41..00c1a2344330 100644 > --- a/drivers/xen/pvcalls-back.c > +++ b/drivers/xen/pvcalls-back.c > @@ -1094,7 +1094,7 @@ static void set_backend_state(struct xenbus_device *dev, > xenbus_switch_state(dev, XenbusStateClosing); > break; > default: > - __WARN(); > + WARN_ON(1); > } > break; > case XenbusStateInitWait: > @@ -1109,7 +1109,7 @@ static void set_backend_state(struct xenbus_device *dev, > xenbus_switch_state(dev, XenbusStateClosing); > break; > default: > - __WARN(); > + WARN_ON(1); > } > break; > case XenbusStateConnected: > @@ -1123,7 +1123,7 @@ static void set_backend_state(struct xenbus_device *dev, > xenbus_switch_state(dev, XenbusStateClosing); > break; > default: > - __WARN(); > + WARN_ON(1); > } > break; > case XenbusStateClosing: > @@ -1134,11 +1134,11 @@ static void set_backend_state(struct xenbus_device > *dev, > xenbus_switch_state(dev, XenbusStateClosed); > break; > default: > - __WARN(); > + WARN_ON(1); > } > break; > default: > - __WARN(); > + WARN_ON(1); > } > } > } > -- > 2.9.0 > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
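[Editorial note] The reason WARN_ON(1) is the portable choice is that it is just "evaluate the condition, report if true, return it" — expressible in plain (GCC-extended) C with no per-architecture support, unlike the internal __WARN() helper. A userspace sketch; the kernel's real macro additionally taints the kernel and dumps a backtrace:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Minimal WARN_ON-style macro: reports when the condition is true and
 * returns the condition so it can be used inside an if().  Uses a GCC
 * statement expression, as kernel-style C does.
 */
#define WARN_ON_SKETCH(cond)                                          \
    ({ int warn_c_ = !!(cond);                                        \
       if (warn_c_)                                                   \
           fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__); \
       warn_c_; })
```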
Re: [Xen-devel] [PULL for-2.10 6/7] xen/mapcache: introduce xen_replace_cache_entry()
On 21/07/17 14:50, Anthony PERARD wrote: On Tue, Jul 18, 2017 at 03:22:41PM -0700, Stefano Stabellini wrote: From: Igor Druzhinin... +static uint8_t *xen_replace_cache_entry_unlocked(hwaddr old_phys_addr, + hwaddr new_phys_addr, + hwaddr size) +{ +MapCacheEntry *entry; +hwaddr address_index, address_offset; +hwaddr test_bit_size, cache_size = size; + +address_index = old_phys_addr >> MCACHE_BUCKET_SHIFT; +address_offset = old_phys_addr & (MCACHE_BUCKET_SIZE - 1); + +assert(size); +/* test_bit_size is always a multiple of XC_PAGE_SIZE */ +test_bit_size = size + (old_phys_addr & (XC_PAGE_SIZE - 1)); +if (test_bit_size % XC_PAGE_SIZE) { +test_bit_size += XC_PAGE_SIZE - (test_bit_size % XC_PAGE_SIZE); +} +cache_size = size + address_offset; +if (cache_size % MCACHE_BUCKET_SIZE) { +cache_size += MCACHE_BUCKET_SIZE - (cache_size % MCACHE_BUCKET_SIZE); +} + +entry = &mapcache->entry[address_index % mapcache->nr_buckets]; +while (entry && !(entry->paddr_index == address_index && + entry->size == cache_size)) { +entry = entry->next; +} +if (!entry) { +DPRINTF("Trying to update an entry for %lx " \ +"that is not in the mapcache!\n", old_phys_addr); +return NULL; +} + +address_index = new_phys_addr >> MCACHE_BUCKET_SHIFT; +address_offset = new_phys_addr & (MCACHE_BUCKET_SIZE - 1); + +fprintf(stderr, "Replacing a dummy mapcache entry for %lx with %lx\n", +old_phys_addr, new_phys_addr); Looks like this does not build on 32bits.
in: http://logs.test-lab.xenproject.org/osstest/logs/112041/build-i386/6.ts-xen-build.log /home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/hw/i386/xen/xen-mapcache.c: In function 'xen_replace_cache_entry_unlocked': /home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/hw/i386/xen/xen-mapcache.c:539:13: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format=] old_phys_addr, new_phys_addr); ^ /home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/hw/i386/xen/xen-mapcache.c:539:13: error: format '%lx' expects argument of type 'long unsigned int', but argument 4 has type 'hwaddr' [-Werror=format=] cc1: all warnings being treated as errors CC i386-softmmu/target/i386/gdbstub.o /home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/rules.mak:66: recipe for target 'hw/i386/xen/xen-mapcache.o' failed + +xen_remap_bucket(entry, entry->vaddr_base, + cache_size, address_index, false); +if (!test_bits(address_offset >> XC_PAGE_SHIFT, +test_bit_size >> XC_PAGE_SHIFT, +entry->valid_mapping)) { +DPRINTF("Unable to update a mapcache entry for %lx!\n", old_phys_addr); +return NULL; +} + +return entry->vaddr_base + address_offset; +} + Please, accept the attached patch to fix the issue. 
Igor >From 69a3afa453e283e92ddfd76109b203a20a02524c Mon Sep 17 00:00:00 2001 From: Igor Druzhinin Date: Fri, 21 Jul 2017 19:27:41 +0100 Subject: [PATCH] xen: fix compilation on 32-bit hosts Signed-off-by: Igor Druzhinin --- hw/i386/xen/xen-mapcache.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c index 84cc4a2..540406a 100644 --- a/hw/i386/xen/xen-mapcache.c +++ b/hw/i386/xen/xen-mapcache.c @@ -529,7 +529,7 @@ static uint8_t *xen_replace_cache_entry_unlocked(hwaddr old_phys_addr, entry = entry->next; } if (!entry) { -DPRINTF("Trying to update an entry for %lx " \ +DPRINTF("Trying to update an entry for "TARGET_FMT_plx \ "that is not in the mapcache!\n", old_phys_addr); return NULL; } @@ -537,15 +537,16 @@ static uint8_t *xen_replace_cache_entry_unlocked(hwaddr old_phys_addr, address_index = new_phys_addr >> MCACHE_BUCKET_SHIFT; address_offset = new_phys_addr & (MCACHE_BUCKET_SIZE - 1); -fprintf(stderr, "Replacing a dummy mapcache entry for %lx with %lx\n", -old_phys_addr, new_phys_addr); +fprintf(stderr, "Replacing a dummy mapcache entry for "TARGET_FMT_plx \ +" with "TARGET_FMT_plx"\n", old_phys_addr, new_phys_addr); xen_remap_bucket(entry, entry->vaddr_base, cache_size, address_index, false); if(!test_bits(address_offset >> XC_PAGE_SHIFT, test_bit_size >> XC_PAGE_SHIFT, entry->valid_mapping)) { -DPRINTF("Unable to update a mapcache entry for %lx!\n", old_phys_addr); +DPRINTF("Unable to update a mapcache entry for "TARGET_FMT_plx"!\n", +old_phys_addr);
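[Editorial note] QEMU's hwaddr is 64-bit on every host, so "%lx" only happens to work on LP64 and breaks the 32-bit build, as the log above shows. The fix uses QEMU's TARGET_FMT_plx; in portable standard C the same job is done by PRIx64 from <inttypes.h>. A sketch with a hypothetical stand-in for the real typedef:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for QEMU's hwaddr, which is uint64_t on all hosts. */
typedef uint64_t hwaddr_sketch;

static int format_addr(char *buf, size_t n, hwaddr_sketch a)
{
    /* PRIx64 expands to the right length modifier on 32- and 64-bit. */
    return snprintf(buf, n, "%" PRIx64, a);
}
```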
Re: [Xen-devel] [PATCH] xen/pvcalls: use WARN_ON(1) instead of __WARN()
On 07/21/2017 12:17 PM, Arnd Bergmann wrote: > __WARN() is an internal helper that is only available on > some architectures, but causes a build error e.g. on ARM64 > in some configurations: > > drivers/xen/pvcalls-back.c: In function 'set_backend_state': > drivers/xen/pvcalls-back.c:1097:5: error: implicit declaration of function > '__WARN' [-Werror=implicit-function-declaration] > > Unfortunately, there is no equivalent of BUG() that takes no > arguments, but WARN_ON(1) is commonly used in other drivers > and works on all configurations. > > Fixes: 7160378206b2 ("xen/pvcalls: xenbus state handling") > Signed-off-by: Arnd Bergmann Reviewed-by: Boris Ostrovsky ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 5/6] xen: RTDS: rearrange members of control structures
On Fri, Jun 23, 2017 at 6:55 AM, Dario Faggioliwrote: > > Nothing changed in `pahole` output, in terms of holes > and padding, but some fields have been moved, to put > related members in same cache line. > > Signed-off-by: Dario Faggioli > --- > Cc: Meng Xu > Cc: George Dunlap > --- > xen/common/sched_rt.c | 13 - > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c > index 1b30014..39f6bee 100644 > --- a/xen/common/sched_rt.c > +++ b/xen/common/sched_rt.c > @@ -171,11 +171,14 @@ static void repl_timer_handler(void *data); > struct rt_private { > spinlock_t lock;/* the global coarse-grained lock */ > struct list_head sdom; /* list of availalbe domains, used for dump > */ > + > struct list_head runq; /* ordered list of runnable vcpus */ > struct list_head depletedq; /* unordered list of depleted vcpus */ > + > +struct timer *repl_timer; /* replenishment timer */ > struct list_head replq; /* ordered list of vcpus that need > replenishment */ > + > cpumask_t tickled; /* cpus been tickled */ > -struct timer *repl_timer; /* replenishment timer */ > }; > > /* > @@ -185,10 +188,6 @@ struct rt_vcpu { > struct list_head q_elem; /* on the runq/depletedq list */ > struct list_head replq_elem; /* on the replenishment events list */ > > -/* Up-pointers */ > -struct rt_dom *sdom; > -struct vcpu *vcpu; > - > /* VCPU parameters, in nanoseconds */ > s_time_t period; > s_time_t budget; > @@ -198,6 +197,10 @@ struct rt_vcpu { > s_time_t last_start; /* last start time */ > s_time_t cur_deadline; /* current deadline for EDF */ > > +/* Up-pointers */ > +struct rt_dom *sdom; > +struct vcpu *vcpu; > + > unsigned flags; /* mark __RTDS_scheduled, etc.. */ > }; > Reviewed-by: Meng Xu BTW, Dario, I'm wondering if you used any tool to give hints about how to arrange the fields in a structure or you just did it manually? 
Thanks, Meng --- Meng Xu PhD Candidate in Computer and Information Science University of Pennsylvania http://www.cis.upenn.edu/~mengxu/ ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [linux-3.18 test] 112085: regressions - trouble: blocked/broken/fail/pass
flight 112085 linux-3.18 real [real] http://logs.test-lab.xenproject.org/osstest/logs/112085/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-armhf-armhf-xl-arndale 4 host-install(4)broken REGR. vs. 111920 test-armhf-armhf-libvirt-raw 7 xen-boot fail REGR. vs. 111920 test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-localmigrate/x10 fail REGR. vs. 111920 Tests which did not succeed, but are not blocking: test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-xl 1 build-check(1) blocked n/a test-arm64-arm64-examine 1 build-check(1) blocked n/a test-arm64-arm64-xl-credit2 1 build-check(1) blocked n/a test-arm64-arm64-xl-xsm 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-localmigrate/x10 fail like 111893 test-amd64-i386-xl-qemut-win7-amd64 16 guest-localmigrate/x10 fail like 111893 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 111893 test-amd64-i386-freebsd10-amd64 19 guest-start/freebsd.repeat fail like 111920 test-armhf-armhf-libvirt-xsm 14 saverestore-support-checkfail like 111920 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 111920 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 111920 test-amd64-amd64-xl-rtds 10 debian-install fail like 111920 test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-installfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass 
test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-i386-xl-qemut-ws16-amd64 13 guest-saverestore fail never pass build-arm64-pvops 6 kernel-build fail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 13 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-xl-qemut-ws16-amd64 10 windows-installfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass version targeted for testing: linuxdd8b674caeef9381345a6369fba29d425ff433f3 baseline version: linux4d29e8c0e9319ce9d391c57d3133306c05b6cef5 Last test of basis 111920 2017-07-17 06:21:48 Z4 days Testing same since 112085 2017-07-21 06:22:28 Z0 days1 attempts People who touched revisions under test: Adam BorowskiAmit Pundir Andrew Morton Andrey Konovalov 
Arend van Spriel Ben Hutchings Cong Wang Cyril Bur Dan Carpenter David Ahern
Re: [Xen-devel] [PATCH 6/6] xen: sched: optimize exclusive pinning case (Credit1 & 2)
On 06/23/2017 11:55 AM, Dario Faggioli wrote: > Exclusive pinning of vCPUs is used, sometimes, for > achieving the highest level of determinism, and the > least possible overhead, for the vCPUs in question. > > Although static 1:1 pinning is not recommended, for > general use cases, optimizing the tickling code (of > Credit1 and Credit2) is easy and cheap enough, so go > for it. > > Signed-off-by: Dario Faggioli> --- > Cc: George Dunlap > Cc: Anshul Makkar > --- > xen/common/sched_credit.c| 19 +++ > xen/common/sched_credit2.c | 21 - > xen/include/xen/perfc_defn.h |1 + > 3 files changed, 40 insertions(+), 1 deletion(-) > > diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c > index 4f6330e..85e014d 100644 > --- a/xen/common/sched_credit.c > +++ b/xen/common/sched_credit.c > @@ -429,6 +429,24 @@ static inline void __runq_tickle(struct csched_vcpu *new) > idlers_empty = cpumask_empty(_mask); > > /* > + * Exclusive pinning is when a vcpu has hard-affinity with only one > + * cpu, and there is no other vcpu that has hard-affinity with that > + * same cpu. This is infrequent, but if it happens, is for achieving > + * the most possible determinism, and least possible overhead for > + * the vcpus in question. > + * > + * Try to identify the vast majority of these situations, and deal > + * with them quickly. > + */ > +if ( unlikely(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) == cpu && Won't this check entail a full "loop" of the cpumask? It's cheap enough if nr_cpu_ids is small; but don't we support (theoretically) 4096 logical cpus? It seems like having a vcpu flag that identifies a vcpu as being pinned would be a more efficient way to do this. That way we could run this check once whenever the hard affinity changed, rather than every time we want to think about where to run this vcpu. What do you think? -George ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
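[Editorial note] George's suggestion — cache "this vcpu is exclusively pinned" in a flag maintained where affinity changes, rather than scanning the mask via cpumask_cycle() on every tickle — can be sketched as follows. A 64-bit word stands in for cpumask_t and all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Toy cpumask: bit N set means cpu N is in the mask. */
typedef uint64_t cpumask_sketch_t;

struct vcpu_sketch {
    cpumask_sketch_t hard_affinity;
    unsigned int cpu;   /* the cpu this vcpu is pinned/assigned to */
    int pinned;         /* cached: affinity is the singleton {cpu} */
};

/* O(1) singleton test: no walk over the (possibly 4096-bit) mask. */
static int is_singleton(cpumask_sketch_t mask, unsigned int cpu)
{
    return mask == ((cpumask_sketch_t)1 << cpu);
}

/* Recompute the flag only here, not on the hot tickle path. */
static void set_hard_affinity(struct vcpu_sketch *v, cpumask_sketch_t m)
{
    v->hard_affinity = m;
    v->pinned = is_singleton(m, v->cpu);
}
```

The tickle path then tests v->pinned instead of cycling through the affinity mask, which is the cost George is pointing at.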
[Xen-devel] [PATCH] xen-blkfront: Fix handling of non-supported operations
This patch fixes the following sparse warnings: drivers/block/xen-blkfront.c:916:45: warning: incorrect type in argument 2 (different base types) drivers/block/xen-blkfront.c:916:45: expected restricted blk_status_t [usertype] error drivers/block/xen-blkfront.c:916:45: got int [signed] error drivers/block/xen-blkfront.c:1599:47: warning: incorrect type in assignment (different base types) drivers/block/xen-blkfront.c:1599:47: expected int [signed] error drivers/block/xen-blkfront.c:1599:47: got restricted blk_status_t [usertype] drivers/block/xen-blkfront.c:1607:55: warning: incorrect type in assignment (different base types) drivers/block/xen-blkfront.c:1607:55: expected int [signed] error drivers/block/xen-blkfront.c:1607:55: got restricted blk_status_t [usertype] drivers/block/xen-blkfront.c:1625:55: warning: incorrect type in assignment (different base types) drivers/block/xen-blkfront.c:1625:55: expected int [signed] error drivers/block/xen-blkfront.c:1625:55: got restricted blk_status_t [usertype] drivers/block/xen-blkfront.c:1628:62: warning: restricted blk_status_t degrades to integer Compile-tested only.
Fixes: commit 2a842acab109 ("block: introduce new block status code type") Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Konrad Rzeszutek Wilk Cc: Roger Pau Monné Cc: Cc: --- drivers/block/xen-blkfront.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index c852ed3c01d5..1799bba74390 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -111,7 +111,7 @@ struct blk_shadow { }; struct blkif_req { - int error; + blk_status_t error; }; static inline struct blkif_req *blkif_req(struct request *rq) @@ -1616,7 +1616,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { printk(KERN_WARNING "blkfront: %s: %s op failed\n", info->gd->disk_name, op_name(bret->operation)); - blkif_req(req)->error = -EOPNOTSUPP; + blkif_req(req)->error = BLK_STS_NOTSUPP; } if (unlikely(bret->status == BLKIF_RSP_ERROR && rinfo->shadow[id].req.u.rw.nr_segments == 0)) { -- 2.13.2 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
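[Editorial note] The bug class this patch fixes — storing a negative errno int in a field of the dedicated block-status type — can be illustrated with a minimal sketch. The typedef and converter below are stand-ins, not the kernel's definitions (blk_status_t is annotated with sparse's __bitwise, which is what produces the "restricted" warnings above); EOPNOTSUPP == 95 is the usual Linux value, assumed here for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for blk_status_t and two of its values. */
typedef uint8_t blk_status_sketch_t;

#define SKETCH_STS_OK      ((blk_status_sketch_t)0)
#define SKETCH_STS_NOTSUPP ((blk_status_sketch_t)1)
#define SKETCH_EOPNOTSUPP  95

/*
 * Explicit conversion at the boundary, mirroring the idea behind the
 * kernel's errno<->blk_status_t helpers: the two domains never mix.
 */
static blk_status_sketch_t errno_to_status(int err)
{
    return (err == -SKETCH_EOPNOTSUPP) ? SKETCH_STS_NOTSUPP : SKETCH_STS_OK;
}
```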
Re: [Xen-devel] [PATCH 4/6] xen: credit2: rearrange members of control structures
On 06/23/2017 11:55 AM, Dario Faggioli wrote: > With the aim of improving memory size and layout, and > at the same time trying to put related fields reside > in the same cacheline. > > Here's a summary of the output of `pahole`, with and > without this patch, for the affected data structures. > > csched2_runqueue_data: > * Before: > size: 216, cachelines: 4, members: 14 > sum members: 208, holes: 2, sum holes: 8 > last cacheline: 24 bytes > * After: > size: 208, cachelines: 4, members: 14 > last cacheline: 16 bytes > > csched2_private: > * Before: > size: 120, cachelines: 2, members: 8 > sum members: 112, holes: 1, sum holes: 4 > padding: 4 > last cacheline: 56 bytes > * After: > size: 112, cachelines: 2, members: 8 > last cacheline: 48 bytes > > csched2_vcpu: > * Before: > size: 112, cachelines: 2, members: 14 > sum members: 108, holes: 1, sum holes: 4 > last cacheline: 48 bytes > * After: > size: 112, cachelines: 2, members: 14 > padding: 4 > last cacheline: 48 bytes > > While there, improve the wording, style and alignment > of comments too. > > Signed-off-by: Dario FaggioliI haven't taken a careful look at these; the idea sounds good and I'll trust that you've taken a careful look at them: Acked-by: George Dunlap ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 5/6] xen: RTDS: rearrange members of control structures
On 06/23/2017 11:55 AM, Dario Faggioli wrote: > Nothing changed in `pahole` output, in terms of holes > and padding, but some fields have been moved, to put > related members in same cache line. > > Signed-off-by: Dario FaggioliAcked-by: George Dunlap > --- > Cc: Meng Xu > Cc: George Dunlap > --- > xen/common/sched_rt.c | 13 - > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c > index 1b30014..39f6bee 100644 > --- a/xen/common/sched_rt.c > +++ b/xen/common/sched_rt.c > @@ -171,11 +171,14 @@ static void repl_timer_handler(void *data); > struct rt_private { > spinlock_t lock;/* the global coarse-grained lock */ > struct list_head sdom; /* list of availalbe domains, used for dump > */ > + > struct list_head runq; /* ordered list of runnable vcpus */ > struct list_head depletedq; /* unordered list of depleted vcpus */ > + > +struct timer *repl_timer; /* replenishment timer */ > struct list_head replq; /* ordered list of vcpus that need > replenishment */ > + > cpumask_t tickled; /* cpus been tickled */ > -struct timer *repl_timer; /* replenishment timer */ > }; > > /* > @@ -185,10 +188,6 @@ struct rt_vcpu { > struct list_head q_elem; /* on the runq/depletedq list */ > struct list_head replq_elem; /* on the replenishment events list */ > > -/* Up-pointers */ > -struct rt_dom *sdom; > -struct vcpu *vcpu; > - > /* VCPU parameters, in nanoseconds */ > s_time_t period; > s_time_t budget; > @@ -198,6 +197,10 @@ struct rt_vcpu { > s_time_t last_start; /* last start time */ > s_time_t cur_deadline; /* current deadline for EDF */ > > +/* Up-pointers */ > +struct rt_dom *sdom; > +struct vcpu *vcpu; > + > unsigned flags; /* mark __RTDS_scheduled, etc.. */ > }; > > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] docs: fix superpage default value
On Fri, Jul 21, 2017 at 05:51:02PM +0100, Wei Liu wrote: > On Fri, Jul 21, 2017 at 12:44:18PM -0400, Konrad Rzeszutek Wilk wrote: > > On Thu, Jul 20, 2017 at 01:57:17PM +0100, Wei Liu wrote: > > > On Thu, Jul 20, 2017 at 12:49:37PM +0100, Andrew Cooper wrote: > > > > On 20/07/17 12:47, Wei Liu wrote: > > > > > On Thu, Jul 20, 2017 at 12:45:38PM +0100, Roger Pau Monné wrote: > > > > > > On Thu, Jul 20, 2017 at 12:35:56PM +0100, Wei Liu wrote: > > > > > > > The code says it defaults to false. > > > > > > > > > > > > > > Signed-off-by: Wei Liu> > > > > > > --- > > > > > > > Cc: Andrew Cooper > > > > > > > Cc: George Dunlap > > > > > > > Cc: Ian Jackson > > > > > > > Cc: Jan Beulich > > > > > > > Cc: Konrad Rzeszutek Wilk > > > > > > > Cc: Stefano Stabellini > > > > > > > Cc: Tim Deegan > > > > > > > Cc: Wei Liu > > > > > > > --- > > > > > > > docs/misc/xen-command-line.markdown | 2 +- > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/docs/misc/xen-command-line.markdown > > > > > > > b/docs/misc/xen-command-line.markdown > > > > > > > index 3f90c3b7a8..f524294aa6 100644 > > > > > > > --- a/docs/misc/xen-command-line.markdown > > > > > > > +++ b/docs/misc/xen-command-line.markdown > > > > > > > @@ -136,7 +136,7 @@ mode during S3 resume. > > > > > > > ### allowsuperpage > > > > > > > > `= ` > > > > > > > -> Default: `true` > > > > > > > +> Default: `false` > > > > > > > Permit Xen to use superpages when performing memory management. > > > > > > I'm not an expert on Xen MM code, but isn't this intended for PV > > > > > > guests? The description above makes it look like this is for Xen > > > > > > itself, but AFAICT from skimming over the code this seems to be a PV > > > > > > feature, in which case the text above should be fixed to prevent > > > > > > confusion. > > > > > I believe it is PV only, but I'm not 100% sure. > > > > > > > > > > I would love to fix the text as well if possible. 
> > > > > > > > I'm fairly sure this option applies exclusively to PV superpages. Double > > > > check the logic through the code, but I think (since dropping 32bit > > > > support), we have no configuration where Xen might not be able to use > > > > superpages. > > > > > > > > > > So we can just delete this option and make Xen always use superpage? > > > That would be fine by me, too. > > > > Can we just nuke the code altogther? > > > > Oracle is not using it anymore. > > Sure! I was about to ask you about that. > > I'm happy to submit patches to nuke it from both the hypervisor and toolstack. Feel free to add Acked-by: Konrad Rzeszutek Wilk on them :-) ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-devel][xen/Arm]xen fail to boot on omap5 board
Hello Julien, On 21.07.17 15:52, Julien Grall wrote: This is very early boot in head.S so having the full log will not really help here... What is more interesting is where the different modules have been loaded in memory: - Device Tree - Kernel - Xen - Initramfs (if any) Well, actually I suspect HYP mode is not enabled. It was tricky some time ago, and I'm not sure the fix was upstreamed to u-boot. But yes, the mentioned print comes after the HYP mode check. IMHO a log starting from the moment the board is powered on would provide more precise info about the situation. -- *Andrii Anisov*
Re: [Xen-devel] [PATCH 3/6] xen: credit: rearrange members of control structures
On 06/23/2017 11:55 AM, Dario Faggioli wrote: > With the aim of improving memory size and layout, and > at the same time trying to make related fields reside > in the same cacheline. > > Here's a summary of the output of `pahole`, with and > without this patch, for the affected data structures. > > csched_pcpu: > * Before: > size: 88, cachelines: 2, members: 6 > sum members: 80, holes: 1, sum holes: 4 > padding: 4 > paddings: 1, sum paddings: 5 > last cacheline: 24 bytes > * After: > size: 80, cachelines: 2, members: 6 > paddings: 1, sum paddings: 5 > last cacheline: 16 bytes > > csched_vcpu: > * Before: > size: 72, cachelines: 2, members: 9 > padding: 2 > last cacheline: 8 bytes > * After: > same numbers, but move some fields to put > related fields in same cache line. > > csched_private: > * Before: > size: 152, cachelines: 3, members: 17 > sum members: 140, holes: 2, sum holes: 8 > padding: 4 > paddings: 1, sum paddings: 5 > last cacheline: 24 bytes > * After: > same numbers, but move some fields to put > related fields in same cache line.
> > Signed-off-by: Dario Faggioli. Acked-by: George Dunlap > --- > Cc: George Dunlap > Cc: Anshul Makkar > --- > xen/common/sched_credit.c | 41 ++--- > 1 file changed, 26 insertions(+), 15 deletions(-) > > diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c > index efdf6bf..4f6330e 100644 > --- a/xen/common/sched_credit.c > +++ b/xen/common/sched_credit.c > @@ -169,10 +169,12 @@ integer_param("sched_credit_tslice_ms", > sched_credit_tslice_ms); > struct csched_pcpu { > struct list_head runq; > uint32_t runq_sort_last; > -struct timer ticker; > -unsigned int tick; > + > unsigned int idle_bias; > unsigned int nr_runnable; > + > +unsigned int tick; > +struct timer ticker; > }; > > /* > @@ -181,13 +183,18 @@ struct csched_pcpu { > struct csched_vcpu { > struct list_head runq_elem; > struct list_head active_vcpu_elem; > + > +/* Up-pointers */ > struct csched_dom *sdom; > struct vcpu *vcpu; > -atomic_t credit; > -unsigned int residual; > + > s_time_t start_time; /* When we were scheduled (used for credit) */ > unsigned flags; > -int16_t pri; > +int pri; > + > +atomic_t credit; > +unsigned int residual; > + > #ifdef CSCHED_STATS > struct { > int credit_last; > @@ -219,21 +226,25 @@ struct csched_dom { > struct csched_private { > /* lock for the whole pluggable scheduler, nests inside cpupool_lock */ > spinlock_t lock; > -struct list_head active_sdom; > -uint32_t ncpus; > -struct timer master_ticker; > -unsigned int master; > + > cpumask_var_t idlers; > cpumask_var_t cpus; > +uint32_t *balance_bias; > +uint32_t runq_sort; > +unsigned int ratelimit_us; > + > +/* Period of master and tick in milliseconds */ > +unsigned int tslice_ms, tick_period_us, ticks_per_tslice; > +uint32_t ncpus; > + > +struct list_head active_sdom; > uint32_t weight; > uint32_t credit; > int credit_balance; > -uint32_t runq_sort; > -uint32_t *balance_bias; > -unsigned ratelimit_us; > -/* Period of master and tick in milliseconds */ > -unsigned tslice_ms, tick_period_us, ticks_per_tslice;
> -unsigned credits_per_tslice; > +unsigned int credits_per_tslice; > + > +unsigned int master; > +struct timer master_ticker; > }; > > static void csched_tick(void *_cpu); > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/6] xen: credit2: make the cpu to runqueue map per-cpu
On 06/23/2017 11:54 AM, Dario Faggioli wrote: > Instead of keeping an NR_CPUS big array of int-s, > directly inside csched2_private, use a per-cpu > variable. > > That's especially beneficial (in terms of saved > memory) when there are more instances of Credit2 (in > different cpupools), and also helps fitting > csched2_private itself into CPU caches. > > Signed-off-by: Dario Faggioli. Sounds good: Acked-by: George Dunlap > --- > Cc: George Dunlap > Cc: Anshul Makkar > --- > xen/common/sched_credit2.c | 33 - > 1 file changed, 20 insertions(+), 13 deletions(-) > > diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c > index 10d9488..15862f2 100644 > --- a/xen/common/sched_credit2.c > +++ b/xen/common/sched_credit2.c > @@ -383,7 +383,6 @@ struct csched2_private { > > struct list_head sdom; /* Used mostly for dump keyhandler. */ > > -int runq_map[NR_CPUS]; > cpumask_t active_queues; /* Queues which may have active cpus */ > struct csched2_runqueue_data *rqd; > > @@ -393,6 +392,14 @@ struct csched2_private { > }; > > /* > + * Physical CPU > + * > + * The only per-pCPU information we need to maintain is of which runqueue > + * each CPU is part of.
> + */ > +static DEFINE_PER_CPU(int, runq_map); > + > +/* > * Virtual CPU > */ > struct csched2_vcpu { > @@ -448,16 +455,16 @@ static inline struct csched2_dom *csched2_dom(const > struct domain *d) > } > > /* CPU to runq_id macro */ > -static inline int c2r(const struct scheduler *ops, unsigned int cpu) > +static inline int c2r(unsigned int cpu) > { > -return csched2_priv(ops)->runq_map[(cpu)]; > +return per_cpu(runq_map, cpu); > } > > /* CPU to runqueue struct macro */ > static inline struct csched2_runqueue_data *c2rqd(const struct scheduler > *ops, >unsigned int cpu) > { > -return _priv(ops)->rqd[c2r(ops, cpu)]; > +return _priv(ops)->rqd[c2r(cpu)]; > } > > /* > @@ -1082,7 +1089,7 @@ runq_insert(const struct scheduler *ops, struct > csched2_vcpu *svc) > ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock)); > > ASSERT(!vcpu_on_runq(svc)); > -ASSERT(c2r(ops, cpu) == c2r(ops, svc->vcpu->processor)); > +ASSERT(c2r(cpu) == c2r(svc->vcpu->processor)); > > ASSERT(>rqd->runq == runq); > ASSERT(!is_idle_vcpu(svc->vcpu)); > @@ -1733,7 +1740,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct > vcpu *vc) > if ( min_rqi == -1 ) > { > new_cpu = get_fallback_cpu(svc); > -min_rqi = c2r(ops, new_cpu); > +min_rqi = c2r(new_cpu); > min_avgload = prv->rqd[min_rqi].b_avgload; > goto out_up; > } > @@ -2622,7 +2629,7 @@ csched2_schedule( > unsigned tasklet:8, idle:8, smt_idle:8, tickled:8; > } d; > d.cpu = cpu; > -d.rq_id = c2r(ops, cpu); > +d.rq_id = c2r(cpu); > d.tasklet = tasklet_work_scheduled; > d.idle = is_idle_vcpu(current); > d.smt_idle = cpumask_test_cpu(cpu, >smt_idle); > @@ -2783,7 +2790,7 @@ dump_pcpu(const struct scheduler *ops, int cpu) > #define cpustr keyhandler_scratch > > cpumask_scnprintf(cpustr, sizeof(cpustr), per_cpu(cpu_sibling_mask, > cpu)); > -printk("CPU[%02d] runq=%d, sibling=%s, ", cpu, c2r(ops, cpu), cpustr); > +printk("CPU[%02d] runq=%d, sibling=%s, ", cpu, c2r(cpu), cpustr); > cpumask_scnprintf(cpustr, sizeof(cpustr), 
per_cpu(cpu_core_mask, cpu)); > printk("core=%s\n", cpustr); > > @@ -2930,7 +2937,7 @@ init_pdata(struct csched2_private *prv, unsigned int > cpu) > } > > /* Set the runqueue map */ > -prv->runq_map[cpu] = rqi; > +per_cpu(runq_map, cpu) = rqi; > > __cpumask_set_cpu(cpu, >idle); > __cpumask_set_cpu(cpu, >active); > @@ -3034,7 +3041,7 @@ csched2_deinit_pdata(const struct scheduler *ops, void > *pcpu, int cpu) > ASSERT(!pcpu && cpumask_test_cpu(cpu, >initialized)); > > /* Find the old runqueue and remove this cpu from it */ > -rqi = prv->runq_map[cpu]; > +rqi = per_cpu(runq_map, cpu); > > rqd = prv->rqd + rqi; > > @@ -3055,6 +3062,8 @@ csched2_deinit_pdata(const struct scheduler *ops, void > *pcpu, int cpu) > else if ( rqd->pick_bias == cpu ) > rqd->pick_bias = cpumask_first(>active); > > +per_cpu(runq_map, cpu) = -1; > + > spin_unlock(>lock); > > __cpumask_clear_cpu(cpu, >initialized); > @@ -3121,10 +3130,8 @@ csched2_init(struct scheduler *ops) > return -ENOMEM; > } > for ( i = 0; i < nr_cpu_ids; i++ ) > -{ > -prv->runq_map[i] = -1; > prv->rqd[i].id = -1; > -} > + > /* initialize ratelimit */ > prv->ratelimit_us = sched_ratelimit_us; > > ___
Re: [Xen-devel] [PATCH] docs: fix superpage default value
On Fri, Jul 21, 2017 at 12:44:18PM -0400, Konrad Rzeszutek Wilk wrote: > On Thu, Jul 20, 2017 at 01:57:17PM +0100, Wei Liu wrote: > > On Thu, Jul 20, 2017 at 12:49:37PM +0100, Andrew Cooper wrote: > > > On 20/07/17 12:47, Wei Liu wrote: > > > > On Thu, Jul 20, 2017 at 12:45:38PM +0100, Roger Pau Monné wrote: > > > > > On Thu, Jul 20, 2017 at 12:35:56PM +0100, Wei Liu wrote: > > > > > > The code says it defaults to false. > > > > > > > > > > > > Signed-off-by: Wei Liu> > > > > > --- > > > > > > Cc: Andrew Cooper > > > > > > Cc: George Dunlap > > > > > > Cc: Ian Jackson > > > > > > Cc: Jan Beulich > > > > > > Cc: Konrad Rzeszutek Wilk > > > > > > Cc: Stefano Stabellini > > > > > > Cc: Tim Deegan > > > > > > Cc: Wei Liu > > > > > > --- > > > > > > docs/misc/xen-command-line.markdown | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/docs/misc/xen-command-line.markdown > > > > > > b/docs/misc/xen-command-line.markdown > > > > > > index 3f90c3b7a8..f524294aa6 100644 > > > > > > --- a/docs/misc/xen-command-line.markdown > > > > > > +++ b/docs/misc/xen-command-line.markdown > > > > > > @@ -136,7 +136,7 @@ mode during S3 resume. > > > > > > ### allowsuperpage > > > > > > > `= ` > > > > > > -> Default: `true` > > > > > > +> Default: `false` > > > > > > Permit Xen to use superpages when performing memory management. > > > > > I'm not an expert on Xen MM code, but isn't this intended for PV > > > > > guests? The description above makes it look like this is for Xen > > > > > itself, but AFAICT from skimming over the code this seems to be a PV > > > > > feature, in which case the text above should be fixed to prevent > > > > > confusion. > > > > I believe it is PV only, but I'm not 100% sure. > > > > > > > > I would love to fix the text as well if possible. > > > > > > I'm fairly sure this option applies exclusively to PV superpages. 
Double > > > check the logic through the code, but I think (since dropping 32bit > > > support), we have no configuration where Xen might not be able to use > > > superpages. > > > > > > > So we can just delete this option and make Xen always use superpage? > > That would be fine by me, too. > > Can we just nuke the code altogether? > > Oracle is not using it anymore. Sure! I was about to ask you about that. I'm happy to submit patches to nuke it from both the hypervisor and toolstack.
Re: [Xen-devel] [PATCH 1/6] xen: credit2: allocate runqueue data structure dynamically
On 06/23/2017 11:54 AM, Dario Faggioli wrote: > Instead of keeping an NR_CPUS big array of csched2_runqueue_data > elements, directly inside the csched2_private structure, allocate > it dynamically. > > This has two positive effects: > - reduces the size of csched2_private sensibly, which is > especially good in case there are more instances of Credit2 > (in different cpupools), and is also good from the point > of view of fitting the struct into CPU caches; > - we can use nr_cpu_ids as array size, which may be sensibly > smaller than NR_CPUS > > Signed-off-by: Dario Faggioli. Looks good, thanks: Acked-by: George Dunlap > --- > Cc: George Dunlap > Cc: Anshul Makkar > --- > xen/common/sched_credit2.c | 16 > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c > index 126417c..10d9488 100644 > --- a/xen/common/sched_credit2.c > +++ b/xen/common/sched_credit2.c > @@ -385,7 +385,7 @@ struct csched2_private { > > int runq_map[NR_CPUS]; > cpumask_t active_queues; /* Queues which may have active cpus */ > -struct csched2_runqueue_data rqd[NR_CPUS]; > +struct csched2_runqueue_data *rqd; > > unsigned int load_precision_shift; > unsigned int load_window_shift; > @@ -3099,9 +3099,11 @@ csched2_init(struct scheduler *ops) > printk(XENLOG_INFO "load tracking window length %llu ns\n", > 1ULL << opt_load_window_shift); > > -/* Basically no CPU information is available at this point; just > +/* > + * Basically no CPU information is available at this point; just > * set up basic structures, and a callback when the CPU info is > - * available. */ > + * available.
> + */ > > prv = xzalloc(struct csched2_private); > if ( prv == NULL ) > @@ -3111,7 +3113,13 @@ csched2_init(struct scheduler *ops) > rwlock_init(>lock); > INIT_LIST_HEAD(>sdom); > > -/* But un-initialize all runqueues */ > +/* Allocate all runqueues and mark them as un-initialized */ > +prv->rqd = xzalloc_array(struct csched2_runqueue_data, nr_cpu_ids); > +if ( !prv->rqd ) > +{ > +xfree(prv); > +return -ENOMEM; > +} > for ( i = 0; i < nr_cpu_ids; i++ ) > { > prv->runq_map[i] = -1; > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] docs: fix superpage default value
On Thu, Jul 20, 2017 at 01:57:17PM +0100, Wei Liu wrote: > On Thu, Jul 20, 2017 at 12:49:37PM +0100, Andrew Cooper wrote: > > On 20/07/17 12:47, Wei Liu wrote: > > > On Thu, Jul 20, 2017 at 12:45:38PM +0100, Roger Pau Monné wrote: > > > > On Thu, Jul 20, 2017 at 12:35:56PM +0100, Wei Liu wrote: > > > > > The code says it defaults to false. > > > > > > > > > > Signed-off-by: Wei Liu> > > > > --- > > > > > Cc: Andrew Cooper > > > > > Cc: George Dunlap > > > > > Cc: Ian Jackson > > > > > Cc: Jan Beulich > > > > > Cc: Konrad Rzeszutek Wilk > > > > > Cc: Stefano Stabellini > > > > > Cc: Tim Deegan > > > > > Cc: Wei Liu > > > > > --- > > > > > docs/misc/xen-command-line.markdown | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/docs/misc/xen-command-line.markdown > > > > > b/docs/misc/xen-command-line.markdown > > > > > index 3f90c3b7a8..f524294aa6 100644 > > > > > --- a/docs/misc/xen-command-line.markdown > > > > > +++ b/docs/misc/xen-command-line.markdown > > > > > @@ -136,7 +136,7 @@ mode during S3 resume. > > > > > ### allowsuperpage > > > > > > `= ` > > > > > -> Default: `true` > > > > > +> Default: `false` > > > > > Permit Xen to use superpages when performing memory management. > > > > I'm not an expert on Xen MM code, but isn't this intended for PV > > > > guests? The description above makes it look like this is for Xen > > > > itself, but AFAICT from skimming over the code this seems to be a PV > > > > feature, in which case the text above should be fixed to prevent > > > > confusion. > > > I believe it is PV only, but I'm not 100% sure. > > > > > > I would love to fix the text as well if possible. > > > > I'm fairly sure this option applies exclusively to PV superpages. Double > > check the logic through the code, but I think (since dropping 32bit > > support), we have no configuration where Xen might not be able to use > > superpages. > > > > So we can just delete this option and make Xen always use superpage? 
> That would be fine by me, too. Can we just nuke the code altogether? Oracle is not using it anymore.
Re: [Xen-devel] xen/link: Move .data.rel.ro sections into .rodata for final link
On 21/07/17 11:43, Julien Grall wrote: On 20/07/17 17:54, Wei Liu wrote: On Thu, Jul 20, 2017 at 05:46:50PM +0100, Wei Liu wrote: CC relevant maintainers On Thu, Jul 20, 2017 at 05:20:43PM +0200, David Woodhouse wrote: From: David Woodhouse. This includes stuff lke the hypercall tables which we really want to be read-only. (Reviewer note: lke -> like.) And they were going into .data.read-mostly. Signed-off-by: David Woodhouse Reviewed-by: Wei Liu Acked-by: Julien Grall Acked-by: Andrew Cooper
Re: [Xen-devel] Regarding hdmi sharing in xen
Dear George, First I would state the terms as follows: * Sharing HW - using the same hardware from different domains via PV drivers, so one domain accesses the HW directly and serves the other domains. * Assigning HW - providing access to some particular HW to some particular domain. E.g. peripherals by default are assigned to Dom0, but using passthrough some could be assigned to a DomU. On 19.07.17 07:41, George John wrote: Our plan is to run Linux as Dom0 and Android as DomU. The Linux portion will be having 1 HDMI display and the Android portion will be having 1 HDMI. Can we share the DU and use the HDMI port as it is in the guests? IIRC last year a setup was shown with a Salvator-X board, where one HDMI display was *assigned* to Linux (Dom0) and one HDMI display was *assigned* to Android. So such a setup is technically feasible. If a domain has a display assigned for its sole use, it can share that display with other domains using the displif protocol [1]. [1] https://lists.xenproject.org/archives/html/xen-devel/2017-04/msg00470.html -- *Andrii Anisov*
Re: [Xen-devel] [PATCH XTF v3] Implement pv_read_some
On 21/07/17 08:01, Felix Schmoll wrote: Much better. Just one final question. Do you intend this function to block until data becomes available? (because that appears to be how it behaves.) Yes. I could split it up into two functions if that bothers you. Or do you just want me to include that in the comment? Just include it in the comment. ~Andrew
Re: [Xen-devel] [PATCH XTF] Functional: Add a UMIP test
On 21/07/17 02:42, Boqun Feng wrote: On Thu, Jul 20, 2017 at 10:38:59AM +0100, Andrew Cooper wrote: On 20/07/17 06:29, Boqun Feng (Intel) wrote: Add a "umip" test for the User-Model Instruction Prevention. The test simply tries to run sgdt/sidt/sldt/str/smsw in guest user-mode with CR4_UMIP = 1. Signed-off-by: Boqun Feng (Intel). Thank you very much for providing a test. As a general remark, how have you found XTF to use? Great tool! Especially when you need to run Xen in a simulated environment like simics and want to test something, bringing up even a simple Linux domainU would be a lot of pain. ;-) XTF just works like a charm and it's easy to write a test case, though according to your comments I'm not very good at it yet ;-) I'm glad to hear this. +void test_main(void) +{ +unsigned long exp; +unsigned long cr4 = read_cr4(); This is all good. However, it is insufficient to properly test the UMIP behaviour. Please look at the cpuid-faulting test to see how I structured things. In particular, you should: 1) Test the regular behaviour of the instructions. 2) Search for UMIP, skipping if it isn't available. 3) Enable UMIP. Maybe I also need to provide a write_cr4_safe() similar to wrmsr_safe(), in case cpuid indicates UMIP is supported while the UMIP CR4 bit cannot be set, which would mean a bug? Yes. You are entirely correct. Feel free to put write_cr4_safe() in lib.h along with the other *_safe() variants. 4) Test the instructions again, this time checking for #GP in userspace. 5) Disable UMIP. 6) Check again for regular behaviour. This way, you also check that turning it off works as well as turning it on. In addition, each test needs to check more than just the block of tests below. 1) The tests should run the instructions natively, and forced through the instruction emulator. See the FPU Exception Emulation test which is along the same lines. One thing to be aware of though is that in older versions of Xen, the s???
instructions weren't implemented in the instruction emulator, so the test should tolerate and skip if it gets #UD back. Rogar that. :) Roger. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [ovmf test] 112091: all pass - PUSHED
flight 112091 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/112091/ Perfect :-) All tests in this flight passed as required version targeted for testing: ovmf 1683ecec41a7c944783c51efa75375f1e0a71d08 baseline version: ovmf 79aac4dd756bb2809cdcb74f7d2ae8a630457c99 Last test of basis 112039 2017-07-20 06:18:11 Z1 days Testing same since 112091 2017-07-21 10:17:54 Z0 days1 attempts People who touched revisions under test: Star Zengjobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : + branch=ovmf + revision=1683ecec41a7c944783c51efa75375f1e0a71d08 + . ./cri-lock-repos ++ . ./cri-common +++ . ./cri-getconfig +++ umask 002 +++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{"Repos"} or die $!; ' +++ local repos=/home/osstest/repos +++ '[' -z /home/osstest/repos ']' +++ '[' '!' -d /home/osstest/repos ']' +++ echo /home/osstest/repos ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x '!=' x/home/osstest/repos/lock ']' ++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock ++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push ovmf 1683ecec41a7c944783c51efa75375f1e0a71d08 + branch=ovmf + revision=1683ecec41a7c944783c51efa75375f1e0a71d08 + . ./cri-lock-repos ++ . ./cri-common +++ . 
./cri-getconfig +++ umask 002 +++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{"Repos"} or die $!; ' +++ local repos=/home/osstest/repos +++ '[' -z /home/osstest/repos ']' +++ '[' '!' -d /home/osstest/repos ']' +++ echo /home/osstest/repos ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']' + . ./cri-common ++ . ./cri-getconfig ++ umask 002 + select_xenbranch + case "$branch" in + tree=ovmf + xenbranch=xen-unstable + '[' xovmf = xlinux ']' + linuxbranch= + '[' x = x ']' + qemuubranch=qemu-upstream-unstable + select_prevxenbranch ++ ./cri-getprevxenbranch xen-unstable + prevxenbranch=xen-4.9-testing + '[' x1683ecec41a7c944783c51efa75375f1e0a71d08 = x ']' + : tested/2.6.39.x + . ./ap-common ++ : osst...@xenbits.xen.org +++ getconfig OsstestUpstream +++ perl -e ' use Osstest; readglobalconfig(); print $c{"OsstestUpstream"} or die $!; ' ++ : ++ : git://xenbits.xen.org/xen.git ++ : osst...@xenbits.xen.org:/home/xen/git/xen.git ++ : git://xenbits.xen.org/qemu-xen-traditional.git ++ : git://git.kernel.org ++ : git://git.kernel.org/pub/scm/linux/kernel/git ++ : git ++ : git://xenbits.xen.org/xtf.git ++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git ++ : git://xenbits.xen.org/xtf.git ++ : git://xenbits.xen.org/libvirt.git ++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git ++ : git://xenbits.xen.org/libvirt.git ++ : git://xenbits.xen.org/osstest/rumprun.git ++ : git ++ : git://xenbits.xen.org/osstest/rumprun.git ++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git ++ : git://git.seabios.org/seabios.git ++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git ++ : git://xenbits.xen.org/osstest/seabios.git ++ : https://github.com/tianocore/edk2.git ++ : osst...@xenbits.xen.org:/home/xen/git/osstest/ovmf.git ++ : git://xenbits.xen.org/osstest/ovmf.git ++ : git://xenbits.xen.org/osstest/linux-firmware.git ++ : 
osst...@xenbits.xen.org:/home/osstest/ext/linux-firmware.git ++ : git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git ++ :
[Xen-devel] [qemu-mainline test] 112072: regressions - FAIL
flight 112072 qemu-mainline real [real] http://logs.test-lab.xenproject.org/osstest/logs/112072/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-i386-xsm6 xen-buildfail REGR. vs. 111765 build-i3866 xen-buildfail REGR. vs. 111765 build-armhf-xsm 6 xen-buildfail REGR. vs. 111765 test-amd64-amd64-xl-qemuu-win7-amd64 10 windows-install fail REGR. vs. 111765 build-armhf 6 xen-buildfail REGR. vs. 111765 Tests which did not succeed, but are not blocking: test-armhf-armhf-xl-multivcpu 1 build-check(1) blocked n/a test-amd64-i386-freebsd10-i386 1 build-check(1) blocked n/a test-amd64-i386-xl-xsm1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-raw1 build-check(1) blocked n/a test-amd64-i386-qemuu-rhel6hvm-amd 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a build-armhf-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-win10-i386 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-xl 1 build-check(1) blocked n/a test-amd64-i386-qemuu-rhel6hvm-intel 1 build-check(1) blocked n/a test-armhf-armhf-xl-vhd 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-win7-amd64 1 build-check(1) blocked n/a test-amd64-i386-freebsd10-amd64 1 build-check(1) blocked n/a test-amd64-i386-pair 1 build-check(1) blocked n/a test-armhf-armhf-xl-credit2 1 build-check(1) blocked n/a test-armhf-armhf-xl-cubietruck 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ws16-amd64 1 build-check(1) blocked n/a test-armhf-armhf-xl-rtds 1 build-check(1) blocked n/a test-armhf-armhf-xl-arndale 1 build-check(1) blocked n/a 
test-amd64-i386-xl-qemuu-debianhvm-amd64 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-xl1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-armhf-armhf-xl-xsm 1 build-check(1) blocked n/a test-amd64-amd64-xl-rtds 10 debian-install fail like 111765 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-installfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass version targeted for testing: qemuu25d0233c1ac6cd14a15fcc834f1de3b179037b1d baseline version: qemuu31fe1c414501047cbb91b695bdccc0068496dcf6 Last test of basis 111765 2017-07-13 10:20:16 Z8 days Failing since111790 2017-07-14 04:20:46 Z7 days 10 attempts Testing same since 112072 2017-07-21 00:49:48 Z0 days1 attempts People who touched revisions under test: Alex BennéeAlex Williamson Alexander Graf Alexey Kardashevskiy Alistair Francis
Re: [Xen-devel] [PATCH] docs: fix superpage default value
On Fri, Jul 21, 2017 at 05:21:26PM +0100, Andrew Cooper wrote: > On 20/07/17 13:57, Wei Liu wrote: > > On Thu, Jul 20, 2017 at 12:49:37PM +0100, Andrew Cooper wrote: > > > On 20/07/17 12:47, Wei Liu wrote: > > > > On Thu, Jul 20, 2017 at 12:45:38PM +0100, Roger Pau Monné wrote: > > > > > On Thu, Jul 20, 2017 at 12:35:56PM +0100, Wei Liu wrote: > > > > > > The code says it defaults to false. > > > > > > > > > > > > Signed-off-by: Wei Liu> > > > > > --- > > > > > > Cc: Andrew Cooper > > > > > > Cc: George Dunlap > > > > > > Cc: Ian Jackson > > > > > > Cc: Jan Beulich > > > > > > Cc: Konrad Rzeszutek Wilk > > > > > > Cc: Stefano Stabellini > > > > > > Cc: Tim Deegan > > > > > > Cc: Wei Liu > > > > > > --- > > > > > >docs/misc/xen-command-line.markdown | 2 +- > > > > > >1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/docs/misc/xen-command-line.markdown > > > > > > b/docs/misc/xen-command-line.markdown > > > > > > index 3f90c3b7a8..f524294aa6 100644 > > > > > > --- a/docs/misc/xen-command-line.markdown > > > > > > +++ b/docs/misc/xen-command-line.markdown > > > > > > @@ -136,7 +136,7 @@ mode during S3 resume. > > > > > >### allowsuperpage > > > > > >> `= ` > > > > > > -> Default: `true` > > > > > > +> Default: `false` > > > > > >Permit Xen to use superpages when performing memory management. > > > > > I'm not an expert on Xen MM code, but isn't this intended for PV > > > > > guests? The description above makes it look like this is for Xen > > > > > itself, but AFAICT from skimming over the code this seems to be a PV > > > > > feature, in which case the text above should be fixed to prevent > > > > > confusion. > > > > I believe it is PV only, but I'm not 100% sure. > > > > > > > > I would love to fix the text as well if possible. > > > I'm fairly sure this option applies exclusively to PV superpages. 
Double > > > check the logic through the code, but I think (since dropping 32bit > > > support), we have no configuration where Xen might not be able to use > > > superpages. > > > > > So we can just delete this option and make Xen always use superpage? > > That would be fine by me, too. > > No - my point was that this option now exclusively controls PV superpages, > IIRC. > OK. I misunderstood. In that case. We can change the text to: Permit PV guests to use suerpages. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 4/4] Xentrace: add support for HVM's PI blocking list operation
On Fri, Jul 7, 2017 at 7:49 AM, Chao Gao wrote: > In order to analyze PI blocking list operation frequency and obtain > the list length, add some relevant events to xentrace and some > associated code in xenalyze. Event ASYNC_PI_LIST_DEL may happen in interrupt > context, which means the current assumptions checked in toplevel_assert_check() > are no longer suitable. Thus, this patch extends the > toplevel_assert_check() > to remove such assumptions for events of type ASYNC_PI_LIST_DEL. > > Signed-off-by: Chao Gao Hey Chao Gao, Thanks for doing the work to add this tracing support to xentrace -- and in particular taking the effort to adapt the assert mechanism to be able to handle asynchronous events. I think in this case though, having a separate HVM sub-class for asynchronous events isn't really the right approach. The main purpose of sub-classes is to help filter the events you want; and I can't think of any time you'd want to trace PI_LIST_DEL and not PI_LIST_ADD (or vice versa). Secondly, the "asynchronous event" problem will be an issue for other contexts as well, and the solution will be the same. I think a better solution would be to do something similar to TRC_64_FLAG and TRC_HVM_IOMEM_[read,write], and claim another bit to create a TRC_ASYNC_FLAG (0x400 probably). Then we can filter the "not_idle_domain" and "vcpu_data_mode" asserts on that. What do you think? -George
Re: [Xen-devel] [PATCH] docs: fix superpage default value
On 20/07/17 13:57, Wei Liu wrote: On Thu, Jul 20, 2017 at 12:49:37PM +0100, Andrew Cooper wrote: On 20/07/17 12:47, Wei Liu wrote: On Thu, Jul 20, 2017 at 12:45:38PM +0100, Roger Pau Monné wrote: On Thu, Jul 20, 2017 at 12:35:56PM +0100, Wei Liu wrote: The code says it defaults to false. Signed-off-by: Wei Liu --- Cc: Andrew Cooper Cc: George Dunlap Cc: Ian Jackson Cc: Jan Beulich Cc: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Tim Deegan Cc: Wei Liu --- docs/misc/xen-command-line.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 3f90c3b7a8..f524294aa6 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -136,7 +136,7 @@ mode during S3 resume. ### allowsuperpage > `= <boolean>` -> Default: `true` +> Default: `false` Permit Xen to use superpages when performing memory management. I'm not an expert on Xen MM code, but isn't this intended for PV guests? The description above makes it look like this is for Xen itself, but AFAICT from skimming over the code this seems to be a PV feature, in which case the text above should be fixed to prevent confusion. I believe it is PV only, but I'm not 100% sure. I would love to fix the text as well if possible. I'm fairly sure this option applies exclusively to PV superpages. Double check the logic through the code, but I think (since dropping 32bit support), we have no configuration where Xen might not be able to use superpages. So we can just delete this option and make Xen always use superpages? That would be fine by me, too. No - my point was that this option now exclusively controls PV superpages, IIRC. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH] xen/pvcalls: use WARN_ON(1) instead of __WARN()
__WARN() is an internal helper that is only available on some architectures, but causes a build error e.g. on ARM64 in some configurations: drivers/xen/pvcalls-back.c: In function 'set_backend_state': drivers/xen/pvcalls-back.c:1097:5: error: implicit declaration of function '__WARN' [-Werror=implicit-function-declaration] Unfortunately, there is no equivalent of BUG() that takes no arguments, but WARN_ON(1) is commonly used in other drivers and works on all configurations. Fixes: 7160378206b2 ("xen/pvcalls: xenbus state handling") Signed-off-by: Arnd Bergmann --- drivers/xen/pvcalls-back.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c index d6c4c4aecb41..00c1a2344330 100644 --- a/drivers/xen/pvcalls-back.c +++ b/drivers/xen/pvcalls-back.c @@ -1094,7 +1094,7 @@ static void set_backend_state(struct xenbus_device *dev, xenbus_switch_state(dev, XenbusStateClosing); break; default: - __WARN(); + WARN_ON(1); } break; case XenbusStateInitWait: @@ -1109,7 +1109,7 @@ static void set_backend_state(struct xenbus_device *dev, xenbus_switch_state(dev, XenbusStateClosing); break; default: - __WARN(); + WARN_ON(1); } break; case XenbusStateConnected: @@ -1123,7 +1123,7 @@ static void set_backend_state(struct xenbus_device *dev, xenbus_switch_state(dev, XenbusStateClosing); break; default: - __WARN(); + WARN_ON(1); } break; case XenbusStateClosing: @@ -1134,11 +1134,11 @@ static void set_backend_state(struct xenbus_device *dev, xenbus_switch_state(dev, XenbusStateClosed); break; default: - __WARN(); + WARN_ON(1); } break; default: - __WARN(); + WARN_ON(1); } } } -- 2.9.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 3/4] VT-d PI: restrict the vcpu number on a given pcpu
On Fri, Jul 7, 2017 at 7:48 AM, Chao Gao wrote: > Currently, a blocked vCPU is put in its pCPU's pi blocking list. If > too many vCPUs are blocked on a given pCPU, the list > can grow too long. As a simple analysis, with 32k domains and > 128 vcpus per domain, about 4M vCPUs may be blocked in one pCPU's > PI blocking list. When a wakeup interrupt arrives, the list is > traversed to find some specific vCPUs to wake them up. This traversal in > that case would consume much time. > > To mitigate this issue, this patch limits the number of vCPUs tracked on a > given pCPU's blocking list, taking factors such as performance of the common case, > current hvm vCPU count and current pCPU count into consideration. With this > method, for the common case, it works fast and for some extreme cases, the > list length is under control. > > With this patch, when a vcpu is to be blocked, we check whether the pi > blocking list's length of the pcpu where the vcpu is running exceeds > the limit, which is the average vcpus-per-pcpu ratio plus a constant. > If not, the vcpu is added to this pcpu's pi blocking list. Otherwise, > another online pcpu is chosen to accept the vcpu. > > Signed-off-by: Chao Gao > --- > v4: > - use a new lock to avoid adding a blocked vcpu to an offline pcpu's blocking > list. > > --- > xen/arch/x86/hvm/vmx/vmx.c | 136 > + > 1 file changed, 114 insertions(+), 22 deletions(-) > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c > index ecd6485..04e9aa6 100644 > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -95,22 +95,91 @@ static DEFINE_PER_CPU(struct vmx_pi_blocking_vcpu, > vmx_pi_blocking); > uint8_t __read_mostly posted_intr_vector; > static uint8_t __read_mostly pi_wakeup_vector; > > +/* > + * Protect critical sections to avoid adding a blocked vcpu to a destroyed > + * blocking list. 
> + */ > +static DEFINE_SPINLOCK(remote_pbl_operation); > + > +#define remote_pbl_operation_begin(flags) \ > +({ \ > +spin_lock_irqsave(&remote_pbl_operation, flags); \ > +}) > + > +#define remote_pbl_operation_done(flags) \ > +({ \ > +spin_unlock_irqrestore(&remote_pbl_operation, flags); \ > +}) > + > void vmx_pi_per_cpu_init(unsigned int cpu) > { > INIT_LIST_HEAD(&per_cpu(vmx_pi_blocking, cpu).list); > spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock); > } > > +/* > + * By default, the local pcpu (means the one the vcpu is currently running > on) > + * is chosen as the destination of wakeup interrupt. But if the vcpu number > of > + * the pcpu exceeds a limit, another pcpu is chosen until we find a suitable > + * one. > + * > + * Currently, choose (v_tot/p_tot) + K as the limit of vcpu count, where > + * v_tot is the total number of hvm vcpus on the system, p_tot is the total > + * number of pcpus in the system, and K is a fixed number. An experiment on a > + * skylake server which has 112 cpus and 64G memory shows the maximum time to > + * wakeup a vcpu from a 128-entry blocking list takes about 22us, which is > + * tolerable. So choose 128 as the fixed number K. > + * > + * This policy makes sure: > + * 1) for common cases, the limit won't be reached and the local pcpu is used > + * which is beneficial to performance (at least, avoid an IPI when unblocking > + * vcpu). > + * 2) for the worst case, the blocking list length scales with the vcpu count > + * divided by the pcpu count. > + */ > +#define PI_LIST_FIXED_NUM 128 > +#define PI_LIST_LIMIT (atomic_read(&num_hvm_vcpus) / num_online_cpus() + > \ > + PI_LIST_FIXED_NUM) > +static inline bool pi_over_limit(int cpu) > +{ > +return per_cpu(vmx_pi_blocking, cpu).counter > PI_LIST_LIMIT; Is there any reason to hide this calculation behind a #define, when it's only used once anyway? Also -- the vast majority of the time, .counter will be < PI_LIST_FIXED_NUM; there's no reason to do an atomic read and an integer division in that case. 
I would do this: if ( likely(per_cpu(vmx_pi_blocking, cpu).counter <= PI_LIST_FIXED_NUM) ) return 0; return per_cpu(vmx_pi_blocking, cpu).counter < PI_LIST_FIXED_NUM + (atomic_read(&num_hvm_vcpus) / num_online_cpus()); Also, I personally think it would make the code more readable to say, "pi_under_limit()" instead; that way... > +} > + > static void vmx_vcpu_block(struct vcpu *v) > { > -unsigned long flags; > -unsigned int dest; > +unsigned long flags[2]; > +unsigned int dest, pi_cpu; > spinlock_t *old_lock; > -spinlock_t *pi_blocking_list_lock = > - &per_cpu(vmx_pi_blocking, v->processor).lock; > struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc; > +spinlock_t *pi_blocking_list_lock; > +bool in_remote_operation = false; > + > +pi_cpu
[Xen-devel] Notes from Design Session: Solving Community Problems: Patch Volume vs Review Bandwidth, Community Meetings ... and other problems
Hi all, please find attached my notes. Lars Session URL: http://sched.co/AjB3 ACTIONS on Lars, Andy and Juergen ACTIONS on Stefano and Julien Community Call == This was a discussion about whether we should do more community calls, in critical areas. The background was whether we should have an x86 call to mirror the ARM call. Jan and Andy asked whether the ARM calls are useful Julien: They are very useful. On average about 10 people attend. On ARM we don't yet have a real plan of what's needed for the future. We are hoping to use the call to establish a firmer plan. Lars: Was asking whether we always have an agenda at the beginning. Julien: Sometimes, but often the agenda is established/refined in the first 5 minutes at the beginning of the call. Typically Julien or Stefano handle this at the beginning Lars asks whether we need one for tools Ian: there is currently not much a need for technical coordination Lars: it feels that a call on x86 would be helpful But we can only cover non-NDA information as with the other calls Jan and Andy agree that they are happy to try this, but are concerned that it may fizzle out. Also neither want to own agenda and note-taking (notes and call info are posted on xen-devel@) ACTION: Lars to work with Intel on setting this up (note, I was asked by Susie Li to include John Ji and Chao Peng on this thread and discuss with them at a separate call) Timing wise, a call at from 9-10 UK time once a month should work. Example of ARM call minutes: * http://markmail.org/message/myjllcngy3lqveji * http://markmail.org/message/d4kuqxxhj6dfnf23 * There also ought to be a reminder of call details (someone to highlight an example) Contributions vs. 
Review Bandwidth == A potential bottleneck issue was raised in the area of ARM and x86 ARM --- Lars asks what issue have been observed Julien: Lots of new features and lots of design discussion Stefano: Design discussions are creating trouble: sometimes we have complex proposals without a clear answer on the right way forward. Complicated design => 2/3 options => not clear which way is the best forward => ARM maintainers can provide advice, can say what is going to work Right now ARM maintainers expect the contributor has to lead and drive it (e.g. an example where we were stuck was BIG.Little support) A pattern we have seen is: - Complex problem - Not an obviously clear answer - Gets stuck - Design discussion fizzles out without an artefact in the codebase (in other words, there is an unfinished mail thread) Lars: Asks whether maybe the issue is one of sufficient confidence by the contributor to move the discussion further or whether expectations were not communicated clearly (e.g. tell contributors to pick a solution and move forward). Stefano and Julien: Agree that this may indeed be the case It is unusual to be in a technical leadership position when it comes to driving designs and new solutions, but not from a process perspective. Contributors need to be reminded of that. It is also possible that embedded vendors may want to contribute, but have only a small time window to do this. Agreements: * Create a couple of boilerplate mails or checklists to set expectations better ACTION: on ARM maintainers to trial * Agreed to allow draft design into the git tree, as long as interface status (Draft and unresolved issues) are clearly documented. In that case, contributors can show progress and others - even if a design is not finished - can build on it. Feature docs already allow for that and so do Design Docs (although there is no example). ACTION: on ARM maintainers to trial and pick a suitable location in tree. 
x86 --- Lars prompts Jan, Andy on some of the challenges Jan, Andy: A: Typically series are large and fully formed (e.g. a 30-patch series) B: Often we don't have enough context to understand the design behind code This has improved through Hackathons, meetings under NDA, ... C: In the past, series have existed for 2 years in private (e.g. SGX was developed against 4.6) and are posted against a newer version. At that point, some assumptions may have changed: e.g. on 5-level-paging we agreed at the summit that PV support is not needed (only HVM and PVH) D: There is not normally a lack of driving and managing the submission of an issue Roger: feels that when he is reviewing x86 stuff it does not actually take work off Jan or Andrew, as sometimes one of them will pick up and re-review. That sometimes puts him off. Jan: that is a risk to take and shouldn't put you off. Wei: says that when his responsibility on a patch is not clear, he says "subject to the agreement of XXX". That sets expectations with other maintainers and contributors. Then we went a little bit onto reasons behind bandwidth issues Jan: large series are often hard to understand and consume. Also, sometimes there is a lack of understanding that there
Re: [Xen-devel] [PATCH v4 1/4] VT-d PI: track the vcpu number on pi blocking list
On Fri, Jul 7, 2017 at 7:48 AM, Chao Gao wrote: > This patch adds a field, counter, in struct vmx_pi_blocking_vcpu to track > how many entries are on the pi blocking list. > > Signed-off-by: Chao Gao Minor nit: The grammar in the title isn't quite right; "vcpu number" would be "the number identifying a particular vcpu", not "the number of vcpus". It should be, "VT-d PI: Track the number of vcpus on pi blocking list". With that: Reviewed-by: George Dunlap > --- > v4: > - non-trace part of Patch 1 in v3 > > --- > xen/arch/x86/hvm/vmx/vmx.c | 14 +++++++++++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c > index 69ce3aa..ecd6485 100644 > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -83,6 +83,7 @@ static int vmx_vmfunc_intercept(struct cpu_user_regs *regs); > struct vmx_pi_blocking_vcpu { > struct list_head list; > spinlock_t lock; > +unsigned int counter; > }; > > /* > @@ -120,6 +121,7 @@ static void vmx_vcpu_block(struct vcpu *v) > */ > ASSERT(old_lock == NULL); > > +per_cpu(vmx_pi_blocking, v->processor).counter++; > list_add_tail(&v->arch.hvm_vmx.pi_blocking.list, > &per_cpu(vmx_pi_blocking, v->processor).list); > spin_unlock_irqrestore(pi_blocking_list_lock, flags); > @@ -187,6 +189,8 @@ static void vmx_pi_unblock_vcpu(struct vcpu *v) > { > ASSERT(v->arch.hvm_vmx.pi_blocking.lock == pi_blocking_list_lock); > list_del(&v->arch.hvm_vmx.pi_blocking.list); > +container_of(pi_blocking_list_lock, > + struct vmx_pi_blocking_vcpu, lock)->counter--; > v->arch.hvm_vmx.pi_blocking.lock = NULL; > } > > @@ -235,6 +239,7 @@ void vmx_pi_desc_fixup(unsigned int cpu) > if ( pi_test_on(&vmx->pi_desc) ) > { > list_del(&vmx->pi_blocking.list); > +per_cpu(vmx_pi_blocking, cpu).counter--; > vmx->pi_blocking.lock = NULL; > vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx)); > } > @@ -259,6 +264,8 @@ void vmx_pi_desc_fixup(unsigned int cpu) > > list_move(&vmx->pi_blocking.list, > &per_cpu(vmx_pi_blocking, 
new_cpu).list); > +per_cpu(vmx_pi_blocking, cpu).counter--; > +per_cpu(vmx_pi_blocking, new_cpu).counter++; > vmx->pi_blocking.lock = new_lock; > > spin_unlock(new_lock); > @@ -2358,9 +2365,9 @@ static struct hvm_function_table __initdata > vmx_function_table = { > static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > { > struct arch_vmx_struct *vmx, *tmp; > -spinlock_t *lock = &per_cpu(vmx_pi_blocking, smp_processor_id()).lock; > -struct list_head *blocked_vcpus = > - &per_cpu(vmx_pi_blocking, smp_processor_id()).list; > +unsigned int cpu = smp_processor_id(); > +spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock; > +struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list; > > ack_APIC_irq(); > this_cpu(irq_count)++; > @@ -2377,6 +2384,7 @@ static void pi_wakeup_interrupt(struct cpu_user_regs > *regs) > if ( pi_test_on(&vmx->pi_desc) ) > { > list_del(&vmx->pi_blocking.list); > +per_cpu(vmx_pi_blocking, cpu).counter--; > ASSERT(vmx->pi_blocking.lock == lock); > vmx->pi_blocking.lock = NULL; > vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx)); > -- > 1.8.3.1 > > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [xen-unstable test] 112065: regressions - FAIL
flight 112065 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/112065/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-localmigrate/x10 fail REGR. vs. 112004 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 112004 Tests which did not succeed, but are not blocking: test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail blocked in 112004 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 112004 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 112004 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 112004 test-armhf-armhf-libvirt-xsm 14 saverestore-support-checkfail like 112004 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 112004 test-amd64-amd64-xl-rtds 10 debian-install fail like 112004 test-amd64-amd64-xl-qemut-ws16-amd64 10 windows-installfail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-installfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-i386-xl-qemuu-ws16-amd64 13 
guest-saverestore fail never pass test-armhf-armhf-xl-xsm 13 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-xsm 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 13 guest-saverestore fail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass version targeted for testing: 
xen 64c3fce24585740a43eb0d589de6e329ca454502 baseline version: xen d535d8922f571502252deaf607e82e7475cd1728 Last test of basis 112004 2017-07-19 06:51:03 Z2 days Failing since112033 2017-07-20 02:24:27 Z1 days2 attempts Testing same since 112065 2017-07-20 19:20:15 Z0 days1 attempts People who touched revisions under test:
Re: [Xen-devel] [xen-unstable test] 112033: regressions - trouble: broken/fail/pass
Hi, On 20/07/17 20:01, osstest service owner wrote: > flight 112033 xen-unstable real [real] > http://logs.test-lab.xenproject.org/osstest/logs/112033/ > > Regressions :-( > > Tests which did not succeed and are blocking, > including tests which could not be run: > test-amd64-i386-xl-qemuu-ovmf-amd64 4 host-install(4) broken REGR. vs. > 112004 > test-armhf-armhf-xl-credit2 16 guest-start/debian.repeat fail REGR. vs. > 112004 I have looked at the failure for this test. It is happening on one of the cubietrucks and seems to reproduce fairly reliably ([1]). It is failing when creating the 6th domain. Looking at the guest console logs, I only see logs for 5 domains. Nothing for the 6th. The guest seems to have received a prefetch abort (see trace below), probably after a data abort. I am not sure I understand why, and the stack trace seems awfully blank. I've looked at other available logs with similar failures. All end up with a prefetch abort, although not necessarily after a data abort. Ian, I am wondering if I could borrow one of the cubietrucks on Monday to try and reproduce the bug? 
Cheers, Jul 20 06:44:03.543038 (XEN) [ Xen-4.10-unstable arm32 debug=y Not tainted ] Jul 20 06:44:03.548785 (XEN) CPU:0 Jul 20 06:44:03.550283 (XEN) PC: 000c Jul 20 06:44:03.552407 (XEN) CPSR: 61d7 MODE:32-bit Guest ABT Jul 20 06:44:03.556288 (XEN) R0: dcffe000 R1: 5c00065f R2: R3: c031c4a8 Jul 20 06:44:03.561910 (XEN) R4: dc00 R5: R6: c0f4d264 R7: dc001000 Jul 20 06:44:03.567413 (XEN) R8: dcffe000 R9: 0005c000 R10:dc20 R11:c0f4d000 R12: Jul 20 06:44:03.574161 (XEN) USR: SP: LR: Jul 20 06:44:03.577411 (XEN) SVC: SP: c1201e60 LR: c1007d68 SPSR:41d3 Jul 20 06:44:03.581903 (XEN) ABT: SP: c1318acc LR: 0010 SPSR:61d7 Jul 20 06:44:03.586403 (XEN) UND: SP: c1318ad8 LR: c1318ad8 SPSR: Jul 20 06:44:03.590909 (XEN) IRQ: SP: c1318ac0 LR: c1318ac0 SPSR: Jul 20 06:44:03.595404 (XEN) FIQ: SP: c1318ae4 LR: c1318ae4 SPSR: Jul 20 06:44:03.599909 (XEN) FIQ: R8: R9: R10: R11: R12: Jul 20 06:44:03.606657 (XEN) Jul 20 06:44:03.607279 (XEN) SCTLR: 10c5387d Jul 20 06:44:03.609775 (XEN)TCR: Jul 20 06:44:03.612153 (XEN) TTBR0: 4020406a Jul 20 06:44:03.615282 (XEN) TTBR1: 4020406a Jul 20 06:44:03.618404 (XEN) IFAR: 000c, IFSR: 0007 Jul 20 06:44:03.622166 (XEN) DFAR: dcffe000, DFSR: 0805 Jul 20 06:44:03.626073 (XEN) Jul 20 06:44:03.626683 (XEN) VTCR_EL2: 80003558 Jul 20 06:44:03.629208 (XEN) VTTBR_EL2: 0002bff24000 Jul 20 06:44:03.632310 (XEN) Jul 20 06:44:03.632931 (XEN) SCTLR_EL2: 30cd187f Jul 20 06:44:03.635432 (XEN)HCR_EL2: 0038663f Jul 20 06:44:03.638549 (XEN) TTBR0_EL2: bff12000 Jul 20 06:44:03.641663 (XEN) Jul 20 06:44:03.642421 (XEN)ESR_EL2: 07e0 Jul 20 06:44:03.644790 (XEN) HPFAR_EL2: 0001c810 Jul 20 06:44:03.647919 (XEN) HDFAR: e0800f00 Jul 20 06:44:03.650295 (XEN) HIFAR: 5cf18882 Jul 20 06:44:03.652665 (XEN) Jul 20 06:44:03.653526 (XEN) Guest stack trace from sp=c1318acc: Jul 20 06:44:03.657187 (XEN) Jul 20 06:44:03.664298 (XEN) Jul 20 06:44:03.671297 (XEN) Jul 20 06:44:03.678425 (XEN) Jul 20 06:44:03.685539 (XEN) Jul 20 06:44:03.692645 (XEN) Jul 20 06:44:03.699805 (XEN) Jul 20 
06:44:03.706937 (XEN) Jul 20 06:44:03.714064 (XEN) Jul 20 06:44:03.721049 (XEN) Jul 20 06:44:03.728168 (XEN) Jul 20 06:44:03.735291 (XEN) Jul 20 06:44:03.742417 (XEN) Jul 20 06:44:03.749529 (XEN) Jul 20 06:44:03.756662 (XEN) Jul 20 06:44:03.763787 (XEN) Jul 20 06:44:03.770791 (XEN) Jul 20
Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu
> On Fri, 21 Jul 2017 10:57:55 + > "Zhang, Xiong Y" wrote: > > > On an intel skylake machine with upstream qemu, if I add > > "rdm=strategy=host, policy=strict" to hvm.cfg, win 8.1 DomU couldn't > > boot up and continuously reboots. > > > > Steps to reproduce this issue: > > > > 1) Boot xen with iommu=1 to enable iommu > > 2) hvm.cfg contains: > > > > builder="hvm" > > > > memory= > > > > disk=['win8.1 img'] > > > > device_model_override='qemu-system-i386' > > > > device_model_version='qemu-xen' > > > > rdm="strategy=host,policy=strict" > > > > 3) xl cr hvm.cfg > > > > Conditions to reproduce this issue: > > > > 1) DomU memory size > the top address of RMRR. Otherwise, this > > issue will disappear. > > 2) rdm=" strategy=host,policy=strict" should exist > > 3) Windows DomU. Linux DomU doesn't have this issue. > > 4) Upstream qemu. Traditional qemu doesn't have this issue. > > > > In this situation, hvmloader will relocate some guest ram below RMRR to > > high memory, and it seems the Windows guest accesses an invalid address. Could > > someone give me some suggestions on how to debug this ? > > You likely have RMRR range(s) below the 2GB boundary. > > You may try the following: > > 1. Specify some large 'mmio_hole' value in your domain configuration file, > ex. mmio_hole=2560 > 2. If that doesn't help, 'xl dmesg' output might come in useful > > Right now upstream QEMU still doesn't support relocation of parts > of guest RAM to >4GB boundary if they were overlapped by MMIO ranges. > AFAIR forcing allow_memory_relocate to 1 for hvmloader didn't bring > anything good for HVM guest. > > Setting the mmio_hole size manually allows creating a "predefined" > memory/MMIO hole layout for both QEMU (via 'max-ram-below-4g') and > hvmloader (via a XenStore param), effectively avoiding MMIO/RMRR overlaps > or RAM relocation in hvmloader, so this might help. Wrote too soon, "policy=strict" means that you won't be able to create a DomU if RMRR was below 2G... so it actually should be above 2GB. 
Anyway, try setting mmio_hole size. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
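For reference, the mmio_hole workaround suggested above would look something like this in the reporter's domain configuration. The memory size and disk path are illustrative values (the original report elides them); mmio_hole is in MiB, so 2560 requests a 2.5 GiB hole below 4 GiB:

```
builder = "hvm"
memory = 4096                    # illustrative; above the top of the RMRR region
disk = [ 'win8.1.img' ]          # illustrative path
device_model_override = 'qemu-system-i386'
device_model_version = 'qemu-xen'
rdm = "strategy=host,policy=strict"
mmio_hole = 2560                 # force a 2.5 GiB MMIO hole so guest RAM
                                 # never overlaps the reserved RMRR ranges
```

The point of the fixed hole is that QEMU ('max-ram-below-4g') and hvmloader then agree on the memory layout up front, so hvmloader never needs to relocate RAM that QEMU doesn't know how to follow.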
Re: [Xen-devel] [PULL for-2.10 6/7] xen/mapcache: introduce xen_replace_cache_entry()
On Tue, Jul 18, 2017 at 03:22:41PM -0700, Stefano Stabellini wrote: > From: Igor Druzhinin... > +static uint8_t *xen_replace_cache_entry_unlocked(hwaddr old_phys_addr, > + hwaddr new_phys_addr, > + hwaddr size) > +{ > +MapCacheEntry *entry; > +hwaddr address_index, address_offset; > +hwaddr test_bit_size, cache_size = size; > + > +address_index = old_phys_addr >> MCACHE_BUCKET_SHIFT; > +address_offset = old_phys_addr & (MCACHE_BUCKET_SIZE - 1); > + > +assert(size); > +/* test_bit_size is always a multiple of XC_PAGE_SIZE */ > +test_bit_size = size + (old_phys_addr & (XC_PAGE_SIZE - 1)); > +if (test_bit_size % XC_PAGE_SIZE) { > +test_bit_size += XC_PAGE_SIZE - (test_bit_size % XC_PAGE_SIZE); > +} > +cache_size = size + address_offset; > +if (cache_size % MCACHE_BUCKET_SIZE) { > +cache_size += MCACHE_BUCKET_SIZE - (cache_size % MCACHE_BUCKET_SIZE); > +} > + > +entry = &mapcache->entry[address_index % mapcache->nr_buckets]; > +while (entry && !(entry->paddr_index == address_index && > + entry->size == cache_size)) { > +entry = entry->next; > +} > +if (!entry) { > +DPRINTF("Trying to update an entry for %lx " \ > +"that is not in the mapcache!\n", old_phys_addr); > +return NULL; > +} > + > +address_index = new_phys_addr >> MCACHE_BUCKET_SHIFT; > +address_offset = new_phys_addr & (MCACHE_BUCKET_SIZE - 1); > + > +fprintf(stderr, "Replacing a dummy mapcache entry for %lx with %lx\n", > +old_phys_addr, new_phys_addr); Looks like this does not build on 32-bit. 
in: http://logs.test-lab.xenproject.org/osstest/logs/112041/build-i386/6.ts-xen-build.log /home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/hw/i386/xen/xen-mapcache.c: In function 'xen_replace_cache_entry_unlocked': /home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/hw/i386/xen/xen-mapcache.c:539:13: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format=] old_phys_addr, new_phys_addr); ^ /home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/hw/i386/xen/xen-mapcache.c:539:13: error: format '%lx' expects argument of type 'long unsigned int', but argument 4 has type 'hwaddr' [-Werror=format=] cc1: all warnings being treated as errors CC i386-softmmu/target/i386/gdbstub.o /home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/rules.mak:66: recipe for target 'hw/i386/xen/xen-mapcache.o' failed > + > +xen_remap_bucket(entry, entry->vaddr_base, > + cache_size, address_index, false); > +if (!test_bits(address_offset >> XC_PAGE_SHIFT, > +test_bit_size >> XC_PAGE_SHIFT, > +entry->valid_mapping)) { > +DPRINTF("Unable to update a mapcache entry for %lx!\n", > old_phys_addr); > +return NULL; > +} > + > +return entry->vaddr_base + address_offset; > +} > + -- Anthony PERARD ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu
Hi, On Fri, 21 Jul 2017 10:57:55 + "Zhang, Xiong Y" wrote: > On an intel skylake machine with upstream qemu, if I add > "rdm=strategy=host, policy=strict" to hvm.cfg, win 8.1 DomU couldn't boot > up and continuously reboots. > > Steps to reproduce this issue: > > 1) Boot xen with iommu=1 to enable iommu > 2) hvm.cfg contains: > > builder="hvm" > > memory= > > disk=['win8.1 img'] > > device_model_override='qemu-system-i386' > > device_model_version='qemu-xen' > > rdm="strategy=host,policy=strict" > > 3) xl cr hvm.cfg > > Conditions to reproduce this issue: > > 1) DomU memory size > the top address of RMRR. Otherwise, this > issue will disappear. > 2) rdm=" strategy=host,policy=strict" should exist > 3) Windows DomU. Linux DomU doesn't have this issue. > 4) Upstream qemu. Traditional qemu doesn't have this issue. > > In this situation, hvmloader will relocate some guest ram below RMRR to > high memory, and it seems the Windows guest accesses an invalid address. Could > someone give me some suggestions on how to debug this ? You likely have RMRR range(s) below the 2GB boundary. You may try the following: 1. Specify some large 'mmio_hole' value in your domain configuration file, ex. mmio_hole=2560 2. If that doesn't help, 'xl dmesg' output might come in useful Right now upstream QEMU still doesn't support relocation of parts of guest RAM to >4GB boundary if they were overlapped by MMIO ranges. AFAIR forcing allow_memory_relocate to 1 for hvmloader didn't bring anything good for HVM guest. Setting the mmio_hole size manually allows creating a "predefined" memory/MMIO hole layout for both QEMU (via 'max-ram-below-4g') and hvmloader (via a XenStore param), effectively avoiding MMIO/RMRR overlaps or RAM relocation in hvmloader, so this might help. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] Notes from Design Summit Hypervisor Fuzzing Session
Hi all, please find attached my notes. A lot of it went over my head, so I may have gotten things wrong and some are missing. Feel free to modify, chip in, clarify, as needed Lars Session URL: http://sched.co/AjHN OPTION 1: Userspace Approach Dom0 [AFL] | DomU [VM nested with Xen and XTF], both on top of [Xen] Would need 1. nested HVM support 2. VM forking Not an option as too hard OPTION 2: = Dom0 [AFL] <-> DomU [VM XTF [e]] (e = executor), both on top of [Xen] This approach would need 1. Tracing (instrument binary and write to shared memory for AFL) Almost done, but not completely deterministic yet 2. Implement a special hypercall that returns a return code that can be converted into the expected AFL output for branching info Submitted 3. Communication channel between AFL and XTF Almost done 4. Using XTF because it should be the fastest option and allows us to restrict the scope of what to fuzz Key challenge: not making unnecessary non-deterministic hypercalls in the background Use of XTF constrains the degrees of freedom and focuses the fuzzing 5. Need some way to feed info back into AFL I believe there was some discussion around this, which I did not get Discussion == Dismissed Option 1. All agreed that Option 2 is best. I missed quite a bit of this, because the discussion was quite fast at times George: recommends testing one thing at a time to reduce the problem space Such as iteration, feedback, ... Based on outcome, iterate There was a little bit of discussion around determinism: Andy: blacklist SCHEDOP_??? with ??? = shutdown, suspend, watchdog, ... 
Possibly there are some more functions that need to be blacklisted This should help with determinism Andy: Going to have problems such as dealing with partial hypercall operations Wei: Already included this - only 1 thread in XTF => deterministic Andy: What happoens if HV gets interrupted Juergen: put XTF into null scheduler pool to minimise risk of interrupts and increase determinism Wei: That would exclude IRQs in such a scenario There was a little bit of around feedback loop and protocol between AFL and XTF Andy: easiest way to get a feedback loop starting. XTF to boot, wait on event channel (shadop call with - 0 timeout) AFL does the hypercall with edge tracing, ... Jurgen: starting measurement can be done be initiated AFL (Dom0), and disabled from XTF (DomU) Wei: follow the same pattern as xl already does (I don't know the sample code though) There was a bit of discussion on the impact pf QEMU Wei: can't use QEMU to emulate a machine with vhdx (following on from a question by Ian) Ian: this will be fast, not quite so reliable. But a good first step And some other topics Andy: there is also syzkaller, with fuzzing entity being some userspace calls Wei: used as a reference material as Oracle did something similar ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen:Kconfig: Make SCIF built by default for ARM
Hi Andrii,

Please CC the relevant maintainers when sending a patch (or questions
regarding a specific subsystem) on the ML.

On 18/07/17 17:45, Andrii Anisov wrote:
From: Andrii Anisov

Both Renesas R-Car Gen2 (ARM32) and Gen3 (ARM64) are utilizing the SCIF IP,
so make its serial driver built by default for ARM.

Signed-off-by: Andrii Anisov

Acked-by: Julien Grall

---
 xen/drivers/char/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/char/Kconfig b/xen/drivers/char/Kconfig
index 51343d0..fb53dd8 100644
--- a/xen/drivers/char/Kconfig
+++ b/xen/drivers/char/Kconfig
@@ -39,10 +39,10 @@ config HAS_OMAP
 config HAS_SCIF
 	bool
 	default y
-	depends on ARM_32
+	depends on ARM
 	help
 	  This selects the SuperH SCI(F) UART. If you have a SuperH based board,
-	  say Y.
+	  or Renesas R-Car Gen 2/3 based board say Y.
 
 config HAS_EHCI
 	bool

Cheers,

--
Julien Grall
Re: [Xen-devel] [xen-devel][xen/Arm]xen fail to boot on omap5 board
On 18/07/17 10:50, Andrii Anisov wrote:
> Dear Shishir,
>
> On 18.07.17 12:05, shishir tiwari wrote:
>> Hi
>>
>> I want to test and understand the xen hypervisor implementation with dom0
>> and domU on an omap5 board. I followed
>> https://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions/OMAP5432_uEVM
>> with the latest kernel (4.11.7) and xen (4.9.0) and device tree, but I am
>> unable to boot dom0. xen stops on "Turning on pages".
>
> I guess you mean "- Turning on paging -"
> Please drop the whole log.

This is very early boot in head.S, so having the full log will not really
help here... What is more interesting is where the different modules have
been loaded in memory:
  - Device Tree
  - Kernel
  - Xen
  - Initramfs (if any)

>> please tell what version of Xen and kernel is tested on the omap5 board.
>
> IIRC it was Xen 4.5 and LK 3.18. Old and outdated stuff. The same as OMAP5
> itself, which was discontinued maybe three years ago.

Even though OMAP5 is not sold anymore, we should still be able to boot Xen
4.9 on it. If that is not the case, then there is a bug in the code.

BTW, I'm really surprised you have an OMAP5 based board. Which one do you
actually have?

Cheers,

--
Julien Grall
Re: [Xen-devel] [RFC PATCH v3 13/24] ARM: NUMA: DT: Parse memory NUMA information
On 21/07/17 12:10, Vijay Kilari wrote:
> Hi Julien,
>
> On Thu, Jul 20, 2017 at 4:56 PM, Julien Grall wrote:
>> On 19/07/17 19:39, Julien Grall wrote:
>>>> cell = (const __be32 *)prop->data;
>>>> banks = fdt32_to_cpu(prop->len) / (reg_cells * sizeof (u32));
>>>>
>>>> -for ( i = 0; i < banks && bootinfo.mem.nr_banks < NR_MEM_BANKS; i++ )
>>>> +for ( i = 0; i < banks; i++ )
>>>> {
>>>>     device_tree_get_reg(&cell, address_cells, size_cells, &start, &size);
>>>>     if ( !size )
>>>>         continue;
>>>> -bootinfo.mem.bank[bootinfo.mem.nr_banks].start = start;
>>>> -bootinfo.mem.bank[bootinfo.mem.nr_banks].size = size;
>>>> -bootinfo.mem.nr_banks++;
>>>> +if ( !efi_enabled(EFI_BOOT) && bootinfo.mem.nr_banks < NR_MEM_BANKS )
>>>> +{
>>>> +    bootinfo.mem.bank[bootinfo.mem.nr_banks].start = start;
>>>> +    bootinfo.mem.bank[bootinfo.mem.nr_banks].size = size;
>>>> +    bootinfo.mem.nr_banks++;
>>>> +}
>>>
>>> This change should be split.
>>
>> I thought a bit more about this code during the week. I think it would be
>> nicer to write:
>>
>> #ifdef CONFIG_NUMA
>>     dt_numa_process_memory_node(nid, start, size);
>> #endif
>>
>>     if ( !efi_enabled(EFI_BOOT) )
>>         continue;
>
> Should be if ( efi_enabled(EFI_BOOT) ) ?
>
>>     if ( bootinfo.mem.nr_banks < NR_MEM_BANKS )
>
> Should be if ( bootinfo.mem.nr_banks >= NR_MEM_BANKS ) ?

Yes for both. I wrote this e-mail too quickly.

Cheers,

--
Julien Grall
Re: [Xen-devel] [RFC PATCH v3 13/24] ARM: NUMA: DT: Parse memory NUMA information
Hi Julien,

On Thu, Jul 20, 2017 at 4:56 PM, Julien Grall wrote:
>
>
> On 19/07/17 19:39, Julien Grall wrote:
>>>
>>> cell = (const __be32 *)prop->data;
>>> banks = fdt32_to_cpu(prop->len) / (reg_cells * sizeof (u32));
>>>
>>> -for ( i = 0; i < banks && bootinfo.mem.nr_banks < NR_MEM_BANKS; i++ )
>>> +for ( i = 0; i < banks; i++ )
>>> {
>>>     device_tree_get_reg(&cell, address_cells, size_cells, &start, &size);
>>>     if ( !size )
>>>         continue;
>>> -bootinfo.mem.bank[bootinfo.mem.nr_banks].start = start;
>>> -bootinfo.mem.bank[bootinfo.mem.nr_banks].size = size;
>>> -bootinfo.mem.nr_banks++;
>>> +if ( !efi_enabled(EFI_BOOT) && bootinfo.mem.nr_banks < NR_MEM_BANKS )
>>> +{
>>> +    bootinfo.mem.bank[bootinfo.mem.nr_banks].start = start;
>>> +    bootinfo.mem.bank[bootinfo.mem.nr_banks].size = size;
>>> +    bootinfo.mem.nr_banks++;
>>> +}
>>
>>
>> This change should be split.
>
>
> I thought a bit more about this code during the week. I think it would be
> nicer to write:
>
> #ifdef CONFIG_NUMA
>     dt_numa_process_memory_node(nid, start, size);
> #endif
>
>     if ( !efi_enabled(EFI_BOOT) )
>         continue;

Should be if ( efi_enabled(EFI_BOOT) ) ?

>     if ( bootinfo.mem.nr_banks < NR_MEM_BANKS )

Should be if ( bootinfo.mem.nr_banks >= NR_MEM_BANKS ) ?

>         break;
>
>     bootinfo.mem.bank[];
>
> Also, you may want to add a stub for dt_numa_process_memory_node rather
> than #ifdef in the code.
>
> Cheers,
>
> --
> Julien Grall
[Xen-devel] [Bug] Intel RMRR support with upstream Qemu
On an Intel Skylake machine with upstream qemu, if I add
"rdm=strategy=host,policy=strict" to hvm.cfg, a Windows 8.1 DomU can't boot
up and continuously reboots.

Steps to reproduce this issue:
1) Boot xen with iommu=1 to enable the IOMMU
2) hvm.cfg contains:
   builder="hvm"
   memory=
   disk=['win8.1 img']
   device_model_override='qemu-system-i386'
   device_model_version='qemu-xen'
   rdm="strategy=host,policy=strict"
3) xl cr hvm.cfg

Conditions to reproduce this issue:
1) DomU memory size > the top address of the RMRR. Otherwise, the issue
   disappears.
2) rdm="strategy=host,policy=strict" must be present
3) Windows DomU. A Linux DomU doesn't have this issue.
4) Upstream qemu. Traditional qemu doesn't have this issue.

In this situation, hvmloader will relocate some guest RAM below the RMRR to
high memory, and it seems the Windows guest accesses an invalid address.
Could someone give me some suggestions on how to debug this?

thanks
Re: [Xen-devel] xen/link: Move .data.rel.ro sections into .rodata for final link
On 20/07/17 17:54, Wei Liu wrote:
> On Thu, Jul 20, 2017 at 05:46:50PM +0100, Wei Liu wrote:
>> CC relevant maintainers
>>
>> On Thu, Jul 20, 2017 at 05:20:43PM +0200, David Woodhouse wrote:
>>> From: David Woodhouse
>>>
>>> This includes stuff lke the hypercall tables which we really want
>
> lke -> like
>
>>> to be read-only. And they were going into .data.read-mostly.
>>>
>>> Signed-off-by: David Woodhouse
>
> Reviewed-by: Wei Liu

Acked-by: Julien Grall

Cheers,

--
Julien Grall
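For readers unfamiliar with the sections involved: in GNU ld script syntax, folding .data.rel.ro into the read-only output section looks roughly like the fragment below. This is a hypothetical sketch, not the actual xen.lds.S hunk from the patch:

```
.rodata : {
        *(.rodata)
        *(.rodata.*)
        /* Data that only needs to be writable while the (static) link-time
         * relocations are applied -- e.g. tables of function pointers such
         * as the hypercall tables -- is read-only at runtime, so it belongs
         * here rather than in .data.read-mostly. */
        *(.data.rel.ro)
        *(.data.rel.ro.*)
}
```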
Re: [Xen-devel] [RFC v3]Proposal to allow setting up shared memory areas between VMs from xl config file
Hi,

On 18/07/17 19:30, Zhongze Liu wrote:

1. Motivation and Description
=============================
Virtual machines use grant table hypercalls to set up shared pages for
inter-VM communication. These hypercalls are used by all PV protocols today.
However, very simple guests, such as baremetal applications, might not have
the infrastructure to handle the grant table. This project is about setting
up several shared memory areas for inter-VM communication directly from the
VM config file, so that the guest kernel doesn't have to have grant table
support (in the embedded space, this is not unusual) to be able to
communicate with other guests.

2. Implementation Plan:
=======================

2.1 Introduce a new VM config option in xl:
===========================================

2.1.1 Design Goals
~~~~~~~~~~~~~~~~~~
The shared areas should be shareable among several (>=2) VMs, so every
shared physical memory area is assigned to a set of VMs. Therefore, a
"token" or "identifier" should be used here to uniquely identify a backing
memory area. A string no longer than 128 bytes is used here to serve the
purpose.

The backing area would be taken from one domain, which we will regard as
the "master domain", and this domain should be created prior to any other
"slave domain"s. Again, we have to use some kind of tag to tell who is the
"master domain".

And the ability to specify the permissions and cacheability (and
shareability for arm HVM's) of the pages to be shared should also be given
to the user.

s/arm/ARM/. Furthermore, it is called ARM guest and not HVM.
2.2.2 Syntax and Behavior
~~~~~~~~~~~~~~~~~~~~~~~~~
The following example illustrates the syntax of the proposed config entry:

In the xl config file of vm1:
  static_shm = [ 'id=ID1, begin=0x10, end=0x20, role=master,
                  arm_shareattr=inner, arm_inner_cacheattr=wb,
                  arm_outer_cacheattr=wb, x86_cacheattr=wb, prot=ro',
                 'id=ID2, begin=0x30, end=0x40, role=master,
                  arm_shareattr=inner, arm_inner_cacheattr=wb,
                  arm_outer_cacheattr=wb, x86_cacheattr=wb, prot=rw' ]

In the xl config file of vm2:
  static_shm = [ 'id=ID1, begin=0x50, end=0x60, role=slave, prot=ro' ]

In the xl config file of vm3:
  static_shm = [ 'id=ID2, begin=0x70, end=0x80, role=slave, prot=ro' ]

where:
@id         can be any string that matches the regexp "[^ \t\n,]+" and is no
            logner than 128 characters

s/logner/longer/

@begin/end  can be decimals or hexidemicals of the form "0x2".

s/hexidemicals/hexadecimals/

@role       can only be 'master' or 'slave'
@prot       can be 'n', 'r', 'ro', 'w', 'wo', 'x', 'xo', 'rw', 'rx', 'wx'
            or 'rwx'. Default is 'rw'.
@arm_shareattr  can be 'inner' or 'outter', this will be ignored and

s/outter/outer/. If you really want to support shareability, you want to
provide non-shareable too. But I think, as suggested in the answer to
Stefano, it would be easier if we provide a set of policies that will
configure the guest correctly. This would avoid doing sanity checks on the
options used by the user.

            a warning will be printed to the screen if it is specified in an
            x86 HVM config file. Default is 'inner'
@arm_outer_cacheattr  can be 'uc', 'wt', 'wb', 'bufferable' or 'wa', this
            will be ignored and a warning will be printed to the screen if
            it is specified in an x86 HVM config file. Default is 'inner'

I guess you took the names from asm-arm/page.h? Those attributes are for
stage-1 page-tables and not stage-2 (i.e. used for translating an
intermediate physical address to a physical address). Actually, nowhere do
you explain that this will be used to configure the mapping in stage-2.
The possibilities to configure the mappings are very different (see D4.5 in
ARM DDI 0487B.a). You can configure cacheability but not cache allocation
hints. For instance, wa (write-allocate) is a hint.

You will also want to warn the user that this may not prevent memory
attribute mismatches, depending on the cacheability policy.

@arm_inner_cacheattr  can be 'uc', 'wt', 'wb', 'bufferable' or 'wa'.
            Default is 'wb'.
@x86_cacheattr  can be 'uc', 'wc', 'wt', 'wp', 'wb' or 'suc'. Default is
            'wb'.

Besides, the sizes of the areas specified by @begin and @end in the slave
domain's config file should be smaller than the corresponding sizes
specified in its master domain's config file. And overlapping backing
memory areas are allowed. In the example