Re: [Xen-devel] [PATCH v8 02/13] x86: detect and initialize Intel CAT feature
On 29.05.15 at 04:40, chao.p.p...@linux.intel.com wrote: On Thu, May 28, 2015 at 01:54:39PM +0100, Jan Beulich wrote: On 21.05.15 at 10:41, chao.p.p...@linux.intel.com wrote: + +if ( !cpu_has(c, X86_FEATURE_CAT) ) +return; + +socket = cpu_to_socket(cpu); +if ( test_bit(socket, cat_socket_enable) ) +return; + +cpuid_count(PSR_CPUID_LEVEL_CAT, 0, eax, ebx, ecx, edx); While one would hope that X86_FEATURE_CAT implies the respective CPUID leaf being available, I think explicitly checking this should still be done just like is the case elsewhere. Against cpuid_level? Of course. +static void __init init_psr_cat(void) +{ +if ( opt_cos_max 1 ) +{ +printk(XENLOG_INFO CAT: disabled, cos_max is too small\n); +return; +} Is opt_cos_max == 1 really useful for anything? That means two COSes are available. cos=0 is reserved and cos=1 can still be used anyway. Ah, sorry, this is _max_, not _count_. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v8 01/13] x86: add socket_cpumask
On Fri, May 29, 2015 at 09:01:53AM +0100, Jan Beulich wrote: On 29.05.15 at 04:35, chao.p.p...@linux.intel.com wrote: On Thu, May 28, 2015 at 01:38:05PM +0100, Jan Beulich wrote: On 21.05.15 at 10:41, chao.p.p...@linux.intel.com wrote: --- a/xen/arch/x86/mpparse.c +++ b/xen/arch/x86/mpparse.c @@ -87,6 +87,18 @@ void __init set_nr_cpu_ids(unsigned int max_cpus) #endif } +void __init set_nr_sockets(void) +{ +unsigned int cpus = bitmap_weight(phys_cpu_present_map.mask, + boot_cpu_data.x86_max_cores * + boot_cpu_data.x86_num_siblings); How did you come to this expression for the bitmap size? I.e. why not simply physids_weight(phys_cpu_present_map)? physids_weight(phys_cpu_present_map) gives me cpus for all sockets. While here the 'cpus' is actually _cpus_per_socket_. I used the max possible cpus indicated in cpuid as the upper bound so bitmap_weight() returns the actual available cpus on socket 0. In which case the variable name is badly chosen, or a respective comment is missing. + +if ( cpus == 0 ) +cpus = 1; + +nr_sockets = DIV_ROUND_UP(num_processors + disabled_cpus, cpus); +} Is there a reason why this can't just be added to the end of the immediately preceding set_nr_cpu_ids()? You mean the declaration or invocation? If the former I have no special reason for it (e.g. I can change it). Neither - I just don't see the need for a new function. In which case the invocation of set_nr_cpu_ids() should move to the place where now set_nr_sockets() is invoked, to make sure boot_cpu_data.x86_max_cores/x86_num_siblings available, which may not be your expectation. Chao ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [xen-4.3-testing test] 57458: regressions - trouble: blocked/broken/fail/pass
flight 57458 xen-4.3-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/57458/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xend-qemut-winxpsp3 16 guest-stop fail REGR. vs. 53768 Tests which are failing intermittently (not blocking): test-armhf-armhf-libvirt 3 host-install(3) broken pass in 57412 test-amd64-amd64-xl-qemuu-win7-amd64 12 guest-localmigrate fail pass in 57412 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail in 57412 like 53768 Tests which did not succeed, but are not blocking: test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 6 xen-boot fail in 57412 never pass test-amd64-i386-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail never pass test-amd64-amd64-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail never pass build-amd64-rumpuserxen 6 xen-buildfail never pass build-i386-rumpuserxen6 xen-buildfail never pass test-armhf-armhf-xl-sedf-pin 6 xen-boot fail never pass test-armhf-armhf-xl-sedf 6 xen-boot fail never pass test-armhf-armhf-xl-multivcpu 6 xen-boot fail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 6 xen-boot fail never pass test-armhf-armhf-xl-credit2 6 xen-boot fail never pass test-armhf-armhf-xl 6 xen-boot fail never pass test-armhf-armhf-xl-cubietruck 6 xen-boot fail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail never pass version targeted for testing: xen e580a92dd53dbba62518e17a3f4ebd57b626926c baseline version: xen 58db71c5cdd48126b9380c230dc5b61554bad7d8 People who touched revisions under test: Ian Jackson ian.jack...@eu.citrix.com Petr Matousek pmato...@redhat.com jobs: build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass build-amd64-rumpuserxen fail build-i386-rumpuserxen fail test-amd64-amd64-xl pass test-armhf-armhf-xl fail test-amd64-i386-xl pass test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-freebsd10-amd64 pass test-amd64-amd64-xl-qemuu-ovmf-amd64 fail test-amd64-i386-xl-qemuu-ovmf-amd64 fail test-amd64-amd64-rumpuserxen-amd64 blocked test-amd64-amd64-xl-qemut-win7-amd64 fail test-amd64-i386-xl-qemut-win7-amd64 fail test-amd64-amd64-xl-qemuu-win7-amd64 fail test-amd64-i386-xl-qemuu-win7-amd64 fail test-armhf-armhf-xl-arndale fail test-amd64-amd64-xl-credit2 pass test-armhf-armhf-xl-credit2 fail test-armhf-armhf-xl-cubietruck fail test-amd64-i386-freebsd10-i386 pass
[Xen-devel] [PATCH] efi: Reallocate memory map if ExitBootServices() fails
If calling ExitBootServices() fails, the memory map size may have increased, so determine the new size and reallocate the memory map before calling GetMemoryMap() again. This was seen on the following machine when using the iscsidxe UEFI driver. The machine would consistently fail the first call to ExitBootServices(). System Information Manufacturer: Supermicro Product Name: X10SLE-F/HF BIOS Information Vendor: American Megatrends Inc. Version: 2.00 Release Date: 04/24/2014 Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com --- xen/common/efi/boot.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c index ef8476c..078f9b8 100644 --- a/xen/common/efi/boot.c +++ b/xen/common/efi/boot.c @@ -1053,14 +1053,14 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable) efi_arch_video_init(gop, info_size, mode_info); } -efi_bs-GetMemoryMap(efi_memmap_size, NULL, map_key, - efi_mdesc_size, mdesc_ver); -efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); -if ( !efi_memmap ) -blexit(LUnable to allocate memory for EFI memory map); - for ( retry = 0; ; retry = 1 ) { +efi_bs-GetMemoryMap(efi_memmap_size, NULL, map_key, + efi_mdesc_size, mdesc_ver); +efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); +if ( !efi_memmap ) +blexit(LUnable to allocate memory for EFI memory map); + status = efi_bs-GetMemoryMap(efi_memmap_size, efi_memmap, map_key, efi_mdesc_size, mdesc_ver); if ( EFI_ERROR(status) ) -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC][v2][PATCH 00/14] Fix RMRR
On 2015/5/28 15:55, Jan Beulich wrote: On 28.05.15 at 07:48, tiejun.c...@intel.com wrote: On 2015/5/22 17:46, Jan Beulich wrote: On 22.05.15 at 11:35, tiejun.c...@intel.com wrote: As you know all devices are owned by Dom0 firstly before we create any DomU, right? Do we allow Dom0 still own a group device while assign another device in the same group? Clearly not, or - just like anything else putting the security of a system at risk - only at explicit host admin request. You're right. After we discussed internally, we're intending to cover this simply since the case of shared RMRR is a rare case according to our previous experiences. Furthermore, Xen doesn't have a good existing API to directly assign this sort of group devices and even Xen doesn't identify these devices, so currently we always assign devices one by one, right? This means we have to put more efforts to concern a good implementation to address something like, identification, atomic, hotplug and so on. Obviously, this would involve hypervisor and tools at the same time so this has a little bit of difficulty to work along with 4.6. So could we do this separately? #1. Phase 1 to 4.6 #1.1. Do a simple implementation We just prevent from that device assignment if we're assigning this sort of group devices like this, Right. @@ -2291,6 +2291,16 @@ static int intel_iommu_assign_device( PCI_BUS(bdf) == bus PCI_DEVFN2(bdf) == devfn ) { +if ( rmrr-scope.devices_cnt 1 ) +{ +reassign_device_ownership(d, hardware_domain, devfn, pdev); I think if this is really needed here, the check comes too late. So we can do this at the begging of this function @@ -2277,13 +2277,37 @@ static int intel_iommu_assign_device( if ( list_empty(acpi_drhd_units) ) return -ENODEV; +seg = pdev-seg; +bus = pdev-bus; +/* + * In rare cases one given rmrr is shared by multiple devices but + * obviously this would put the security of a system at risk. So + * we should prevent from this sort of device assignment. + * + * TODO: actually we can group these devices which shared rmrr, and + * then allow all devices within a group to be assigned to same domain. + */ +for_each_rmrr_device( rmrr, bdf, i ) +{ +if ( rmrr-segment == seg + PCI_BUS(bdf) == bus + PCI_DEVFN2(bdf) == devfn ) +{ +if ( rmrr-scope.devices_cnt 1 ) +{ +ret = -EPERM; +printk(XENLOG_G_ERR VTDPREFIX +cannot assign this device with shared RMRR for Dom%d (%d)\n, + d-domain_id, ret); +return ret; +} +} +} + ret = reassign_device_ownership(hardware_domain, d, devfn, pdev); if ( ret ) return ret; -seg = pdev-seg; -bus = pdev-bus; - /* Setup rmrr identity mapping */ for_each_rmrr_device( rmrr, bdf, i ) { Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 5/9] x86/intel_pstate: relocate the driver register/unregister function
On 29.05.15 at 04:47, wei.w.w...@intel.com wrote: On 26/05/2015 21:06, Jan Beulich wrote On 13.05.16 at 09:50, wei.w.w...@intel.com wrote: Register/unregister the CPU hotplug notifier when the driver is registered, and move the driver register/unregister function to the cpufreq.c. Without saying why I'm afraid I don't even see much reason to review this in any detail. --- a/xen/drivers/cpufreq/cpufreq.c +++ b/xen/drivers/cpufreq/cpufreq.c @@ -630,12 +630,31 @@ static struct notifier_block cpu_nfb = { .notifier_call = cpu_callback }; -static int __init cpufreq_presmp_init(void) +int cpufreq_register_driver(struct cpufreq_driver *driver_data) { -void *cpu = (void *)(long)smp_processor_id(); -cpu_callback(cpu_nfb, CPU_ONLINE, cpu); Why is this being removed without replacement? I think they are redundant here. If we go and check the hypercall code path (the bottom of set_px_pminfo()), the cpufreq_add_cpu() is called there (inside cpufreq_cpu_init()), too. Why do we need to initialize this CPU twice? If this is indeed being done twice, removing it is fine. But in a separate patch with a proper description. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
When doing passthrough of a PCI device for an HVM guest, don't insert the device into xenstore, otherwise pciback attempts to use it which conflicts with QEMU. This manifests itself such that the first time a device is passed to a domain, it succeeds. Subsequent attempts fail unless the device is unbound from pciback or the machine rebooted. Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com --- tools/libxl/libxl_pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index e0743f8..2552889 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -994,7 +994,7 @@ out: } } -if (!starting) +if (!starting type == LIBXL_DOMAIN_TYPE_PV) rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting); else rc = 0; -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v8 01/13] x86: add socket_cpumask
On 29.05.15 at 04:35, chao.p.p...@linux.intel.com wrote: On Thu, May 28, 2015 at 01:38:05PM +0100, Jan Beulich wrote: On 21.05.15 at 10:41, chao.p.p...@linux.intel.com wrote: --- a/xen/arch/x86/mpparse.c +++ b/xen/arch/x86/mpparse.c @@ -87,6 +87,18 @@ void __init set_nr_cpu_ids(unsigned int max_cpus) #endif } +void __init set_nr_sockets(void) +{ +unsigned int cpus = bitmap_weight(phys_cpu_present_map.mask, + boot_cpu_data.x86_max_cores * + boot_cpu_data.x86_num_siblings); How did you come to this expression for the bitmap size? I.e. why not simply physids_weight(phys_cpu_present_map)? physids_weight(phys_cpu_present_map) gives me cpus for all sockets. While here the 'cpus' is actually _cpus_per_socket_. I used the max possible cpus indicated in cpuid as the upper bound so bitmap_weight() returns the actual available cpus on socket 0. In which case the variable name is badly chosen, or a respective comment is missing. + +if ( cpus == 0 ) +cpus = 1; + +nr_sockets = DIV_ROUND_UP(num_processors + disabled_cpus, cpus); +} Is there a reason why this can't just be added to the end of the immediately preceding set_nr_cpu_ids()? You mean the declaration or invocation? If the former I have no special reason for it (e.g. I can change it). Neither - I just don't see the need for a new function. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 6/9] x86/intel_pstate: the main boby of the intel_pstate driver
On 26/05/2015 21:58, Jan Beulich wrote On 13.05.16 at 09:50, wei.w.w...@intel.com wrote: +static inline int ceiling_fp(int32_t x) { +int mask, ret; Please here and below, consider whether types really need to be signed. One exception: If you intend to import the Linux source file with minimal modifications, and if that indeed results in only very few differences, then keeping the types as they are is probably fine. But in that case you should extend the patch description to say that the goal is to have minimal changes. All comments below are to be taken with the possible minimal change goal in mind. +static int byt_get_min_pstate(void) +{ +u64 value; + +rdmsrl(BYT_RATIOS, value); +return (value 8) 0x7F; +} + +static int byt_get_max_pstate(void) +{ +u64 value; + +rdmsrl(BYT_RATIOS, value); +return (value 16) 0x7F; +} + +static int byt_get_turbo_pstate(void) { +u64 value; + +rdmsrl(BYT_TURBO_RATIOS, value); +return value 0x7F; +} + +static void byt_set_pstate(struct cpudata *cpudata, int pstate) { +u64 val; +int32_t vid_fp; +u32 vid; + +val = pstate 8; +if (limits.no_turbo !limits.turbo_disabled) +val |= (u64)1 32; All of the literal numbers above (and there are more further down) would better become self-documenting manifest constants. +#define BYT_BCLK_FREQS 5 +static int byt_freq_table[BYT_BCLK_FREQS] = { 833, 1000, 1333, 1167, +800}; This can be both const and local to the only function it's being used by. +static struct cpu_defaults core_params = { const? __initconst? +static struct cpu_defaults byt_params = { Same here. +static void intel_pstate_timer_func(void *__data) { +struct cpudata *cpu = (struct cpudata *) __data; Double underscores are completely unnecessary here. +#define ICPU(model, policy) \ +{ X86_VENDOR_INTEL, 6, model, X86_FEATURE_APERFMPERF,\ +(unsigned long)policy } + +static const struct x86_cpu_id intel_pstate_cpu_ids[] = { __initconst +ICPU(0x2a, core_params), +ICPU(0x2d, core_params), +ICPU(0x37, byt_params), +ICPU(0x3a, core_params), +ICPU(0x3c, core_params), +ICPU(0x3d, core_params), +ICPU(0x3e, core_params), +ICPU(0x3f, core_params), +ICPU(0x45, core_params), +ICPU(0x46, core_params), +ICPU(0x47, core_params), +ICPU(0x4c, byt_params), +ICPU(0x4e, core_params), +ICPU(0x4f, core_params), +ICPU(0x56, core_params), Please handle the _params name tag inside ICPU(). +static int intel_pstate_set_policy(struct cpufreq_policy *policy) { +if (!policy-cpuinfo.max_freq) +return -ENODEV; + +if (policy-policy == CPUFREQ_POLICY_PERFORMANCE) { switch(policy-policy) please. +limits.no_turbo = 0; +limits.max_perf_pct = 100; +limits.max_perf = int_tofp(1); +limits.min_perf_pct = 100; +limits.min_perf = int_tofp(1); +policy-max_perf_pct = 100; +policy-min_perf_pct = 100; +return 0; And no need for identical return statement in all four branches. +} else if (policy-policy == CPUFREQ_POLICY_USERSPACE) { +limits.max_perf_pct = max(limits.min_policy_pct, policy- max_perf_pct); +limits.max_perf_pct = min(limits.max_policy_pct, + limits.max_perf_pct); Why are you not using clamp() here? +} else { This should at least have a comment saying CPUFREQ_POLICY_ONDEMAND. +static int intel_pstate_verify_policy(struct cpufreq_policy *policy) +{ +cpufreq_verify_within_limits(policy, policy-cpuinfo.min_freq, + policy-cpuinfo.max_freq); + +if ( policy-policy != CPUFREQ_POLICY_POWERSAVE + policy-policy != CPUFREQ_POLICY_PERFORMANCE + policy-policy != CPUFREQ_POLICY_USERSPACE + policy-policy != CPUFREQ_POLICY_ONDEMAND ) switch() How would we use switch() here? +static int intel_pstate_cpu_init(struct cpufreq_policy *policy) { +struct cpudata *cpu; +int rc; + +rc = intel_pstate_init_cpu(policy-cpu); Having both intel_pstate_cpu_init() and intel_pstate_init_cpu() is at least odd, the more that the latter is being called from only here. Are you suggesting to change the function name? If so, how about changing intel_pstate_cpu_init() to intel_pstate_setup()? +if (rc) +return rc; + +cpu = all_cpu_data[policy-cpu]; +if (limits.min_perf_pct == 100 limits.max_perf_pct == 100) +policy-policy = CPUFREQ_POLICY_PERFORMANCE; +else +policy-policy = CPUFREQ_POLICY_ONDEMAND; + +policy-min = cpu-pstate.min_pstate * cpu-pstate.scaling; +policy-max = cpu-pstate.turbo_pstate * cpu-pstate.scaling; +policy-min_perf_pct = 0; +policy-max_perf_pct = 100;
Re: [Xen-devel] [PATCH v8 03/13] x86: maintain COS to CBM mapping for each socket
On Fri, May 29, 2015 at 09:06:46AM +0100, Jan Beulich wrote: On 29.05.15 at 04:43, chao.p.p...@linux.intel.com wrote: On Thu, May 28, 2015 at 02:17:54PM +0100, Jan Beulich wrote: On 21.05.15 at 10:41, chao.p.p...@linux.intel.com wrote: +static int cat_cpu_init(unsigned int cpu) +{ +int rc; +const struct cpuinfo_x86 *c = cpu_data + cpu; + +if ( !cpu_has(c, X86_FEATURE_CAT) ) +return 0; + +if ( test_bit(cpu_to_socket(cpu), cat_socket_enable) ) +return 0; + +if ( cpu == smp_processor_id() ) +do_cat_cpu_init(rc); +else +on_selected_cpus(cpumask_of(cpu), do_cat_cpu_init, rc, 1); This now being called in the context of CPU_UP_PREPARE, I can't see how this works at all: Neither would the CPU's cpu_data[] instance be initialized by that time, nor would you be able to IPI that CPU, nor can I see how the if() branch could ever get entered. Was this tested at all? Ah, yes! So it sounds really a little difficult to move the memory allocation from CPU_STARTING to CPU_PREPARA for this case. Not sure why you talk about memory allocation again. That should be done in CPU_UP_PREPARE. But stuff that needs to happen on the CPU should happen in CPU_STARTING. The memory allocation's size depending on a CPU characteristic of course makes this a little problematic, but (I think I said so before) since we're assuming symmetry in many other places, I don't see anything wrong with you assuming symmetry here too, and hence use e.g. the boot CPU's value to determine the allocation size. No problem, then I can just forget the support for asymmetry in XEN. Chao ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 6/9] x86/intel_pstate: the main boby of the intel_pstate driver
On 29.05.15 at 10:19, wei.w.w...@intel.com wrote: On 26/05/2015 21:58, Jan Beulich wrote On 13.05.16 at 09:50, wei.w.w...@intel.com wrote: +static int intel_pstate_verify_policy(struct cpufreq_policy *policy) +{ +cpufreq_verify_within_limits(policy, policy-cpuinfo.min_freq, + policy-cpuinfo.max_freq); + +if ( policy-policy != CPUFREQ_POLICY_POWERSAVE + policy-policy != CPUFREQ_POLICY_PERFORMANCE + policy-policy != CPUFREQ_POLICY_USERSPACE + policy-policy != CPUFREQ_POLICY_ONDEMAND ) switch() How would we use switch() here? switch ( policy-policy ) { case CPUFREQ_POLICY_POWERSAVE: etc. But I thought that to be obvious, so I'm not sure I understand what you don't understand. +static int intel_pstate_cpu_init(struct cpufreq_policy *policy) { +struct cpudata *cpu; +int rc; + +rc = intel_pstate_init_cpu(policy-cpu); Having both intel_pstate_cpu_init() and intel_pstate_init_cpu() is at least odd, the more that the latter is being called from only here. Are you suggesting to change the function name? If so, how about changing intel_pstate_cpu_init() to intel_pstate_setup()? Either that, or fold the called function into the caller. +if (rc) +return rc; + +cpu = all_cpu_data[policy-cpu]; +if (limits.min_perf_pct == 100 limits.max_perf_pct == 100) +policy-policy = CPUFREQ_POLICY_PERFORMANCE; +else +policy-policy = CPUFREQ_POLICY_ONDEMAND; + +policy-min = cpu-pstate.min_pstate * cpu-pstate.scaling; +policy-max = cpu-pstate.turbo_pstate * cpu-pstate.scaling; +policy-min_perf_pct = 0; +policy-max_perf_pct = 100; +policy-turbo_pct = get_turbo_pct(); + +/* cpuinfo and default policy values */ +policy-cpuinfo.min_freq = cpu-pstate.min_pstate * cpu- pstate.scaling; +policy-cpuinfo.max_freq = +cpu-pstate.turbo_pstate * cpu-pstate.scaling; +policy-cpuinfo.transition_latency = CPUFREQ_ETERNAL; +cpumask_set_cpu(policy-cpu, policy-cpus); + +/* the first cpu do the init work for limits.min/max_policy_pct */ +if (cpu-cpu == 0) { cpu-cpu == 0 doesn't mean this is the first CPU to come here. How about this one: If (limits.min_policy_pct == 0) { limits.min_policy_pct = limits.xx = } limits.min_policy_pct won't be 0 if it is initialized. If that's guaranteed - fine with me. +static int intel_pstate_msrs_not_valid(void) { +/* Check that all the msr's we are using are valid. */ +u64 aperf, mperf, tmp; + +rdmsrl(MSR_IA32_APERF, aperf); +rdmsrl(MSR_IA32_MPERF, mperf); + +if (!pstate_funcs.get_max() || +!pstate_funcs.get_min() || +!pstate_funcs.get_turbo()) +return -ENODEV; + +rdmsrl(MSR_IA32_APERF, tmp); +if (!(tmp - aperf)) Why not if(tmp == aperf)? And - is it guaranteed that APERF (and MPERF below) incremented in the meantime? Will change it to if (tmp == aperf). According to the SDM, IA32_MPERF MSR (E7H) increments in proportion to a fixed frequency, and IA32_APERF MSR increments in proportion to actual performance. So, as long as the two MSRs are valid, their values will be increased. The in proportion is what makes me nervous: What if the proportion is 1 in every 1000 cycles? +int __init intel_pstate_init(void) +{ +int cpu, rc = 0; +const struct x86_cpu_id *id; +struct cpu_defaults *cpu_info; + +if (cpuid_ecx(6) 0x1) +set_bit(X86_FEATURE_APERFMPERF, + boot_cpu_data.x86_capability); This should be consolidated out of the other cpufreq drivers into cpu/common.c. How about moving it to the bottom of init_intel() in cpu/intel.c ? It's not Intel specific, so it belongs in cpu/common.c. --- a/xen/include/asm-x86/acpi.h +++ b/xen/include/asm-x86/acpi.h @@ -32,6 +32,8 @@ #define COMPILER_DEPENDENT_INT64 long long #define COMPILER_DEPENDENT_UINT64 unsigned long long +extern int intel_pstate_init(void); Why in acpi.h? I was thinking that p-state is part of ACPI. Do you prefer to create a new .h called cpufreq.h, just like the cpuidle.h there? ACPI is a way to convey information about P- (and C- and T-) states, but the latter are imo not tied to ACPI. In fact your patch series here proves the opposite: You add code dealing with P-states without using information coming from ACPI. I think this should go into what currently is acpi/cpufreq/cpufreq.h, which is expected to be moved out of acpi/ anyway (I seem to even recall an ARM side series aiming at doing that). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2] x86: Don't crash when mapping a page using EFI runtime page tables
When an interrupt is received during an EFI runtime service call, Xen may call map_domain_page() while using the EFI runtime page tables. This fails because, although the EFI runtime page tables are a copy of the idle domain's page tables, current points at a different domain's vCPU. To fix this, return NULL from mapcache_current_vcpu() when using the EFI runtime page tables which is treated equivalently to running in an idle vCPU. This issue can be reproduced by repeatedly calling GetVariable() from dom0 while using VT-d, since VT-d frequently maps a page from interrupt context. Example call trace: [82d0801615dc] __find_next_zero_bit+0x28/0x60 [82d08016a10e] map_domain_page+0x4c6/0x4eb [82d080156ae6] map_vtd_domain_page+0xd/0xf [82d08015533a] msi_msg_read_remap_rte+0xe3/0x1d8 [82d08014e516] iommu_read_msi_from_ire+0x31/0x34 [82d08016ff6c] set_msi_affinity+0x134/0x17a [82d0801737b5] move_masked_irq+0x5c/0x98 [82d080173816] move_native_irq+0x25/0x36 [82d08016ffcb] ack_nonmaskable_msi_irq+0x19/0x20 [82d08016ffdb] ack_maskable_msi_irq+0x9/0x37 [82d080173e8b] do_IRQ+0x251/0x635 [82d080234502] common_interrupt+0x62/0x70 [df7ed2be] df7ed2be Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com --- Changed in v2: * Only read CR3 when checking if running on the runtime page tables if inside the efi runtime services lock and on the same CPU that took the lock. xen/arch/x86/domain_page.c | 13 + xen/arch/x86/efi/stub.c| 2 +- xen/common/efi/runtime.c | 10 -- xen/include/xen/efi.h | 2 +- 4 files changed, 19 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c index 5f6f397..7954998 100644 --- a/xen/arch/x86/domain_page.c +++ b/xen/arch/x86/domain_page.c @@ -32,20 +32,25 @@ static inline struct vcpu *mapcache_current_vcpu(void) return NULL; /* + * When using efi runtime page tables, we have the equivalent of the idle + * domain's page tables but current may point at another domain's VCPU. + * Return NULL as though current is not properly set up yet. + */ +if ( efi_enabled efi_rs_using_pgtables() ) +return NULL; + +/* * If guest_table is NULL, and we are running a paravirtualised guest, * then it means we are running on the idle domain's page table and must * therefore use its mapcache. */ if ( unlikely(pagetable_is_null(v-arch.guest_table)) is_pv_vcpu(v) ) { -unsigned long cr3; - /* If we really are idling, perform lazy context switch now. */ if ( (v = idle_vcpu[smp_processor_id()]) == current ) sync_local_execstate(); /* We must now be running on the idle page table. */ -ASSERT((cr3 = read_cr3()) == __pa(idle_pg_table) || - (efi_enabled cr3 == efi_rs_page_table())); +ASSERT(read_cr3() == __pa(idle_pg_table)); } return v; diff --git a/xen/arch/x86/efi/stub.c b/xen/arch/x86/efi/stub.c index 627009f..07c2bd0 100644 --- a/xen/arch/x86/efi/stub.c +++ b/xen/arch/x86/efi/stub.c @@ -12,7 +12,7 @@ void __init efi_init_memory(void) { } void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t l4e) { } -paddr_t efi_rs_page_table(void) +bool_t efi_rs_using_pgtables(void) { BUG(); return 0; diff --git a/xen/common/efi/runtime.c b/xen/common/efi/runtime.c index 5ed8b01..ae87557 100644 --- a/xen/common/efi/runtime.c +++ b/xen/common/efi/runtime.c @@ -30,6 +30,7 @@ const CHAR16 *__read_mostly efi_fw_vendor; const EFI_RUNTIME_SERVICES *__read_mostly efi_rs; #ifndef CONFIG_ARM /* TODO - disabled until implemented on ARM */ static DEFINE_SPINLOCK(efi_rs_lock); +static unsigned int efi_rs_on_cpu = NR_CPUS; #endif UINTN __read_mostly efi_memmap_size; @@ -66,6 +67,8 @@ unsigned long efi_rs_enter(void) spin_lock(efi_rs_lock); +efi_rs_on_cpu = smp_processor_id(); + /* prevent fixup_page_fault() from doing anything */ irq_enter(); @@ -100,13 +103,16 @@ void efi_rs_leave(unsigned long cr3) asm volatile ( lgdt %0 : : m (gdt_desc) ); } irq_exit(); +efi_rs_on_cpu = NR_CPUS; spin_unlock(efi_rs_lock); stts(); } -paddr_t efi_rs_page_table(void) +bool_t efi_rs_using_pgtables(void) { -return efi_l4_pgtable ? virt_to_maddr(efi_l4_pgtable) : 0; +return efi_l4_pgtable + (smp_processor_id() == efi_rs_on_cpu) + (read_cr3() == virt_to_maddr(efi_l4_pgtable)); } unsigned long efi_get_time(void) diff --git a/xen/include/xen/efi.h b/xen/include/xen/efi.h index 48de8e0..e74dad1 100644 --- a/xen/include/xen/efi.h +++ b/xen/include/xen/efi.h @@ -29,7 +29,7 @@ struct xenpf_efi_runtime_call; struct compat_pf_efi_runtime_call; void efi_init_memory(void); -paddr_t efi_rs_page_table(void); +bool_t efi_rs_using_pgtables(void); unsigned long efi_get_time(void); void efi_halt_system(void); void
Re: [Xen-devel] [PATCH v8 03/13] x86: maintain COS to CBM mapping for each socket
On 29.05.15 at 04:43, chao.p.p...@linux.intel.com wrote: On Thu, May 28, 2015 at 02:17:54PM +0100, Jan Beulich wrote: On 21.05.15 at 10:41, chao.p.p...@linux.intel.com wrote: +static int cat_cpu_init(unsigned int cpu) +{ +int rc; +const struct cpuinfo_x86 *c = cpu_data + cpu; + +if ( !cpu_has(c, X86_FEATURE_CAT) ) +return 0; + +if ( test_bit(cpu_to_socket(cpu), cat_socket_enable) ) +return 0; + +if ( cpu == smp_processor_id() ) +do_cat_cpu_init(rc); +else +on_selected_cpus(cpumask_of(cpu), do_cat_cpu_init, rc, 1); This now being called in the context of CPU_UP_PREPARE, I can't see how this works at all: Neither would the CPU's cpu_data[] instance be initialized by that time, nor would you be able to IPI that CPU, nor can I see how the if() branch could ever get entered. Was this tested at all? Ah, yes! So it sounds really a little difficult to move the memory allocation from CPU_STARTING to CPU_PREPARA for this case. Not sure why you talk about memory allocation again. That should be done in CPU_UP_PREPARE. But stuff that needs to happen on the CPU should happen in CPU_STARTING. The memory allocation's size depending on a CPU characteristic of course makes this a little problematic, but (I think I said so before) since we're assuming symmetry in many other places, I don't see anything wrong with you assuming symmetry here too, and hence use e.g. the boot CPU's value to determine the allocation size. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v8 05/13] x86: expose CBM length and COS number information
On 29.05.15 at 04:47, chao.p.p...@linux.intel.com wrote: On Thu, May 28, 2015 at 02:26:03PM +0100, Jan Beulich wrote: --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -694,6 +694,20 @@ struct xen_sysctl_pcitopoinfo { typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t); +#define XEN_SYSCTL_PSR_CAT_get_l3_info 0 +struct xen_sysctl_psr_cat_op { +uint32_t cmd; /* IN: XEN_SYSCTL_PSR_CAT_* */ +uint32_t target;/* IN: socket to be operated on */ If this is always the socket number, why would the variable be named anything other than socket. If otoh subsequent patches use it differently, I think the comment should be omitted now rather than being dropped then (or it should be given its final wording from the beginning). Or 'target to be operated on'? Fine with me. Just not something that may end up being confusing. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCHv10 3/4] gnttab: make the grant table lock a read-write lock
On 28.05.15 at 18:09, dvra...@cantab.net wrote: On 28/05/15 15:55, Jan Beulich wrote: On 26.05.15 at 20:00, david.vra...@citrix.com wrote: @@ -254,23 +254,23 @@ double_gt_lock(struct grant_table *lgt, struct grant_table *rgt) { if ( lgt rgt ) { -spin_lock(lgt-lock); -spin_lock(rgt-lock); +write_lock(lgt-lock); +write_lock(rgt-lock); } else { if ( lgt != rgt ) -spin_lock(rgt-lock); -spin_lock(lgt-lock); +write_lock(rgt-lock); +write_lock(lgt-lock); } } So I looked at the two uses of double_gt_lock() again: in both cases only a read lock is needed on rgt (which is also the natural thing to expect: we aren't supposed to modify the remote domain's grant table in any way here). Albeit that's contradicting ... See comment below. @@ -568,10 +568,10 @@ static void mapcount( *wrc = *rdc = 0; /* - * Must have the remote domain's grant table lock while counting - * its active entries. + * Must have the remote domain's grant table write lock while + * counting its active entries. */ -ASSERT(spin_is_locked(rd-grant_table-lock)); +ASSERT(rw_is_write_locked(rd-grant_table-lock)); ... this: Why would we need to hold the write lock here? We're not changing anything in rd-grant_table. @@ -837,12 +838,22 @@ __gnttab_map_grant_ref( TRACE_1D(TRC_MEM_PAGE_GRANT_MAP, op-dom); +/* + * All maptrack entry users check mt-flags first before using the + * other fields so just ensure the flags field is stored last. + * + * However, if gnttab_need_iommu_mapping() then this would race + * with a concurrent mapcount() call (on an unmap, for example) + * and a lock is required. + */ mt = maptrack_entry(lgt, handle); mt-domid = op-dom; mt-ref = op-ref; -mt-flags = op-flags; +wmb(); +write_atomic(mt-flags, op-flags); Further, why are only races against mapcount() a problem, but not such against __gnttab_unmap_common() as a whole? I.e. what's the locking around the op-map-flags and op-map-domid accesses below good for? Or, alternatively, isn't this an indication of a problem with the previous patch splitting off the maptrack lock (effectively leaving some map track entry accesses without any guard)? The double_gt_lock() takes both write locks, thus does not race with __gnttab_unmap_common clearing the flag on the maptrack entry which is done while holding the remote read lock. The maptrack entries are items of the local domain, i.e. the state of the remote domain's lock shouldn't matter there at all. Anything else would be extremely counterintuitive and hence prone to future breakage. With that the earlier two comments (above) remain un- addressed too. -double_gt_unlock(lgt, rgt); +if ( gnttab_need_iommu_mapping(ld) ) +double_gt_unlock(lgt, rgt); I think you shouldn't rely on gnttab_need_iommu_mapping() to produce the same result upon this and the earlier invocation, i.e. you ought to latch the first call's result into a local variable. Um. Okay. But if this changes during the life time of a domain it's going to either leak iommu mappings or fail to create them which sounds rather broken to me. Hotplugging a passed through device into a domain can change the outcome of iommu_needed(). I wouldn't be surprised if the combination of mapped grants and passed through devices didn't correctly deal with all (corner) cases, as it seems likely to me that such domains aren't of wide spread use (yet). In particular I don't see arch_iommu_populate_page_table() take care of active grant mappings at all. @@ -2645,7 +2663,7 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b) struct active_grant_entry *act_b = NULL; s16 rc = GNTST_okay; -spin_lock(gt-lock); +write_lock(gt-lock); /* Bounds check on the grant refs */ if ( unlikely(ref_a = nr_grant_entries(d-grant_table))) @@ -2689,7 +2707,7 @@ out: active_entry_release(act_b); if ( act_a != NULL ) active_entry_release(act_a); -spin_unlock(gt-lock); +write_unlock(gt-lock); It would seem to me that these could be dropped when the per-active- entry locks get introduced. I'm not sure what you want dropped here? We require the write lock here because we're taking two active entries at once. Ah, right. But couldn't the write lock then be dropped as soon as the two active entries got locked? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCHv10 4/4] gnttab: use per-VCPU maptrack free lists
On 28.05.15 at 18:10, dvra...@cantab.net wrote: On 28/05/15 16:39, Jan Beulich wrote: On 26.05.15 at 20:00, david.vra...@citrix.com wrote: --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -57,7 +57,7 @@ integer_param(gnttab_max_frames, max_grant_frames); * New options allow to set max_maptrack_frames and * map_grant_table_frames independently. */ -#define DEFAULT_MAX_MAPTRACK_FRAMES 256 +#define DEFAULT_MAX_MAPTRACK_FRAMES 1024 Apart from everything else this again results in ... @@ -1457,6 +1491,17 @@ gnttab_setup_table( gt = d-grant_table; write_lock(gt-lock); +/* Tracking of mapped foreign frames table */ +gt-maptrack = xzalloc_array(struct grant_mapping *, max_maptrack_frames); ... this becoming an order-1 runtime allocation on other than 32-bit ARM. I thought we agreed that this was temporary until vzalloc() was added? Was it this one? And anyway, the vzalloc() addition went in almost two weeks ago. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v8 01/13] x86: add socket_cpumask
On 29.05.15 at 10:28, chao.p.p...@linux.intel.com wrote: On Fri, May 29, 2015 at 09:01:53AM +0100, Jan Beulich wrote: On 29.05.15 at 04:35, chao.p.p...@linux.intel.com wrote: On Thu, May 28, 2015 at 01:38:05PM +0100, Jan Beulich wrote: On 21.05.15 at 10:41, chao.p.p...@linux.intel.com wrote: --- a/xen/arch/x86/mpparse.c +++ b/xen/arch/x86/mpparse.c @@ -87,6 +87,18 @@ void __init set_nr_cpu_ids(unsigned int max_cpus) #endif } +void __init set_nr_sockets(void) +{ +unsigned int cpus = bitmap_weight(phys_cpu_present_map.mask, + boot_cpu_data.x86_max_cores * + boot_cpu_data.x86_num_siblings); + +if ( cpus == 0 ) +cpus = 1; + +nr_sockets = DIV_ROUND_UP(num_processors + disabled_cpus, cpus); +} Is there a reason why this can't just be added to the end of the immediately preceding set_nr_cpu_ids()? You mean the declaration or invocation? If the former I have no special reason for it (e.g. I can change it). Neither - I just don't see the need for a new function. In which case the invocation of set_nr_cpu_ids() should move to the place where now set_nr_sockets() is invoked, to make sure boot_cpu_data.x86_max_cores/x86_num_siblings available, which may not be your expectation. Ah, in which case this _is_ the explanation, albeit only provided the use of the two boot_cpu_data fields has to remain (which I had put under question). And if these have to remain, couldn't this be done in a presmp initcall instead of an explicitly called function? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] x86/paging: remove pointless current domain checks
Checking that the subject domain is not the current one is pointless when already having paused that domain: domain_pause() already ASSERT()s this to be the case. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/arch/x86/mm/hap/hap.c +++ b/xen/arch/x86/mm/hap/hap.c @@ -464,13 +464,6 @@ int hap_enable(struct domain *d, u32 mod domain_pause(d); -/* error check */ -if ( (d == current-domain) ) -{ -rv = -EINVAL; -goto out; -} - old_pages = d-arch.paging.hap.total_pages; if ( old_pages == 0 ) { --- a/xen/arch/x86/mm/shadow/common.c +++ b/xen/arch/x86/mm/shadow/common.c @@ -2972,8 +2972,7 @@ int shadow_enable(struct domain *d, u32 domain_pause(d); /* Sanity check the arguments */ -if ( (d == current-domain) || - shadow_mode_enabled(d) || +if ( shadow_mode_enabled(d) || ((mode PG_translate) !(mode PG_refcounts)) || ((mode PG_external) !(mode PG_translate)) ) { x86/paging: remove pointless current domain checks Checking that the subject domain is not the current one is pointless when already having paused that domain: domain_pause() already ASSERT()s this to be the case. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/arch/x86/mm/hap/hap.c +++ b/xen/arch/x86/mm/hap/hap.c @@ -464,13 +464,6 @@ int hap_enable(struct domain *d, u32 mod domain_pause(d); -/* error check */ -if ( (d == current-domain) ) -{ -rv = -EINVAL; -goto out; -} - old_pages = d-arch.paging.hap.total_pages; if ( old_pages == 0 ) { --- a/xen/arch/x86/mm/shadow/common.c +++ b/xen/arch/x86/mm/shadow/common.c @@ -2972,8 +2972,7 @@ int shadow_enable(struct domain *d, u32 domain_pause(d); /* Sanity check the arguments */ -if ( (d == current-domain) || - shadow_mode_enabled(d) || +if ( shadow_mode_enabled(d) || ((mode PG_translate) !(mode PG_refcounts)) || ((mode PG_external) !(mode PG_translate)) ) { ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH 00/13] Persistent grant maps for xen net drivers
Hi, About rx zerocopy, I have a question: If some application make a socket, then listen and accept, the client sends packets to it, but it doesn't recv from this socket right now, all persistent grant page would be in used. So other application cannot receive any packets. Is my guess right or wrong? YuZhou -Original Message- From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Joao Martins Sent: Friday, May 22, 2015 6:27 PM To: Wei Liu Cc: ian.campb...@citrix.com; net...@vger.kernel.org; david.vra...@citrix.com; xen-de...@lists.xenproject.org; boris.ostrov...@oracle.com Subject: Re: [Xen-devel] [RFC PATCH 00/13] Persistent grant maps for xen net drivers On 19 May 2015, at 17:39, Wei Liu wei.l...@citrix.com wrote: On Tue, May 12, 2015 at 07:18:24PM +0200, Joao Martins wrote: There have been recently[3] some discussions and issues raised on persistent grants for the block layer, though the numbers above show some significant improvements specially on more network intensive workloads and provide a margin for comparison against future map/unmap improvements. Any comments or suggestions are welcome, Thanks! Thanks, the numbers certainly look interesting. I'm just a bit concerned about the complexity of netback. I've commented on individual patches, we can discuss the issues there. Thanks a lot for the review! It does add more complexity, mainly for the TX path, but I also would like to mention that a portion of this changeset is also the persistent grants ops that could potentially live outside. Joao [1] http://article.gmane.org/gmane.linux.network/249383 [2] http://bit.ly/1IhJfXD [3] http://lists.xen.org/archives/html/xen-devel/2015-02/msg02292.html Joao Martins (13): xen-netback: add persistent grant tree ops xen-netback: xenbus feature persistent support xen-netback: implement TX persistent grants xen-netback: implement RX persistent grants xen-netback: refactor xenvif_rx_action xen-netback: copy buffer on xenvif_start_xmit() xen-netback: add persistent tree counters to debugfs xen-netback: clone skb if skb-xmit_more is set xen-netfront: move grant_{ref,page} to struct grant xen-netfront: refactor claim/release grant xen-netfront: feature-persistent xenbus support xen-netfront: implement TX persistent grants xen-netfront: implement RX persistent grants drivers/net/xen-netback/common.h| 79 drivers/net/xen-netback/interface.c | 78 +++- drivers/net/xen-netback/netback.c | 873 ++-- drivers/net/xen-netback/xenbus.c| 24 + drivers/net/xen-netfront.c | 362 --- 5 files changed, 1216 insertions(+), 200 deletions(-) -- 2.1.3 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen BUG at page_alloc.c:1738 (Xen 4.5)
On Wed, 20 May 2015, Major Hayden wrote: On 05/20/2015 05:41 AM, Jan Beulich wrote: Considering that no-one else is seeing this - is this perhaps connected to you building Xen with pre-release gcc 5.0.1? This is also because in order for the above to indeed occur, mmio_ro_do_page_fault()'s put_page() would need to drop the last reference of a page, yet page_get_owner_and_reference() doesn't obtain a reference when a page is unallocated (and hence unowned), i.e. normally a page would have a refcount of at least 2 here. Hence this would be possible only due to a race, but the exact same race to be observed on different hardware _and_ under an emulator is extremely unlikely. You could try with the xen.gz file from https://copr-be.cloud.fedoraproject.org/results/myoung/xentest/fedora-21-x86_64/xen-4.5.1-0.rc1.fc21/xen-hypervisor-4.5.1-0.rc1.fc21.x86_64.rpm https://copr-be.cloud.fedoraproject.org/results/myoung/xentest/fedora-21-x86_64/xen-4.5.1-0.rc1.fc21/xen-hypervisor-4.5.1-0.rc1.fc21.x86_64.rpm It is roughly the same version of xen but built against Fedora 21 and gcc 4.9.2. If that works then it probably is gcc 5. Greetings, I have run into pretty much the same issue as the original poster. I am running a recently updated Arch Linux system, with GCC 5.1.0, using UEFI and gummiboot to boot. I currently have a build of Xen 4.4.1, built with GCC 4.9.2 from before my last update, that is functioning correctly on this machine. But the builds of Xen 4.5.0, using GCC 5 and mingw64-binutils for the EFI binary, are all failing when Xen starts the Linux kernel, with the same error mentioned in the subject. Below is the boot log I captured via the serial port. http://pastebin.com/bBC78306 Wondering if my specific toolchain was the issue, I downloaded the Fedora 22 version of xen-hypervisor and installed its EFI Xen binary over my compiled binary and received an identical error message, with slightly different addresses in the panic dump. The Fedora version was compiled with GCC 5.0.1. Below is the boot log I captured from that binary. http://pastebin.com/jvg1JazC http://pastebin.com/jvg1JazC After finding this thread, and specifically, the quoted message above, I downloaded that xen-hypervisor package and installed its EFI Xen binary. That binary boots successfully, as seen by the captured boot log below. http://pastebin.com/DKxwaU2U So, while I’m not familiar enough with Xen to begin to have an idea of what could possibly be wrong with Xen or GCC 5 to be causing this bug, I’d like to do what I can to track down the issue so I can get a working build of Xen 4.5. :) Thanks! — Jason Fritcher smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 2/4] libxc: print more error messages when failed
No functional changes introduced. Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- tools/libxc/xc_hvm_build_x86.c | 30 +++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c index 92422bf..df4b7ed 100644 --- a/tools/libxc/xc_hvm_build_x86.c +++ b/tools/libxc/xc_hvm_build_x86.c @@ -259,7 +259,10 @@ static int setup_guest(xc_interface *xch, memset(elf, 0, sizeof(elf)); if ( elf_init(elf, image, image_size) != 0 ) +{ +PERROR(Could not initialise ELF image); goto error_out; +} xc_elf_set_logfile(xch, elf, 1); @@ -522,15 +525,24 @@ static int setup_guest(xc_interface *xch, DPRINTF( 1GB PAGES: 0x%016lx\n, stat_1gb_pages); if ( loadelfimage(xch, elf, dom, page_array) != 0 ) +{ +PERROR(Could not load ELF image); goto error_out; +} if ( loadmodules(xch, args, m_start, m_end, dom, page_array) != 0 ) -goto error_out; +{ +PERROR(Could not load ACPI modules); +goto error_out; +} if ( (hvm_info_page = xc_map_foreign_range( xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, HVM_INFO_PFN)) == NULL ) +{ +PERROR(Could not map hvm info page); goto error_out; +} build_hvm_info(hvm_info_page, args); munmap(hvm_info_page, PAGE_SIZE); @@ -547,7 +559,10 @@ static int setup_guest(xc_interface *xch, } if ( xc_clear_domain_pages(xch, dom, special_pfn(0), NR_SPECIAL_PAGES) ) -goto error_out; +{ +PERROR(Could not clear special pages); +goto error_out; +} xc_hvm_param_set(xch, dom, HVM_PARAM_STORE_PFN, special_pfn(SPECIALPAGE_XENSTORE)); @@ -580,7 +595,10 @@ static int setup_guest(xc_interface *xch, } if ( xc_clear_domain_pages(xch, dom, ioreq_server_pfn(0), NR_IOREQ_SERVER_PAGES) ) -goto error_out; +{ +PERROR(Could not clear ioreq page); +goto error_out; +} /* Tell the domain where the pages are and how many there are */ xc_hvm_param_set(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN, @@ -595,7 +613,10 @@ static int setup_guest(xc_interface *xch, if ( (ident_pt = xc_map_foreign_range( xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, special_pfn(SPECIALPAGE_IDENT_PT))) == NULL ) +{ +PERROR(Could not map special page ident_pt); goto error_out; +} for ( i = 0; i PAGE_SIZE / sizeof(*ident_pt); i++ ) ident_pt[i] = ((i 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); @@ -610,7 +631,10 @@ static int setup_guest(xc_interface *xch, char *page0 = xc_map_foreign_range( xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, 0); if ( page0 == NULL ) +{ +PERROR(Could not map page0); goto error_out; +} page0[0] = 0xe9; *(uint32_t *)page0[1] = entry_eip - 5; munmap(page0, PAGE_SIZE); -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 3/4] libxc: rework vnuma bits in setup_guest
Make the setup process similar to PV counterpart. That is, to allocate a P2M array that covers the whole memory range and start from there. This is clearer than using an array with no holes in it. Also the dummy layout should take MMIO hole into consideration. We might end up having two vmemranges in the dummy layout. Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- v2: only generate 1 vnode in dummy layout --- tools/libxc/xc_hvm_build_x86.c | 62 -- 1 file changed, 47 insertions(+), 15 deletions(-) diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c index df4b7ed..b3855e0 100644 --- a/tools/libxc/xc_hvm_build_x86.c +++ b/tools/libxc/xc_hvm_build_x86.c @@ -238,6 +238,7 @@ static int setup_guest(xc_interface *xch, { xen_pfn_t *page_array = NULL; unsigned long i, vmemid, nr_pages = args-mem_size PAGE_SHIFT; +unsigned long p2m_size; unsigned long target_pages = args-mem_target PAGE_SHIFT; unsigned long entry_eip, cur_pages, cur_pfn; void *hvm_info_page; @@ -254,8 +255,8 @@ static int setup_guest(xc_interface *xch, xen_pfn_t special_array[NR_SPECIAL_PAGES]; xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES]; uint64_t total_pages; -xen_vmemrange_t dummy_vmemrange; -unsigned int dummy_vnode_to_pnode; +xen_vmemrange_t dummy_vmemrange[2]; +unsigned int dummy_vnode_to_pnode[1]; memset(elf, 0, sizeof(elf)); if ( elf_init(elf, image, image_size) != 0 ) @@ -275,17 +276,35 @@ static int setup_guest(xc_interface *xch, if ( args-nr_vmemranges == 0 ) { -/* Build dummy vnode information */ -dummy_vmemrange.start = 0; -dummy_vmemrange.end = args-mem_size; -dummy_vmemrange.flags = 0; -dummy_vmemrange.nid = 0; +/* Build dummy vnode information + * + * Guest physical address space layout: + * [0, hole_start) [hole_start, 4G) [4G, highmem_end) + * + * Of course if there is no high memory, the second vmemrange + * has no effect on the actual result. + */ + +dummy_vmemrange[0].start = 0; +dummy_vmemrange[0].end = args-lowmem_end; +dummy_vmemrange[0].flags = 0; +dummy_vmemrange[0].nid = 0; args-nr_vmemranges = 1; -args-vmemranges = dummy_vmemrange; -dummy_vnode_to_pnode = XC_NUMA_NO_NODE; +if ( args-highmem_end (1ULL 32) ) +{ +dummy_vmemrange[1].start = 1ULL 32; +dummy_vmemrange[1].end = args-highmem_end; +dummy_vmemrange[1].flags = 0; +dummy_vmemrange[1].nid = 0; + +args-nr_vmemranges++; +} + +dummy_vnode_to_pnode[0] = XC_NUMA_NO_NODE; args-nr_vnodes = 1; -args-vnode_to_pnode = dummy_vnode_to_pnode; +args-vmemranges = dummy_vmemrange; +args-vnode_to_pnode = dummy_vnode_to_pnode; } else { @@ -297,9 +316,15 @@ static int setup_guest(xc_interface *xch, } total_pages = 0; +p2m_size = 0; for ( i = 0; i args-nr_vmemranges; i++ ) +{ total_pages += ((args-vmemranges[i].end - args-vmemranges[i].start) PAGE_SHIFT); +p2m_size = p2m_size (args-vmemranges[i].end PAGE_SHIFT) ? +p2m_size : (args-vmemranges[i].end PAGE_SHIFT); +} + if ( total_pages != (args-mem_size PAGE_SHIFT) ) { PERROR(vNUMA memory pages mismatch (0x%PRIx64 != 0x%PRIx64), @@ -325,16 +350,23 @@ static int setup_guest(xc_interface *xch, DPRINTF( TOTAL:%016PRIx64-%016PRIx64\n, v_start, v_end); DPRINTF( ENTRY:%016PRIx64\n, elf_uval(elf, elf.ehdr, e_entry)); -if ( (page_array = malloc(nr_pages * sizeof(xen_pfn_t))) == NULL ) +if ( (page_array = malloc(p2m_size * sizeof(xen_pfn_t))) == NULL ) { PERROR(Could not allocate memory.); goto error_out; } -for ( i = 0; i nr_pages; i++ ) -page_array[i] = i; -for ( i = args-mmio_start PAGE_SHIFT; i nr_pages; i++ ) -page_array[i] += args-mmio_size PAGE_SHIFT; +for ( i = 0; i p2m_size; i++ ) +page_array[i] = ((xen_pfn_t)-1); +for ( vmemid = 0; vmemid args-nr_vmemranges; vmemid++ ) +{ +uint64_t pfn; + +for ( pfn = args-vmemranges[vmemid].start PAGE_SHIFT; + pfn args-vmemranges[vmemid].end PAGE_SHIFT; + pfn++ ) +page_array[pfn] = pfn; +} /* * Try to claim pages for early warning of insufficient memory available. -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 4/4] libxl: fix HVM vNUMA
This patch does two thing: The original code erroneously fills in xc_hvm_build_args before generating vmemranges. The effect is that guest memory is populated without vNUMA information. Move the hunk to right place to fix this. Move the subtraction of video ram to libxl__vnuma_build_vmemrange_hvm because it's the central place for generating vmemranges. Reported-by: Boris Ostrovsky boris.ostrov...@oracle.com Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Dario Faggioli dario.faggi...@citrix.com --- tools/libxl/libxl_dom.c | 32 ++-- tools/libxl/libxl_vnuma.c | 15 ++- 2 files changed, 24 insertions(+), 23 deletions(-) diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index dccc9ac..867172a 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -961,6 +961,16 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, if (info-num_vnuma_nodes != 0) { int i; +ret = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, args); +if (ret) { +LOGEV(ERROR, ret, hvm build vmemranges failed); +goto out; +} +ret = libxl__vnuma_config_check(gc, info, state); +if (ret) goto out; +ret = set_vnuma_info(gc, domid, info, state); +if (ret) goto out; + args.nr_vmemranges = state-num_vmemranges; args.vmemranges = libxl__malloc(gc, sizeof(*args.vmemranges) * args.nr_vmemranges); @@ -972,17 +982,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, args.vmemranges[i].nid = state-vmemranges[i].nid; } -/* Consider video ram belongs to vmemrange 0 -- just shrink it - * by the size of video ram. - */ -if (((args.vmemranges[0].end - args.vmemranges[0].start) 10) - info-video_memkb) { -LOG(ERROR, vmemrange 0 too small to contain video ram); -goto out; -} - -args.vmemranges[0].end -= (info-video_memkb 10); - args.nr_vnodes = info-num_vnuma_nodes; args.vnode_to_pnode = libxl__malloc(gc, sizeof(*args.vnode_to_pnode) * args.nr_vnodes); @@ -996,17 +995,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, goto out; } -if (info-num_vnuma_nodes != 0) { -ret = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, args); -if (ret) { -LOGEV(ERROR, ret, hvm build vmemranges failed); -goto out; -} -ret = libxl__vnuma_config_check(gc, info, state); -if (ret) goto out; -ret = set_vnuma_info(gc, domid, info, state); -if (ret) goto out; -} ret = hvm_build_set_params(ctx-xch, domid, info, state-store_port, state-store_mfn, state-console_port, state-console_mfn, state-store_domid, diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c index cac78d7..56856d2 100644 --- a/tools/libxl/libxl_vnuma.c +++ b/tools/libxl/libxl_vnuma.c @@ -257,6 +257,7 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc, uint64_t hole_start, hole_end, next; int nid, nr_vmemrange; xen_vmemrange_t *vmemranges; +int rc; /* Derive vmemranges from vnode size and memory hole. * @@ -277,6 +278,16 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc, libxl_vnode_info *p = b_info-vnuma_nodes[nid]; uint64_t remaining_bytes = p-memkb 10; +/* Consider video ram belongs to vnode 0 */ +if (nid == 0) { +if (p-memkb b_info-video_memkb) { +LOG(ERROR, vnode 0 too small to contain video ram); +rc = ERROR_INVAL; +goto out; +} +remaining_bytes -= (b_info-video_memkb 10); +} + while (remaining_bytes 0) { uint64_t count = remaining_bytes; @@ -300,7 +311,9 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc, state-vmemranges = vmemranges; state-num_vmemranges = nr_vmemrange; -return 0; +rc = 0; +out: +return rc; } /* -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 0/4] Fix HVM vNUMA
Boris discovered that HVM vNUMA didn't actually work. This patch series fixes that. The first patch is a prerequisite patch for the actual fixes. The second patch is to help debugging. The fixes are in the last two patches, which can be squashed into one if necessary. Patch 3 is updated in this series. Other patches remain the same as last version. Wei. Wei Liu (4): libxc/libxl: fill xc_hvm_build_args in libxl libxc: print more error messages when failed libxc: rework vnuma bits in setup_guest libxl: fix HVM vNUMA tools/libxc/xc_hvm_build_x86.c | 125 ++--- tools/libxl/libxl_dom.c| 48 tools/libxl/libxl_vnuma.c | 15 - 3 files changed, 119 insertions(+), 69 deletions(-) -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 1/4] libxc/libxl: fill xc_hvm_build_args in libxl
When building HVM guests, originally some fields of xc_hvm_build_args are filled in xc_hvm_build (and buried in the wrong function), some are set in libxl__build_hvm before passing xc_hvm_build_args to xc_hvm_build. This is fragile. After examining the code in xc_hvm_build that sets those fields, we can in fact move setting of mmio_start etc in libxl. This way we consolidate memory layout setting in libxl. The setting of firmware data related fields is left in xc_hvm_build because it depends on parsing ELF image. Those fields only point to scratch data that doesn't affect memory layout. There should be no change in the generated guest memory layout. Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- Cc: Chen, Tiejun tiejun.c...@intel.com This might affect your RMRR patch series. I once said xc_hvm_build would touch various xc_hvm_build_args fields that would affect guest memory layout. It won't be that case anymore with this patch. --- tools/libxc/xc_hvm_build_x86.c | 37 +++-- tools/libxl/libxl_dom.c| 16 2 files changed, 23 insertions(+), 30 deletions(-) diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c index e45ae4a..92422bf 100644 --- a/tools/libxc/xc_hvm_build_x86.c +++ b/tools/libxc/xc_hvm_build_x86.c @@ -88,22 +88,14 @@ static int modules_init(struct xc_hvm_build_args *args, return 0; } -static void build_hvm_info(void *hvm_info_page, uint64_t mem_size, - uint64_t mmio_start, uint64_t mmio_size, +static void build_hvm_info(void *hvm_info_page, struct xc_hvm_build_args *args) { struct hvm_info_table *hvm_info = (struct hvm_info_table *) (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET); -uint64_t lowmem_end = mem_size, highmem_end = 0; uint8_t sum; int i; -if ( lowmem_end mmio_start ) -{ -highmem_end = (1ull32) + (lowmem_end - mmio_start); -lowmem_end = mmio_start; -} - memset(hvm_info_page, 0, PAGE_SIZE); /* Fill in the header. */ @@ -116,14 +108,10 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size, memset(hvm_info-vcpu_online, 0xff, sizeof(hvm_info-vcpu_online)); /* Memory parameters. */ -hvm_info-low_mem_pgend = lowmem_end PAGE_SHIFT; -hvm_info-high_mem_pgend = highmem_end PAGE_SHIFT; +hvm_info-low_mem_pgend = args-lowmem_end PAGE_SHIFT; +hvm_info-high_mem_pgend = args-highmem_end PAGE_SHIFT; hvm_info-reserved_mem_pgstart = ioreq_server_pfn(0); -args-lowmem_end = lowmem_end; -args-highmem_end = highmem_end; -args-mmio_start = mmio_start; - /* Finish with the checksum. */ for ( i = 0, sum = 0; i hvm_info-length; i++ ) sum += ((uint8_t *)hvm_info)[i]; @@ -251,8 +239,6 @@ static int setup_guest(xc_interface *xch, xen_pfn_t *page_array = NULL; unsigned long i, vmemid, nr_pages = args-mem_size PAGE_SHIFT; unsigned long target_pages = args-mem_target PAGE_SHIFT; -uint64_t mmio_start = (1ull 32) - args-mmio_size; -uint64_t mmio_size = args-mmio_size; unsigned long entry_eip, cur_pages, cur_pfn; void *hvm_info_page; uint32_t *ident_pt; @@ -344,8 +330,8 @@ static int setup_guest(xc_interface *xch, for ( i = 0; i nr_pages; i++ ) page_array[i] = i; -for ( i = mmio_start PAGE_SHIFT; i nr_pages; i++ ) -page_array[i] += mmio_size PAGE_SHIFT; +for ( i = args-mmio_start PAGE_SHIFT; i nr_pages; i++ ) +page_array[i] += args-mmio_size PAGE_SHIFT; /* * Try to claim pages for early warning of insufficient memory available. @@ -446,7 +432,7 @@ static int setup_guest(xc_interface *xch, * range */ !check_mmio_hole(cur_pfn PAGE_SHIFT, SUPERPAGE_1GB_NR_PFNS PAGE_SHIFT, - mmio_start, mmio_size) ) + args-mmio_start, args-mmio_size) ) { long done; unsigned long nr_extents = count SUPERPAGE_1GB_SHIFT; @@ -545,7 +531,7 @@ static int setup_guest(xc_interface *xch, xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, HVM_INFO_PFN)) == NULL ) goto error_out; -build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size, args); +build_hvm_info(hvm_info_page, args); munmap(hvm_info_page, PAGE_SIZE); /* Allocate and clear special pages. */ @@ -661,12 +647,6 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid, if ( args.image_file_name == NULL ) return -1; -if ( args.mem_target == 0 ) -args.mem_target = args.mem_size; - -if ( args.mmio_size == 0 ) -args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH; - /* An HVM guest must be initialised with at least 2MB memory. */ if
Re: [Xen-devel] [PATCH Remus v7 0/3] Remus support for Migration-v2
On Mon, 2015-05-25 at 17:06 +0800, Yang Hongyang wrote: Ping... Acked + Applied. Sorry for the delay, for some reason I thought I was waiting for Andrew to comment even though they all already had Reviewed-by. Ian. On 05/18/2015 03:03 PM, Yang Hongyang wrote: This patchset implement the Remus support for Migration v2 but without memory compressing. Git tree available at: https://github.com/macrosheep/xen/tree/Remus-newmig-v7 v6-v7: - clear deffered_pages after suspend and send dirty pages - initialise allocated_rec_num v5-v6: - refactor send_domain_memory_live() - introduce buffer_record() - remove the records buffer size limit Yang Hongyang (3): libxc/save: refactor of send_domain_memory_live() libxc/save: implement Remus checkpointed save libxc/restore: implement Remus checkpointed restore tools/libxc/xc_sr_common.h | 15 +++ tools/libxc/xc_sr_restore.c | 134 --- tools/libxc/xc_sr_save.c| 217 3 files changed, 295 insertions(+), 71 deletions(-) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] efi: Reallocate memory map if ExitBootServices() fails
On 29/05/15 08:48, Ross Lagerwall wrote: If calling ExitBootServices() fails, the memory map size may have increased, so determine the new size and reallocate the memory map before calling GetMemoryMap() again. This was seen on the following machine when using the iscsidxe UEFI driver. The machine would consistently fail the first call to ExitBootServices(). System Information Manufacturer: Supermicro Product Name: X10SLE-F/HF BIOS Information Vendor: American Megatrends Inc. Version: 2.00 Release Date: 04/24/2014 Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- xen/common/efi/boot.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c index ef8476c..078f9b8 100644 --- a/xen/common/efi/boot.c +++ b/xen/common/efi/boot.c @@ -1053,14 +1053,14 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable) efi_arch_video_init(gop, info_size, mode_info); } -efi_bs-GetMemoryMap(efi_memmap_size, NULL, map_key, - efi_mdesc_size, mdesc_ver); -efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); -if ( !efi_memmap ) -blexit(LUnable to allocate memory for EFI memory map); - for ( retry = 0; ; retry = 1 ) { +efi_bs-GetMemoryMap(efi_memmap_size, NULL, map_key, + efi_mdesc_size, mdesc_ver); +efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); +if ( !efi_memmap ) +blexit(LUnable to allocate memory for EFI memory map); + status = efi_bs-GetMemoryMap(efi_memmap_size, efi_memmap, map_key, efi_mdesc_size, mdesc_ver); if ( EFI_ERROR(status) ) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [RFC][PATCH] x86: remove vmalloc.h from asm/io.h
Nothing in asm/io.h uses anything from vmalloc.h, so remove the include and fix up the build problems in an allmodconfig (64 bit and 32 bit) build. This may be the place where x86 builds get vmalloc.h implicitly included and that tends to hide places where vmalloc() et al are added to files but the include of vmalloc.h is forgotten. Cc: Thomas Gleixner t...@linutronix.de Cc: Ingo Molnar mi...@redhat.com Cc: H. Peter Anvin h...@zytor.com Cc: x...@kernel.org Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: Boris Ostrovsky boris.ostrov...@oracle.com Cc: David Vrabel david.vra...@citrix.com Cc: Anton Vorontsov an...@enomsg.org Cc: Colin Cross ccr...@android.com Cc: Kees Cook keesc...@chromium.org Cc: Tony Luck tony.l...@intel.com Cc: Rafael J. Wysocki r...@rjwysocki.net Cc: Len Brown l...@kernel.org Cc: Kristen Carlson Accardi kris...@linux.intel.com Cc: Viresh Kumar viresh.ku...@linaro.org Cc: Vinod Koul vinod.k...@intel.com Cc: K. Y. Srinivasan k...@microsoft.com Cc: Haiyang Zhang haiya...@microsoft.com Cc: Hiral Patel hiral...@cisco.com Cc: Suma Ramars sram...@cisco.com Cc: Brian Uchino buch...@cisco.com Cc: James E.J. Bottomley jbottom...@odin.com Cc: Jaroslav Kysela pe...@perex.cz Cc: Takashi Iwai ti...@suse.de Cc: Andrew Morton a...@linux-foundation.org Suggested-by: David Miller da...@davemloft.net Signed-off-by: Stephen Rothwell s...@canb.auug.org.au --- Based in Linus' tree of today. There are probably more places that need vmalloc.h included, but this passes 64 bit and 32 bit allmodconfig builds, so is a place to start. Dave Miller suggested that I start this journey. arch/x86/include/asm/io.h | 2 -- arch/x86/kernel/crash.c| 1 + arch/x86/kernel/machine_kexec_64.c | 1 + arch/x86/mm/pageattr-test.c| 1 + arch/x86/mm/pageattr.c | 1 + arch/x86/xen/p2m.c | 1 + drivers/acpi/apei/erst.c | 1 + drivers/cpufreq/intel_pstate.c | 1 + drivers/dma/mic_x100_dma.c | 1 + drivers/net/hyperv/netvsc.c| 1 + drivers/net/hyperv/rndis_filter.c | 1 + drivers/scsi/fnic/fnic_debugfs.c | 1 + drivers/scsi/fnic/fnic_trace.c | 1 + sound/pci/asihpi/hpioctl.c | 1 + 14 files changed, 13 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index 34a5b93704d3..5791e7ace9db 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -197,8 +197,6 @@ extern void set_iounmap_nonlazy(void); #include asm-generic/iomap.h -#include linux/vmalloc.h - /* * Convert a virtual cached pointer to an uncached pointer */ diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index c76d3e37c6e1..e068d6683dba 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -22,6 +22,7 @@ #include linux/elfcore.h #include linux/module.h #include linux/slab.h +#include linux/vmalloc.h #include asm/processor.h #include asm/hardirq.h diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 415480d3ea84..11546b462fa6 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -17,6 +17,7 @@ #include linux/ftrace.h #include linux/io.h #include linux/suspend.h +#include linux/vmalloc.h #include asm/init.h #include asm/pgtable.h diff --git a/arch/x86/mm/pageattr-test.c b/arch/x86/mm/pageattr-test.c index 6629f397b467..8ff686aa7e8c 100644 --- a/arch/x86/mm/pageattr-test.c +++ b/arch/x86/mm/pageattr-test.c @@ -9,6 +9,7 @@ #include linux/random.h #include linux/kernel.h #include linux/mm.h +#include linux/vmalloc.h #include asm/cacheflush.h #include asm/pgtable.h diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 89af288ec674..bedfc794b4ba 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -14,6 +14,7 @@ #include linux/percpu.h #include linux/gfp.h #include linux/pci.h +#include linux/vmalloc.h #include asm/e820.h #include asm/processor.h diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index b47124d4cd67..8b7f18e200aa 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -67,6 +67,7 @@ #include linux/seq_file.h #include linux/bootmem.h #include linux/slab.h +#include linux/vmalloc.h #include asm/cache.h #include asm/setup.h diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c index ed65e9c4b5b0..3670bbab57a3 100644 --- a/drivers/acpi/apei/erst.c +++ b/drivers/acpi/apei/erst.c @@ -35,6 +35,7 @@ #include linux/nmi.h #include linux/hardirq.h #include linux/pstore.h +#include linux/vmalloc.h #include acpi/apei.h #include apei-internal.h diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 6414661ac1c4..2ba53f4f6af2 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -26,6 +26,7 @@ #include linux/fs.h #include linux/debugfs.h #include linux/acpi.h +#include linux/vmalloc.h #include trace/events/power.h #include asm/div64.h diff --git
Re: [Xen-devel] [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
On 05/29/2015 10:41 AM, Wei Liu wrote: On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote: When doing passthrough of a PCI device for an HVM guest, don't insert the device into xenstore, otherwise pciback attempts to use it which conflicts with QEMU. This manifests itself such that the first time a device is passed to a domain, it succeeds. Subsequent attempts fail unless the device is unbound from pciback or the machine rebooted. The commit message looks sensible to me. However this might break qemu-trad PCI passthrough if I'm not mistaken. What QEMU version are you using? Upstream or trad? Have you tested both of them? qemu-trad. I haven't tested with qemu-upstream. -- Ross Lagerwall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 56759: regressions - FAIL
On 28/05/15 11:10, Jan Beulich wrote: On 28.05.15 at 11:26, ian.campb...@citrix.com wrote: On Thu, 2015-05-28 at 09:50 +0100, Jan Beulich wrote: On 27.05.15 at 18:04, ian.campb...@citrix.com wrote: On Tue, 2015-05-26 at 14:29 +0100, Ian Campbell wrote: I've now managed to reproduce using the arndale on my desk. ... and now I've confirmed that reverting the spin lock change causes the issue to not happen any more. Considering that this issue has prevented a push for almost two weeks, I think we ought to consider reverting the two offending commits until the problem got sorted out. I think that would probably be wise. I'll try and figure out exactly what is going on and propose some patches ASAP. Now done and pushed. Wait what? This failure is not related to spinlocks; It is a networking behavioural bug (hardware specific, even) which has been uncovered, showing that there is a preexisting race condition. It is not reasonable to revert a correct change because it has exposed an existing race condition elsewhere. IMO, this should have been a force push to mark the test as non-blocking. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] efi: Reallocate memory map if ExitBootServices() fails
On 05/29/2015 10:45 AM, Jan Beulich wrote: On 29.05.15 at 09:48, ross.lagerw...@citrix.com wrote: --- a/xen/common/efi/boot.c +++ b/xen/common/efi/boot.c @@ -1053,14 +1053,14 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable) efi_arch_video_init(gop, info_size, mode_info); } -efi_bs-GetMemoryMap(efi_memmap_size, NULL, map_key, - efi_mdesc_size, mdesc_ver); -efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); -if ( !efi_memmap ) -blexit(LUnable to allocate memory for EFI memory map); - for ( retry = 0; ; retry = 1 ) { +efi_bs-GetMemoryMap(efi_memmap_size, NULL, map_key, + efi_mdesc_size, mdesc_ver); +efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); +if ( !efi_memmap ) +blexit(LUnable to allocate memory for EFI memory map); You can't blexit() anymore after having called ExitBootServices() once. Admittedly even the PrintErrMesg() used for error handling a few lines down is on the edge of being invalid (but this is a best effort thing anyway). OK, I'll convert it into a PrintErrMesg. Further you should do a second allocation only if you positively identified that the new size is larger than what the existing buffer can hold. It may be worth allocating a couple of extra entries the first time through anyway; perhaps that would even avoid th need for this workaround. OK. Since, finally, this is only a workaround, as the spec clearly says: After an Operating System calls ExitBootServices(), firmware boot services are no longer available and it is illegal to call any boot service. Even our re-invocation of GetMemoryMap() is bending that (and I only hesitantly agreed for it to be added). The spec kinda disagrees with that: It is suggested that GetMemoryMap()be called immediately before calling ExitBootServices(). If MapKey value is incorrect, ExitBootServices() returns EFI_INVALID_PARAMETER and GetMemoryMap() with ExitBootServices() must be called again. Firmware implementation may choose to do a partial shutdown of the boot services during the first call to ExitBootServices(). EFI OS loader should not make calls to any boot service function other then GetMemoryMap() after the first call to ExitBootServices(). I think the patch does the right thing in the regard. -- Ross Lagerwall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen BUG at page_alloc.c:1738 (Xen 4.5)
On 29/05/15 07:24, Jason Fritcher wrote: On Wed, 20 May 2015, Major Hayden wrote: / On 05/20/2015 05:41 AM, Jan Beulich wrote:/ / Considering that no-one else is seeing this - is this perhaps connected/ / to you building Xen with pre-release gcc 5.0.1? This is also because in/ / order for the above to indeed occur, mmio_ro_do_page_fault()'s/ / put_page() would need to drop the last reference of a page, yet/ / page_get_owner_and_reference() doesn't obtain a reference when/ / a page is unallocated (and hence unowned), i.e. normally a page/ / would have a refcount of at least 2 here. Hence this would be/ / possible only due to a race, but the exact same race to be observed/ / on different hardware _and_ under an emulator is extremely unlikely./ You could try with the xen.gz file from https://copr-be.cloud.fedoraproject.org/results/myoung/xentest/fedora-21-x86_64/xen-4.5.1-0.rc1.fc21/xen-hypervisor-4.5.1-0.rc1.fc21.x86_64.rpm It is roughly the same version of xen but built against Fedora 21 and gcc 4.9.2. If that works then it probably is gcc 5. Greetings, I have run into pretty much the same issue as the original poster. I am running a recently updated Arch Linux system, with GCC 5.1.0, using UEFI and gummiboot to boot. I currently have a build of Xen 4.4.1, built with GCC 4.9.2 from before my last update, that is functioning correctly on this machine. But the builds of Xen 4.5.0, using GCC 5 and mingw64-binutils for the EFI binary, are all failing when Xen starts the Linux kernel, with the same error mentioned in the subject. Below is the boot log I captured via the serial port. http://pastebin.com/bBC78306 Wondering if my specific toolchain was the issue, I downloaded the Fedora 22 version of xen-hypervisor and installed its EFI Xen binary over my compiled binary and received an identical error message, with slightly different addresses in the panic dump. The Fedora version was compiled with GCC 5.0.1. Below is the boot log I captured from that binary. http://pastebin.com/jvg1JazC After finding this thread, and specifically, the quoted message above, I downloaded that xen-hypervisor package and installed its EFI Xen binary. That binary boots successfully, as seen by the captured boot log below. http://pastebin.com/DKxwaU2U So, while I’m not familiar enough with Xen to begin to have an idea of what could possibly be wrong with Xen or GCC 5 to be causing this bug, I’d like to do what I can to track down the issue so I can get a working build of Xen 4.5. :) Are you in a position to compile identical Xen 4.5 source with two different versions of gcc? (current staging-4.5 staging even has the gcc5 build fix in) If it is a gcc compiler bug, we would expect the version compiled with gcc 4.9 to work fine, but the one compiled with 5 to fail in the identified manor. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
On Fri, May 29, 2015 at 10:54:09AM +0100, Ross Lagerwall wrote: On 05/29/2015 10:50 AM, Wei Liu wrote: On Fri, May 29, 2015 at 10:43:08AM +0100, Ross Lagerwall wrote: On 05/29/2015 10:41 AM, Wei Liu wrote: On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote: When doing passthrough of a PCI device for an HVM guest, don't insert the device into xenstore, otherwise pciback attempts to use it which conflicts with QEMU. This manifests itself such that the first time a device is passed to a domain, it succeeds. Subsequent attempts fail unless the device is unbound from pciback or the machine rebooted. The commit message looks sensible to me. However this might break qemu-trad PCI passthrough if I'm not mistaken. What QEMU version are you using? Upstream or trad? Have you tested both of them? qemu-trad. I haven't tested with qemu-upstream. I don't quite get this. Doesn't qemu-trad depends on those xenstore nodes for PCI passthrough information? What did I miss? A different set of xenstore keys are used for communication between libxl and QEMU. The communication between libxl and QEMU happens further up in the same function: http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_pci.c;h=e0743f8112689b340ba7de88bc8895b62105aaba;hb=HEAD#l901 OK. Now I get the idea. IMHO this piece of code is not in a very good state. The problem is the way it works is very fragile. Now we have three functions, each of which has partial responsibility of writing some xenstore nodes. This is not your fault. Acked-by: Wei Liu wei.l...@citrix.com Regards, -- Ross Lagerwall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for Xen 4.6 0/4] Enabling XL to set per-VCPU parameters of a domain for RTDS scheduler
On Tue, 2015-05-26 at 09:59 +0100, Ian Campbell wrote: On Mon, 2015-05-25 at 18:59 -0500, Chong Li wrote: This series arrived in my mailbox as 5 distinct mails. Please use git send-email such that the mails arrive as a single email thread (i.e. each mail as a reply to the previous or to the 0th mail) or arrange for the same thing by hand (I highly recommend using git send-email though) Indeed. BTW, Chong, v1 was threaded ok, so maybe did something different this time when sending the patches. If yes, just don't! :-) Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] libxc/libxl: fill xc_hvm_build_args in libxl
On 29/05/15 12:37, Wei Liu wrote: When building HVM guests, originally some fields of xc_hvm_build_args are filled in xc_hvm_build (and buried in the wrong function), some are set in libxl__build_hvm before passing xc_hvm_build_args to xc_hvm_build. This is fragile. After examining the code in xc_hvm_build that sets those fields, we can in fact move setting of mmio_start etc in libxl. This way we consolidate memory layout setting in libxl. The setting of firmware data related fields is left in xc_hvm_build because it depends on parsing ELF image. Those fields only point to scratch data that doesn't affect memory layout. There should be no change in the generated guest memory layout. Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- Cc: Chen, Tiejun tiejun.c...@intel.com This might affect your RMRR patch series. It should at least me noted that this is a semantic change in domain construction for all other toolstacks out there, an aid to the unlucky person forward porting something like Xapi and finding that a chunk of work is no longer performed by libxc. I once said xc_hvm_build would touch various xc_hvm_build_args fields that would affect guest memory layout. It won't be that case anymore with this patch. --- tools/libxc/xc_hvm_build_x86.c | 37 +++-- tools/libxl/libxl_dom.c| 16 2 files changed, 23 insertions(+), 30 deletions(-) diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c index e45ae4a..92422bf 100644 --- a/tools/libxc/xc_hvm_build_x86.c +++ b/tools/libxc/xc_hvm_build_x86.c @@ -88,22 +88,14 @@ static int modules_init(struct xc_hvm_build_args *args, return 0; } -static void build_hvm_info(void *hvm_info_page, uint64_t mem_size, - uint64_t mmio_start, uint64_t mmio_size, +static void build_hvm_info(void *hvm_info_page, struct xc_hvm_build_args *args) { struct hvm_info_table *hvm_info = (struct hvm_info_table *) (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET); -uint64_t lowmem_end = mem_size, highmem_end = 0; uint8_t sum; int i; -if ( lowmem_end mmio_start ) -{ -highmem_end = (1ull32) + (lowmem_end - mmio_start); -lowmem_end = mmio_start; -} - memset(hvm_info_page, 0, PAGE_SIZE); /* Fill in the header. */ @@ -116,14 +108,10 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size, memset(hvm_info-vcpu_online, 0xff, sizeof(hvm_info-vcpu_online)); /* Memory parameters. */ -hvm_info-low_mem_pgend = lowmem_end PAGE_SHIFT; -hvm_info-high_mem_pgend = highmem_end PAGE_SHIFT; +hvm_info-low_mem_pgend = args-lowmem_end PAGE_SHIFT; +hvm_info-high_mem_pgend = args-highmem_end PAGE_SHIFT; hvm_info-reserved_mem_pgstart = ioreq_server_pfn(0); -args-lowmem_end = lowmem_end; -args-highmem_end = highmem_end; -args-mmio_start = mmio_start; - /* Finish with the checksum. */ for ( i = 0, sum = 0; i hvm_info-length; i++ ) sum += ((uint8_t *)hvm_info)[i]; @@ -251,8 +239,6 @@ static int setup_guest(xc_interface *xch, xen_pfn_t *page_array = NULL; unsigned long i, vmemid, nr_pages = args-mem_size PAGE_SHIFT; unsigned long target_pages = args-mem_target PAGE_SHIFT; -uint64_t mmio_start = (1ull 32) - args-mmio_size; -uint64_t mmio_size = args-mmio_size; unsigned long entry_eip, cur_pages, cur_pfn; void *hvm_info_page; uint32_t *ident_pt; @@ -344,8 +330,8 @@ static int setup_guest(xc_interface *xch, for ( i = 0; i nr_pages; i++ ) page_array[i] = i; -for ( i = mmio_start PAGE_SHIFT; i nr_pages; i++ ) -page_array[i] += mmio_size PAGE_SHIFT; +for ( i = args-mmio_start PAGE_SHIFT; i nr_pages; i++ ) +page_array[i] += args-mmio_size PAGE_SHIFT; /* * Try to claim pages for early warning of insufficient memory available. @@ -446,7 +432,7 @@ static int setup_guest(xc_interface *xch, * range */ !check_mmio_hole(cur_pfn PAGE_SHIFT, SUPERPAGE_1GB_NR_PFNS PAGE_SHIFT, - mmio_start, mmio_size) ) + args-mmio_start, args-mmio_size) ) { long done; unsigned long nr_extents = count SUPERPAGE_1GB_SHIFT; @@ -545,7 +531,7 @@ static int setup_guest(xc_interface *xch, xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, HVM_INFO_PFN)) == NULL ) goto error_out; -build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size, args); +build_hvm_info(hvm_info_page, args); munmap(hvm_info_page, PAGE_SIZE); /* Allocate and clear
Re: [Xen-devel] [PATCH] efi: Reallocate memory map if ExitBootServices() fails
On 29.05.15 at 09:48, ross.lagerw...@citrix.com wrote: --- a/xen/common/efi/boot.c +++ b/xen/common/efi/boot.c @@ -1053,14 +1053,14 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable) efi_arch_video_init(gop, info_size, mode_info); } -efi_bs-GetMemoryMap(efi_memmap_size, NULL, map_key, - efi_mdesc_size, mdesc_ver); -efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); -if ( !efi_memmap ) -blexit(LUnable to allocate memory for EFI memory map); - for ( retry = 0; ; retry = 1 ) { +efi_bs-GetMemoryMap(efi_memmap_size, NULL, map_key, + efi_mdesc_size, mdesc_ver); +efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); +if ( !efi_memmap ) +blexit(LUnable to allocate memory for EFI memory map); You can't blexit() anymore after having called ExitBootServices() once. Admittedly even the PrintErrMesg() used for error handling a few lines down is on the edge of being invalid (but this is a best effort thing anyway). Further you should do a second allocation only if you positively identified that the new size is larger than what the existing buffer can hold. It may be worth allocating a couple of extra entries the first time through anyway; perhaps that would even avoid th need for this workaround. Since, finally, this is only a workaround, as the spec clearly says: After an Operating System calls ExitBootServices(), firmware boot services are no longer available and it is illegal to call any boot service. Even our re-invocation of GetMemoryMap() is bending that (and I only hesitantly agreed for it to be added). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Question on QEMU restore process
On Tue, May 26, 2015 at 5:49 PM, Ian Campbell ian.campb...@citrix.com wrote: On Tue, 2015-05-26 at 17:38 +0200, Lengyel, Tamas wrote: Hi all, I'm wondering if someone can point me in the right direction. I'm trying to understand the process around domain save/restore using XL. I can see the xl save format and how its appended with the QEMU state aquired via the xen-save-devices-state qmp command (this is for upstream QEMU). However, I have a hard time locating the exact point where that save state is being loaded into the new QEMU process. I see in libxc that it should be dumped into /var/lib/xen/qemu-resume appended by the domain id of the newly created domain. It seems this was passed to QEMU via the -loadvm flag at one point, but on Xen 4.4 I don't see that flag on my QEMU processes when I restore a domain. libxl__build_device_model_args_new appears to pass it as -incoming fd:N where N is an open file descriptor which qemu will inherit from its parent (libxl). Ian. Hi Ian, thanks for the info. What I'm trying to do is full HVM cloning of domains where I would also load up the correct QEMU save state for a clone domain. For this I was thinking of simply creating a clone domain paused (xl create -p -e clone config), copy over hvm+vcpu context of the origin domain and dedup memory using memsharing. Afterwards, I would kill the QEMU instance for the clone that was automatically created by xl, and replace it with a qemu instance that loads the state using -loadvm. Now, this loading of the QEMU instance via -loadvm doesn't seem to work. I tried restoring a saved domain (xl restore -p -e savefile), dumped its QEMU state via QMP, killed the QEMU process and restarted it exactly as it was running before, except with -loadvm qemu-save instead of the -incoming fd option. I get the following error: /usr/local/lib/xen/bin/qemu-system-i386 -xen-domid 8 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-8,server,nowait -no-shutdown -mon chardev=libxl-cmd,mode=control -chardev socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-8,server,nowait -mon chardev=libxenstat-cmd,mode=control -nodefaults -name windows7-sp1-x86 -vnc 0.0.0.0:0,to=99 -display none -device cirrus-vga,vgamem_mb=8 -boot order=cd -usb -usbdevice tablet -device e1000,id=nic0,netdev=net0,mac=00:06:5b:ba:7c:02 -netdev type=tap,id=net0,ifname=vif8.0-emu,script=no,downscript=no -loadvm ./qemu-save -machine xenfv -m 1016 -drive file=/dev/l1vg/windows7-sp1-x86,if=ide,index=0,media=disk,format=raw,cache=writeback -daemonize xen be: vkbd-0: initial backend state is wrong (InitWait) qemu: hardware error: xen: failed to populate ram at 3f80 CPU #0: EAX= EBX= ECX= EDX=0663 ESI= EDI= EBP= ESP= EIP=fff0 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= IDT= CR0=6010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80 FPR0= FPR1= FPR2= FPR3= FPR4= FPR5= FPR6= FPR7= XMM00= XMM01= XMM02= XMM03= XMM04= XMM05= XMM06= XMM07= What step am I missing in here for the (re)creation of the QEMU process? -- [image: www.novetta.com] Tamas K Lengyel Senior Security Researcher 7921 Jones Branch Drive McLean VA 22102 Email tleng...@novetta.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v8 05/13] x86: expose CBM length and COS number information
On Thu, 2015-05-28 at 14:26 +0100, Jan Beulich wrote: On 21.05.15 at 10:41, chao.p.p...@linux.intel.com wrote: --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -694,6 +694,20 @@ struct xen_sysctl_pcitopoinfo { typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t); +#define XEN_SYSCTL_PSR_CAT_get_l3_info 0 +struct xen_sysctl_psr_cat_op { +uint32_t cmd; /* IN: XEN_SYSCTL_PSR_CAT_* */ +uint32_t target;/* IN: socket to be operated on */ If this is always the socket number, why would the variable be named anything other than socket. If otoh subsequent patches use it differently, I think the comment should be omitted now rather than being dropped then (or it should be given its final wording from the beginning). ISTR asking about this interface before (it might have been at the toolstack level, though), and the answer was that it is not subsequent patches _in_this_series_ (of course), but maybe future ones that will modify the 'per-socket-ness' nature of this feature. So, yes, maybe the comment should say something along those lines (and be updated when things change). Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V7] xen/vm_event: Clean up control-register-write vm_events and add XCR0 event
On 28.05.15 at 18:32, rcojoc...@bitdefender.com wrote: On 05/28/2015 07:06 PM, Lengyel, Tamas wrote: That seems a bit odd to me that we now have both a macro and a function with the same name. I think making the macro at least all capital would avoid confusion and make the code a lot more readable. But even then IMHO this wrapper is not helping code readability much. Its really a pain to track down auto-expanding fields in macros when someone is trying to read the code. I think the lowercase version of the wrapper macro of the same name is a Xen coding style convention, but if Jan thinks going uppercase is not a problem that's fine with me. As for the wrapper, being right next to the function it is wrapping makes it hard to ignore when reading the header, and the #define is pretty clear on what it does. Also, while I think that the macro does help with readability (so my preference would be to leave it in), it's just that - a preference - so if Jan also agrees that we should now remove it I'll do that. After all, Afaic it should be kept as is. the macro will probably go out the window (or the first parameter will need to be changed to VM_EVENT_##what instead of VM_EVENT_X86_##what) as soon as ARM control register write events will come into play. It's in an x86-specific header, so why should it need to be changed for ARM? If ARM will gain a similarly named function, the use sites will still all be architecture specific, and hence both declaration and whether or not to have a wrapper macro can remain a per-arch decision. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC][PATCH] x86: remove vmalloc.h from asm/io.h
* Stephen Rothwell s...@canb.auug.org.au wrote: Nothing in asm/io.h uses anything from vmalloc.h, so remove the include and fix up the build problems in an allmodconfig (64 bit and 32 bit) build. This may be the place where x86 builds get vmalloc.h implicitly included and that tends to hide places where vmalloc() et al are added to files but the include of vmalloc.h is forgotten. Good idea. Acked-by: Ingo Molnar mi...@kernel.org Based in Linus' tree of today. There are probably more places that need vmalloc.h included, but this passes 64 bit and 32 bit allmodconfig builds, so is a place to start. Please also test x86 allnoconfig and defconfig 32/64, that tends to unearth the remaining places. People doing randconfig testing will find the rest. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v8 05/13] x86: expose CBM length and COS number information
On 29.05.15 at 11:23, dario.faggi...@citrix.com wrote: On Fri, 2015-05-29 at 09:07 +0100, Jan Beulich wrote: On 29.05.15 at 04:47, chao.p.p...@linux.intel.com wrote: On Thu, May 28, 2015 at 02:26:03PM +0100, Jan Beulich wrote: --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -694,6 +694,20 @@ struct xen_sysctl_pcitopoinfo { typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t); +#define XEN_SYSCTL_PSR_CAT_get_l3_info 0 +struct xen_sysctl_psr_cat_op { +uint32_t cmd; /* IN: XEN_SYSCTL_PSR_CAT_* */ +uint32_t target;/* IN: socket to be operated on */ If this is always the socket number, why would the variable be named anything other than socket. If otoh subsequent patches use it differently, I think the comment should be omitted now rather than being dropped then (or it should be given its final wording from the beginning). Or 'target to be operated on'? Fine with me. Just not something that may end up being confusing. So, I really don't want to turn this into pure bikeshedding, but, for a field called 'target', a comment saying 'target to be operated on' seems rather pointless, and I'd go for omitting it (for now). Right - my earlier response was merely meant to say I'm not opposed to a non-confusing comment, not that I see a strict need for a mostly redundant one here. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] x86/mem_sharing: Relax sanity check for memops
The sharing vm_event ring being enabled is not necessary for mem_sharing memops. Signed-off-by: Tamas K Lengyel tkleng...@sec.in.tum.de --- xen/arch/x86/mm/mem_sharing.c | 4 1 file changed, 4 deletions(-) diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index 0700f00..16e329e 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -1320,10 +1320,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg) if ( !hap_enabled(d) || !d-arch.hvm_domain.mem_sharing_enabled ) goto out; -rc = -ENODEV; -if ( unlikely(!d-vm_event-share.ring_page) ) -goto out; - switch ( mso.op ) { case XENMEM_sharing_op_nominate_gfn: -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 56759: regressions - FAIL
On 29.05.15 at 11:56, andrew.coop...@citrix.com wrote: On 28/05/15 11:10, Jan Beulich wrote: On 28.05.15 at 11:26, ian.campb...@citrix.com wrote: On Thu, 2015-05-28 at 09:50 +0100, Jan Beulich wrote: On 27.05.15 at 18:04, ian.campb...@citrix.com wrote: On Tue, 2015-05-26 at 14:29 +0100, Ian Campbell wrote: I've now managed to reproduce using the arndale on my desk. ... and now I've confirmed that reverting the spin lock change causes the issue to not happen any more. Considering that this issue has prevented a push for almost two weeks, I think we ought to consider reverting the two offending commits until the problem got sorted out. I think that would probably be wise. I'll try and figure out exactly what is going on and propose some patches ASAP. Now done and pushed. Wait what? This failure is not related to spinlocks; It is a networking behavioural bug (hardware specific, even) which has been uncovered, showing that there is a preexisting race condition. If Ian gives his okay, I'm fine to re-instate the reverted patches (which incidentally even got a push during the night), but I can't really see the proof of what you claim in any of the earlier communication. It is not reasonable to revert a correct change because it has exposed an existing race condition elsewhere. IMO, this should have been a force push to mark the test as non-blocking. That's one way to view it. I'm not sure a force push would have been warranted here, as the regression was real. And further holding up the tree moving forward would have been bad in that situation too, the more that it was - as said above - almost two weeks that it had been stuck. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 56759: regressions - FAIL
On Fri, 2015-05-29 at 10:56 +0100, Andrew Cooper wrote: On 28/05/15 11:10, Jan Beulich wrote: On 28.05.15 at 11:26, ian.campb...@citrix.com wrote: On Thu, 2015-05-28 at 09:50 +0100, Jan Beulich wrote: On 27.05.15 at 18:04, ian.campb...@citrix.com wrote: On Tue, 2015-05-26 at 14:29 +0100, Ian Campbell wrote: I've now managed to reproduce using the arndale on my desk. ... and now I've confirmed that reverting the spin lock change causes the issue to not happen any more. Considering that this issue has prevented a push for almost two weeks, I think we ought to consider reverting the two offending commits until the problem got sorted out. I think that would probably be wise. I'll try and figure out exactly what is going on and propose some patches ASAP. Now done and pushed. Wait what? This failure is not related to spinlocks; It is a networking behavioural bug (hardware specific, even) which has been uncovered, showing that there is a preexisting race condition. That's the current _hypothesis_, but it hasn't been confirmed what is actually happening here. So far doing the apparently obvious fix in netback (moving the state change to closed until after the uevent is generated) doesn't seem to have fixed the issue. So either the hypothesis is wrong or there is something more subtle going on. We don't know what is causing this issue yet and therefore neither holding up the push gate nor force pushing seem appropriate under the circumstances. It is not reasonable to revert a correct change because it has exposed an existing race condition elsewhere. IMO, this should have been a force push to mark the test as non-blocking. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xen-netfront sets partial checksum at wrong offset
On 29.05.15 at 12:39, wei.l...@citrix.com wrote: On Fri, May 29, 2015 at 11:34:07AM +0100, Jan Beulich wrote: On 11.05.15 at 19:25, venkat.x.venkatsu...@oracle.com wrote: Please CC the maintainers of the driver. You can get that from 'scripts/get_maintainer.pl' I've done that for you. Thanks, Konrad. I am copying Wei too who had fixed the below problem earlier. It fixed the incorrect ip_hdr(). tcp_hdr() still needs to fixed. commit d554f73df6bc35ac8f6a65e5560bf1d31dfebed9 Author: Wei Liu wei.l...@citrix.com Date: Wed Feb 19 18:48:34 2014 + xen-netfront: reset skb network header before checksum So no response at all so far from the maintainers made me look into this: I first thought what we need would be calls to skb_probe_transport_header() in skb_checksum_setup_ip() after each of the skb_maybe_pull_tail() functions. But skb_partial_csum_set() already calls skb_set_transport_header(), so I now think things ought to be fine without any change. Can you clarify what you think is missing? Or is this an issue in just the old (3.8.x) kernel you're using? I think this is the follow-up thread 20150512013424.ga7...@oracle.com And the conclusion is 3.8 is too old so the fix is not there. Ah, right. Seems like I failed to remove the earlier mail from my list of things to look at when this came through. Sorry for the noise then. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen BUG at page_alloc.c:1738 (Xen 4.5)
On 29/05/15 11:50, M A Young wrote: On Fri, 29 May 2015, Andrew Cooper wrote: Are you in a position to compile identical Xen 4.5 source with two different versions of gcc? (current staging-4.5 staging even has the gcc5 build fix in) If it is a gcc compiler bug, we would expect the version compiled with gcc 4.9 to work fine, but the one compiled with 5 to fail in the identified manor. I did a bit of testing - xen-4.5.1-rc1 built on Fedora 22 (gcc5) doesn't boot for me, but if I replace xen.gz with one from the same code built on Fedora 21 (gcc4) then it does boot. There are rpms and build logs available via http://copr.fedoraproject.org/coprs/myoung/xentest/build/93366/ if anyone else wants to do some testing. Michael Young Do you have easy access to xen-syms from each build? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/paging: remove pointless current domain checks
On 29/05/15 10:09, Jan Beulich wrote: Checking that the subject domain is not the current one is pointless when already having paused that domain: domain_pause() already ASSERT()s this to be the case. Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- a/xen/arch/x86/mm/hap/hap.c +++ b/xen/arch/x86/mm/hap/hap.c @@ -464,13 +464,6 @@ int hap_enable(struct domain *d, u32 mod domain_pause(d); -/* error check */ -if ( (d == current-domain) ) -{ -rv = -EINVAL; -goto out; -} - old_pages = d-arch.paging.hap.total_pages; if ( old_pages == 0 ) { --- a/xen/arch/x86/mm/shadow/common.c +++ b/xen/arch/x86/mm/shadow/common.c @@ -2972,8 +2972,7 @@ int shadow_enable(struct domain *d, u32 domain_pause(d); /* Sanity check the arguments */ -if ( (d == current-domain) || - shadow_mode_enabled(d) || +if ( shadow_mode_enabled(d) || ((mode PG_translate) !(mode PG_refcounts)) || ((mode PG_external) !(mode PG_translate)) ) { ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote: When doing passthrough of a PCI device for an HVM guest, don't insert the device into xenstore, otherwise pciback attempts to use it which conflicts with QEMU. This manifests itself such that the first time a device is passed to a domain, it succeeds. Subsequent attempts fail unless the device is unbound from pciback or the machine rebooted. The commit message looks sensible to me. However this might break qemu-trad PCI passthrough if I'm not mistaken. What QEMU version are you using? Upstream or trad? Have you tested both of them? Wei. Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com --- tools/libxl/libxl_pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index e0743f8..2552889 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -994,7 +994,7 @@ out: } } -if (!starting) +if (!starting type == LIBXL_DOMAIN_TYPE_PV) rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting); else rc = 0; -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/mem_sharing: Relax sanity check for memops
On 29.05.15 at 11:37, tkleng...@sec.in.tum.de wrote: The sharing vm_event ring being enabled is not necessary for mem_sharing memops. If indeed so, why would the same not apply to mem_paging memops? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
On 05/29/2015 10:50 AM, Wei Liu wrote: On Fri, May 29, 2015 at 10:43:08AM +0100, Ross Lagerwall wrote: On 05/29/2015 10:41 AM, Wei Liu wrote: On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote: When doing passthrough of a PCI device for an HVM guest, don't insert the device into xenstore, otherwise pciback attempts to use it which conflicts with QEMU. This manifests itself such that the first time a device is passed to a domain, it succeeds. Subsequent attempts fail unless the device is unbound from pciback or the machine rebooted. The commit message looks sensible to me. However this might break qemu-trad PCI passthrough if I'm not mistaken. What QEMU version are you using? Upstream or trad? Have you tested both of them? qemu-trad. I haven't tested with qemu-upstream. I don't quite get this. Doesn't qemu-trad depends on those xenstore nodes for PCI passthrough information? What did I miss? A different set of xenstore keys are used for communication between libxl and QEMU. The communication between libxl and QEMU happens further up in the same function: http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_pci.c;h=e0743f8112689b340ba7de88bc8895b62105aaba;hb=HEAD#l901 Regards, -- Ross Lagerwall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 26/41] arm : acpi add xen environment table
On Thu, 28 May 2015, Jan Beulich wrote: On 28.05.15 at 14:12, stefano.stabell...@eu.citrix.com wrote: On Thu, 28 May 2015, Jan Beulich wrote: On 28.05.15 at 12:58, stefano.stabell...@eu.citrix.com wrote: Let's take a closer look at this table. After the boilierplate, these are the interesting fields: GNT Start, GNT Size Evtchn Intr, Evtchn Intr Flags After the table, it is clearly stated: The Grant Table region is optional. and The Event Channel Interrupt is optional. So I think there is no problem: we don't want to pass any info in that table? Sure, let's not pass any. We can still use it to flag the presence of Xen on the platform. Even more so a reason to use what base ACPI has, without any custom table. Could you please make a concrete suggestion with table names and fields? I already pointed you at 6.0's new FADT field Hypervisor Vendor Identity. OK, that is a decent alternative. We don't have a suitable hypercall to retrieve the evtchn PPI but we could add one. Overall I still prefer the table approach, but if you really don't want it, I can live with the hypercalls. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xen-netfront sets partial checksum at wrong offset
On Fri, May 29, 2015 at 11:34:07AM +0100, Jan Beulich wrote: On 11.05.15 at 19:25, venkat.x.venkatsu...@oracle.com wrote: Please CC the maintainers of the driver. You can get that from 'scripts/get_maintainer.pl' I've done that for you. Thanks, Konrad. I am copying Wei too who had fixed the below problem earlier. It fixed the incorrect ip_hdr(). tcp_hdr() still needs to fixed. commit d554f73df6bc35ac8f6a65e5560bf1d31dfebed9 Author: Wei Liu wei.l...@citrix.com Date: Wed Feb 19 18:48:34 2014 + xen-netfront: reset skb network header before checksum So no response at all so far from the maintainers made me look into this: I first thought what we need would be calls to skb_probe_transport_header() in skb_checksum_setup_ip() after each of the skb_maybe_pull_tail() functions. But skb_partial_csum_set() already calls skb_set_transport_header(), so I now think things ought to be fine without any change. Can you clarify what you think is missing? Or is this an issue in just the old (3.8.x) kernel you're using? I think this is the follow-up thread 20150512013424.ga7...@oracle.com And the conclusion is 3.8 is too old so the fix is not there. (In either case netback's xenvif_tx_submit() calling skb_probe_transport_header() would seem pointless, as skb_checksum_setup() - with or without a fix - ought to be taking care of this anyway.) Patches welcome. :-) Wei. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] osstest: reduce FreeBSD install timeouts
Only the first block is expected to take longer (because it decompresses the image and writes it to a LVM volume), the remaining commands should execute much faster, so reduce the timeout. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com --- I've checked this with the output of OSSTest, and those blocks execute immediately, OSSTest doesn't even report execution times on them: http://logs.test-lab.xenproject.org/osstest/logs/57419/test-amd64-i386-freebsd10-amd64/9.ts-freebsd-install.log --- ts-freebsd-install | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ts-freebsd-install b/ts-freebsd-install index 0d6eb0c..98dad24 100755 --- a/ts-freebsd-install +++ b/ts-freebsd-install @@ -67,7 +67,7 @@ sub prep () { mount -t ufs -o ufstype=ufs2,rw $rootpartition_dev $mnt END -target_cmd_root($ho, END, 900); +target_cmd_root($ho, END, 15); mkdir -p $mnt/root/.ssh cat 'ENDKEYS' $mnt/root/.ssh/authorized_keys $authkeys @@ -75,7 +75,7 @@ ENDKEYS END -target_cmd_root($ho, END, 900); +target_cmd_root($ho, END, 30); echo 'sshd_enable=YES' $mnt/etc/rc.conf echo 'ifconfig_xn0=DHCP' $mnt/etc/rc.conf echo 'PermitRootLogin yes' $mnt/etc/ssh/sshd_config -- 1.9.5 (Apple Git-50.3) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/4] Fix HVM vNUMA
Ping... On Mon, May 18, 2015 at 04:34:48PM +0100, Wei Liu wrote: Boris discovered that HVM vNUMA didn't actually work. This patch series fixes that. The first patch is a prerequisite patch for the actual fixes. The second patch is to help debugging. The fixes are in the last two patches, which can be squashed into one if necessary. Wei. Wei Liu (4): libxc/libxl: fill xc_hvm_build_args in libxl libxc: print more error messages when failed libxc: rework vnuma bits in setup_guest libxl: fix HVM vNUMA tools/libxc/xc_hvm_build_x86.c | 129 ++--- tools/libxl/libxl_dom.c| 48 --- tools/libxl/libxl_vnuma.c | 15 - 3 files changed, 122 insertions(+), 70 deletions(-) -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen BUG at page_alloc.c:1738 (Xen 4.5)
On Fri, 29 May 2015, Andrew Cooper wrote: Are you in a position to compile identical Xen 4.5 source with two different versions of gcc? (current staging-4.5 staging even has the gcc5 build fix in) If it is a gcc compiler bug, we would expect the version compiled with gcc 4.9 to work fine, but the one compiled with 5 to fail in the identified manor. I did a bit of testing - xen-4.5.1-rc1 built on Fedora 22 (gcc5) doesn't boot for me, but if I replace xen.gz with one from the same code built on Fedora 21 (gcc4) then it does boot. There are rpms and build logs available via http://copr.fedoraproject.org/coprs/myoung/xentest/build/93366/ if anyone else wants to do some testing. Michael Young___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen BUG at page_alloc.c:1738 (Xen 4.5)
On Fri, 29 May 2015, Andrew Cooper wrote: On 29/05/15 11:50, M A Young wrote: On Fri, 29 May 2015, Andrew Cooper wrote: Are you in a position to compile identical Xen 4.5 source with two different versions of gcc? (current staging-4.5 staging even has the gcc5 build fix in) If it is a gcc compiler bug, we would expect the version compiled with gcc 4.9 to work fine, but the one compiled with 5 to fail in the identified manor. I did a bit of testing - xen-4.5.1-rc1 built on Fedora 22 (gcc5) doesn't boot for me, but if I replace xen.gz with one from the same code built on Fedora 21 (gcc4) then it does boot. There are rpms and build logs available via http://copr.fedoraproject.org/coprs/myoung/xentest/build/93366/ if anyone else wants to do some testing. Michael Young Do you have easy access to xen-syms from each build? Yes. Michael Young ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v8 05/13] x86: expose CBM length and COS number information
On Fri, 2015-05-29 at 09:07 +0100, Jan Beulich wrote: On 29.05.15 at 04:47, chao.p.p...@linux.intel.com wrote: On Thu, May 28, 2015 at 02:26:03PM +0100, Jan Beulich wrote: --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -694,6 +694,20 @@ struct xen_sysctl_pcitopoinfo { typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t); +#define XEN_SYSCTL_PSR_CAT_get_l3_info 0 +struct xen_sysctl_psr_cat_op { +uint32_t cmd; /* IN: XEN_SYSCTL_PSR_CAT_* */ +uint32_t target;/* IN: socket to be operated on */ If this is always the socket number, why would the variable be named anything other than socket. If otoh subsequent patches use it differently, I think the comment should be omitted now rather than being dropped then (or it should be given its final wording from the beginning). Or 'target to be operated on'? Fine with me. Just not something that may end up being confusing. So, I really don't want to turn this into pure bikeshedding, but, for a field called 'target', a comment saying 'target to be operated on' seems rather pointless, and I'd go for omitting it (for now). Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
On Fri, May 29, 2015 at 10:43:08AM +0100, Ross Lagerwall wrote: On 05/29/2015 10:41 AM, Wei Liu wrote: On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote: When doing passthrough of a PCI device for an HVM guest, don't insert the device into xenstore, otherwise pciback attempts to use it which conflicts with QEMU. This manifests itself such that the first time a device is passed to a domain, it succeeds. Subsequent attempts fail unless the device is unbound from pciback or the machine rebooted. The commit message looks sensible to me. However this might break qemu-trad PCI passthrough if I'm not mistaken. What QEMU version are you using? Upstream or trad? Have you tested both of them? qemu-trad. I haven't tested with qemu-upstream. I don't quite get this. Doesn't qemu-trad depends on those xenstore nodes for PCI passthrough information? What did I miss? Wei. -- Ross Lagerwall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xen-netfront sets partial checksum at wrong offset
On 11.05.15 at 19:25, venkat.x.venkatsu...@oracle.com wrote: Please CC the maintainers of the driver. You can get that from 'scripts/get_maintainer.pl' I've done that for you. Thanks, Konrad. I am copying Wei too who had fixed the below problem earlier. It fixed the incorrect ip_hdr(). tcp_hdr() still needs to fixed. commit d554f73df6bc35ac8f6a65e5560bf1d31dfebed9 Author: Wei Liu wei.l...@citrix.com Date: Wed Feb 19 18:48:34 2014 + xen-netfront: reset skb network header before checksum So no response at all so far from the maintainers made me look into this: I first thought what we need would be calls to skb_probe_transport_header() in skb_checksum_setup_ip() after each of the skb_maybe_pull_tail() functions. But skb_partial_csum_set() already calls skb_set_transport_header(), so I now think things ought to be fine without any change. Can you clarify what you think is missing? Or is this an issue in just the old (3.8.x) kernel you're using? (In either case netback's xenvif_tx_submit() calling skb_probe_transport_header() would seem pointless, as skb_checksum_setup() - with or without a fix - ought to be taking care of this anyway.) Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/mem_sharing: Relax sanity check for memops
On Fri, May 29, 2015 at 11:54 AM, Jan Beulich jbeul...@suse.com wrote: On 29.05.15 at 11:37, tkleng...@sec.in.tum.de wrote: The sharing vm_event ring being enabled is not necessary for mem_sharing memops. If indeed so, why would the same not apply to mem_paging memops? Jan The ring during mem_sharing is only used to signal an out-of-memory error condition. If no listener is present for this error condition, Xen will automatically kill the domain that was requesting more memory when none is available during unsharing. For paging, the listener is an absolute requirement as without it paging won't work at all. That's not the case for sharing. Tamas ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 26/41] arm : acpi add xen environment table
On 29.05.15 at 12:31, stefano.stabell...@eu.citrix.com wrote: On Thu, 28 May 2015, Jan Beulich wrote: On 28.05.15 at 14:12, stefano.stabell...@eu.citrix.com wrote: Could you please make a concrete suggestion with table names and fields? I already pointed you at 6.0's new FADT field Hypervisor Vendor Identity. OK, that is a decent alternative. We don't have a suitable hypercall to retrieve the evtchn PPI but we could add one. Overall I still prefer the table approach, but if you really don't want it, I can live with the hypercalls. I'm clearly not in the position to force any design decision onto the ARM side of Xen. But I continue to be of the opinion that custom tables should be a last resort approach only, which isn't warranted here (anymore). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] efi: Reallocate memory map if ExitBootServices() fails
On 29.05.15 at 11:57, ross.lagerw...@citrix.com wrote: On 05/29/2015 10:45 AM, Jan Beulich wrote: On 29.05.15 at 09:48, ross.lagerw...@citrix.com wrote: --- a/xen/common/efi/boot.c +++ b/xen/common/efi/boot.c @@ -1053,14 +1053,14 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable) efi_arch_video_init(gop, info_size, mode_info); } -efi_bs-GetMemoryMap(efi_memmap_size, NULL, map_key, - efi_mdesc_size, mdesc_ver); -efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); -if ( !efi_memmap ) -blexit(LUnable to allocate memory for EFI memory map); - for ( retry = 0; ; retry = 1 ) { +efi_bs-GetMemoryMap(efi_memmap_size, NULL, map_key, + efi_mdesc_size, mdesc_ver); +efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); +if ( !efi_memmap ) +blexit(LUnable to allocate memory for EFI memory map); You can't blexit() anymore after having called ExitBootServices() once. Admittedly even the PrintErrMesg() used for error handling a few lines down is on the edge of being invalid (but this is a best effort thing anyway). OK, I'll convert it into a PrintErrMesg. Conditionally upon being in the second iteration. Since, finally, this is only a workaround, as the spec clearly says: After an Operating System calls ExitBootServices(), firmware boot services are no longer available and it is illegal to call any boot service. Even our re-invocation of GetMemoryMap() is bending that (and I only hesitantly agreed for it to be added). The spec kinda disagrees with that: It is suggested that GetMemoryMap()be called immediately before calling ExitBootServices(). If MapKey value is incorrect, ExitBootServices() returns EFI_INVALID_PARAMETER and GetMemoryMap() with ExitBootServices() must be called again. Firmware implementation may choose to do a partial shutdown of the boot services during the first call to ExitBootServices(). EFI OS loader should not make calls to any boot service function other then GetMemoryMap() after the first call to ExitBootServices(). This was the grounds on which the current retry loop was added. I think the patch does the right thing in the regard. For x86 yes, since efi_arch_allocate_mmap_buffer() doesn't use boot services there. For ARM no, due to its use of AllocatePool(). Iirc Daniel was also planning on changing the x86 side too, and such a change would break things in non-obvious ways. Hence I think depending on the ability to do another allocation after the first ExitBootServices() call is not really a good idea. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/4] Fix HVM vNUMA
On Fri, 2015-05-29 at 11:46 +0100, Wei Liu wrote: Ping... FWIW, I had a look at the patches and they seemed fine, sorry to not having said that before. TBH, that was partly because I was expecting a v2 to show up, fixing the (trivial) issue you identified together with Boris, and was planning to 'Reviewed-by' that... Dario On Mon, May 18, 2015 at 04:34:48PM +0100, Wei Liu wrote: Boris discovered that HVM vNUMA didn't actually work. This patch series fixes that. The first patch is a prerequisite patch for the actual fixes. The second patch is to help debugging. The fixes are in the last two patches, which can be squashed into one if necessary. Wei. Wei Liu (4): libxc/libxl: fill xc_hvm_build_args in libxl libxc: print more error messages when failed libxc: rework vnuma bits in setup_guest libxl: fix HVM vNUMA tools/libxc/xc_hvm_build_x86.c | 129 ++--- tools/libxl/libxl_dom.c| 48 --- tools/libxl/libxl_vnuma.c | 15 - 3 files changed, 122 insertions(+), 70 deletions(-) -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC][PATCH] x86: remove vmalloc.h from asm/io.h
Hi Ingo, On Fri, 29 May 2015 11:21:05 +0200 Ingo Molnar mi...@kernel.org wrote: Good idea. Acked-by: Ingo Molnar mi...@kernel.org Thanks. Please also test x86 allnoconfig and defconfig 32/64, that tends to unearth the remaining places. People doing randconfig testing will find the rest. Good idea. the allnoconfigs produced this further patch. I will squash it into the original. The defconfigs built ok. From: Stephen Rothwell s...@canb.auug.org.au Date: Fri, 29 May 2015 22:01:41 +1000 Subject: [PATCH] x86: more fixes for removing vmalloc.h fron asm/io.h Signed-off-by: Stephen Rothwell s...@canb.auug.org.au --- arch/x86/include/asm/io.h | 1 + include/linux/io.h| 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index 5791e7ace9db..2a3543a4db1d 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -40,6 +40,7 @@ #include linux/compiler.h #include asm/page.h #include asm/early_ioremap.h +#include asm/pgtable_types.h #define build_mmio_read(name, size, type, reg, barrier) \ static inline type name(const volatile void __iomem *addr) \ diff --git a/include/linux/io.h b/include/linux/io.h index 986f2bffea1e..cb753a2450b8 100644 --- a/include/linux/io.h +++ b/include/linux/io.h @@ -19,6 +19,7 @@ #define _LINUX_IO_H #include linux/types.h +#include linux/init.h #include asm/io.h #include asm/page.h -- 2.1.4 -- Cheers, Stephen Rothwells...@canb.auug.org.au pgpHAjsDQXDnb.pgp Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC][PATCH] x86: remove vmalloc.h from asm/io.h
At Fri, 29 May 2015 19:18:47 +1000, Stephen Rothwell wrote: Nothing in asm/io.h uses anything from vmalloc.h, so remove the include and fix up the build problems in an allmodconfig (64 bit and 32 bit) build. This may be the place where x86 builds get vmalloc.h implicitly included and that tends to hide places where vmalloc() et al are added to files but the include of vmalloc.h is forgotten. Cc: Thomas Gleixner t...@linutronix.de Cc: Ingo Molnar mi...@redhat.com Cc: H. Peter Anvin h...@zytor.com Cc: x...@kernel.org Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: Boris Ostrovsky boris.ostrov...@oracle.com Cc: David Vrabel david.vra...@citrix.com Cc: Anton Vorontsov an...@enomsg.org Cc: Colin Cross ccr...@android.com Cc: Kees Cook keesc...@chromium.org Cc: Tony Luck tony.l...@intel.com Cc: Rafael J. Wysocki r...@rjwysocki.net Cc: Len Brown l...@kernel.org Cc: Kristen Carlson Accardi kris...@linux.intel.com Cc: Viresh Kumar viresh.ku...@linaro.org Cc: Vinod Koul vinod.k...@intel.com Cc: K. Y. Srinivasan k...@microsoft.com Cc: Haiyang Zhang haiya...@microsoft.com Cc: Hiral Patel hiral...@cisco.com Cc: Suma Ramars sram...@cisco.com Cc: Brian Uchino buch...@cisco.com Cc: James E.J. Bottomley jbottom...@odin.com Cc: Jaroslav Kysela pe...@perex.cz Cc: Takashi Iwai ti...@suse.de For the sound bits, Acked-by: Takashi Iwai ti...@suse.de thanks, Takashi Cc: Andrew Morton a...@linux-foundation.org Suggested-by: David Miller da...@davemloft.net Signed-off-by: Stephen Rothwell s...@canb.auug.org.au --- Based in Linus' tree of today. There are probably more places that need vmalloc.h included, but this passes 64 bit and 32 bit allmodconfig builds, so is a place to start. Dave Miller suggested that I start this journey. arch/x86/include/asm/io.h | 2 -- arch/x86/kernel/crash.c| 1 + arch/x86/kernel/machine_kexec_64.c | 1 + arch/x86/mm/pageattr-test.c| 1 + arch/x86/mm/pageattr.c | 1 + arch/x86/xen/p2m.c | 1 + drivers/acpi/apei/erst.c | 1 + drivers/cpufreq/intel_pstate.c | 1 + drivers/dma/mic_x100_dma.c | 1 + drivers/net/hyperv/netvsc.c| 1 + drivers/net/hyperv/rndis_filter.c | 1 + drivers/scsi/fnic/fnic_debugfs.c | 1 + drivers/scsi/fnic/fnic_trace.c | 1 + sound/pci/asihpi/hpioctl.c | 1 + 14 files changed, 13 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index 34a5b93704d3..5791e7ace9db 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -197,8 +197,6 @@ extern void set_iounmap_nonlazy(void); #include asm-generic/iomap.h -#include linux/vmalloc.h - /* * Convert a virtual cached pointer to an uncached pointer */ diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index c76d3e37c6e1..e068d6683dba 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -22,6 +22,7 @@ #include linux/elfcore.h #include linux/module.h #include linux/slab.h +#include linux/vmalloc.h #include asm/processor.h #include asm/hardirq.h diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 415480d3ea84..11546b462fa6 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -17,6 +17,7 @@ #include linux/ftrace.h #include linux/io.h #include linux/suspend.h +#include linux/vmalloc.h #include asm/init.h #include asm/pgtable.h diff --git a/arch/x86/mm/pageattr-test.c b/arch/x86/mm/pageattr-test.c index 6629f397b467..8ff686aa7e8c 100644 --- a/arch/x86/mm/pageattr-test.c +++ b/arch/x86/mm/pageattr-test.c @@ -9,6 +9,7 @@ #include linux/random.h #include linux/kernel.h #include linux/mm.h +#include linux/vmalloc.h #include asm/cacheflush.h #include asm/pgtable.h diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 89af288ec674..bedfc794b4ba 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -14,6 +14,7 @@ #include linux/percpu.h #include linux/gfp.h #include linux/pci.h +#include linux/vmalloc.h #include asm/e820.h #include asm/processor.h diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index b47124d4cd67..8b7f18e200aa 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -67,6 +67,7 @@ #include linux/seq_file.h #include linux/bootmem.h #include linux/slab.h +#include linux/vmalloc.h #include asm/cache.h #include asm/setup.h diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c index ed65e9c4b5b0..3670bbab57a3 100644 --- a/drivers/acpi/apei/erst.c +++ b/drivers/acpi/apei/erst.c @@ -35,6 +35,7 @@ #include linux/nmi.h #include linux/hardirq.h #include linux/pstore.h +#include linux/vmalloc.h #include acpi/apei.h #include apei-internal.h diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
Re: [Xen-devel] [PATCH v2 1/4] libxc/libxl: fill xc_hvm_build_args in libxl
On Fri, May 29, 2015 at 01:29:18PM +0100, Andrew Cooper wrote: On 29/05/15 12:37, Wei Liu wrote: When building HVM guests, originally some fields of xc_hvm_build_args are filled in xc_hvm_build (and buried in the wrong function), some are set in libxl__build_hvm before passing xc_hvm_build_args to xc_hvm_build. This is fragile. After examining the code in xc_hvm_build that sets those fields, we can in fact move setting of mmio_start etc in libxl. This way we consolidate memory layout setting in libxl. The setting of firmware data related fields is left in xc_hvm_build because it depends on parsing ELF image. Those fields only point to scratch data that doesn't affect memory layout. There should be no change in the generated guest memory layout. Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- Cc: Chen, Tiejun tiejun.c...@intel.com This might affect your RMRR patch series. It should at least me noted that this is a semantic change in domain construction for all other toolstacks out there, an aid to the unlucky person forward porting something like Xapi and finding that a chunk of work is no longer performed by libxc. Right. I will write this in the commit message if I end up sending v3. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] ACPI shutdown unreliable with win7?
On Fri, 2015-05-22 at 10:08 +0100, Ian Campbell wrote: If win7 doesn't shutdown given a power button request I'd be more inclined to remove the setting in osstest for those flights and let guest-stop go back to being never pass than to start making such changes to the VM config which I think would probably break the preceding suspend and migration tests (which aren't completely reliable, but are far more so than this shutdown one). Does anyone have any ideas here or shall I propose: -8-- From 2d1b814e65676c3cf56ce2e569491953607f53f7 Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Fri, 29 May 2015 13:51:33 +0100 Subject: [PATCH] Turn off acpi_shutdown for windows 7 As described in 1432284841.10746.136.ca...@citrix.com / http://lists.xen.org/archives/html/xen-devel/2015-05/msg03016.html Windows 7 does not appear to reliably actually shutdown when asked to via the ACPI power button. Once this patch is applied some force pushes will likely be needed in order for this to not appear as a regression. Signed-off-by: Ian Campbell ian.campb...@citrix.com Cc: Andrew Cooper andrew.coop...@citrix.com Cc: Paul Durrant paul.durr...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com, Cc: Jan Beulich jbeul...@suse.com --- Alternatively could make it an allowed/non-blocking failure? --- make-flight | 1 - 1 file changed, 1 deletion(-) diff --git a/make-flight b/make-flight index 8a1fceb..5120891 100755 --- a/make-flight +++ b/make-flight @@ -206,7 +206,6 @@ do_hvm_win7_x64_tests () { job_create_test test-$xenarch$kern-$dom0arch-xl$qemuu_suffix-win7-amd64 \ test-win xl $xenarch $dom0arch $qemuu_runvar \ win_image=win7-x64.iso \ -win_acpi_shutdown=true \ all_hostflags=$most_hostflags,hvm } -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for Xen 4.6 1/4] xen: enabling XL to set per-VCPU parameters of a domain for RTDS scheduler
On Wed, 2015-05-27 at 12:41 +0100, Jan Beulich wrote: On 26.05.15 at 19:18, lichong...@gmail.com wrote: On Tue, May 26, 2015 at 4:08 AM, Jan Beulich jbeul...@suse.com wrote: On 26.05.15 at 02:05, lichong...@gmail.com wrote: Add two hypercalls(XEN_DOMCTL_SCHEDOP_getvcpuinfo/putvcpuinfo) to get/set a domain's per-VCPU parameters. Hypercalls are handled by newly added hook (.adjust_vcpu) in the scheduler interface. Add a new data structure (struct xen_domctl_scheduler_vcpu_op) for transferring data between tool and hypervisor. Signed-off-by: Chong Li chong...@wustl.edu Signed-off-by: Meng Xu men...@cis.upenn.edu Signed-off-by: Sisu Xi xis...@gmail.com --- Missing brief description of changes in v2 here. Based on Dario's suggestion in PATCH v1, we think it would be better to make the per-vcpu get/set functionalities more general, rather than just serving for rtds scheduler. So, the changes in v2 are: 1) Add a new hypercall, XEN_DOMCTL_scheduler_vcpu_op. Any scheduler can use it for per-vcpu get/set. There is a new data structure (struct xen_domctl_scheduler_vcpu_op), which helps transferring data (per-vcpu params) between tool and hypervisor when the hypercall is invoked. 2) The new hypercall is handled by function sched_adjust_vcpu (in schedule.c), which would call a newly added hook (named as .adjust_vcpu, added to the scheduler interface (struct scheduler)). 3) In RTDS scheduler, .adjust_vcpu hooks a function defined in sched_rt.c, named as rt_vcpu_cntl. This function gets/sets the per-vcpu params. No, that looks to be the description of the patch as a whole, not one of the delta between v1 and v2 (which is what I was asking for). + +switch ( op-cmd ) +{ +case XEN_DOMCTL_SCHEDOP_getvcpuinfo: +spin_lock_irqsave(prv-lock, flags); +list_for_each( iter, sdom-vcpu ) +{ +svc = list_entry(iter, struct rt_vcpu, sdom_elem); +vcpuid = svc-vcpu-vcpu_id; + +local_sched.budget = svc-budget / MICROSECS(1); +local_sched.period = svc-period / MICROSECS(1); +if ( copy_to_guest_offset(op-u.rtds.vcpus, vcpuid, What makes you think that the caller has provided enough slots? This code works in a similar way as the one for XEN_SYSCTL_numainfo. So if there is no enough slot in guest space, en error code is returned. I don't think I saw any place you returned an error here. The only error being returned is -EFAULT when the copy-out fails. Look again at XEN_SYSCTL_numainfo, in particular, at this part: if ( ni-num_nodes num_nodes ) { ret = -ENOBUFS; i = num_nodes; } else i = 0 Which, as you'll appreciate if you check from where ni-num_nodes and num_nodes come from, if where the boundary checking we're asking for happens. Note how the 'i' variable is used in the rest of the block, and you'll see what we mean, and can do something similar. +local_sched, 1) ) You're leaking hypervisor stack contents here, but by reading the structure from guest memory first to honor its vcpuid field (this making get match put) you'd avoid this anyway. The structure in guest memory has nothing, and it will hold per-vcpu params after this XEN_DOMCTL_SCHEDOP_getvcpuinfo hypercall is well served. Does leaking hypervisor stack mean that I need to call local_sched.vcpuid = vcpuid before copying this local_sched to the guest memory? Not just that. You'd also need to make sure padding space is cleared. Indeed. But as said, I suppose you want to copy in what the guest provided anyway, in which case the leak is implicitly gone. Chong, you're saying above The structure in guest memory has nothing. AFAICT, what Jan means when he says make get match put, is why don't you, for XEN_DOMCTL_SCHEDOP_getvcpuinfo, allow the caller to provide an array with fewer elements than the number of vcpus, and put in the vcpuid field of those elements, the id-s of the vcpus she wants to receive information upon. I also was about to make a similar request, as it makes things more consistent, between get and put. After all, if we think that it is valuable to allow batching (i.e., having an hypercall that returns the parameters of _a_bunch_ of vcpus, not just of one), and that it makes sense to allow that bunch of vcpus to be incomplete and sparse, this is true for both the directions, isn't it? +} +hypercall_preempt_check(); ??? We've talked about this in patch v1 (http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01152.html). When a domain has too many VCPUs, we need to make sure the spin lock does not last for too long. Right, but the call above does nothing like that. Please look at other uses of it to understand what needs to be done here. This being said, would it make sense to put down a
Re: [Xen-devel] ACPI shutdown unreliable with win7?
On Fri, 2015-05-29 at 14:14 +0100, Andrew Cooper wrote: On 29/05/15 14:04, Jan Beulich wrote: On 29.05.15 at 14:54, ian.campb...@citrix.com wrote: On Fri, 2015-05-22 at 10:08 +0100, Ian Campbell wrote: If win7 doesn't shutdown given a power button request I'd be more inclined to remove the setting in osstest for those flights and let guest-stop go back to being never pass than to start making such changes to the VM config which I think would probably break the preceding suspend and migration tests (which aren't completely reliable, but are far more so than this shutdown one). Does anyone have any ideas here or shall I propose: Unless we have a way to make an adjustment inside the guest for the power button to gain shutdown meaning, I think there's no alternative to the change below. You can avoid advertising S3/S4 in the ACPI tables, which iirc causes the same alteration to happen. Hvmloader uses the platform/acpi_s{3,4} booleans to control whether the relevant SSDTs are exposed. As I said in 1432285699.10746.147.ca...@citrix.com and again in reply to Jan just now, this test does sometimes pass. So whether or not we are exposing the correct set of tables to the guest there is something else non-deterministic going on here I think. I'd also be concerned about the impact on other tests (in particular the SRM ones) on disabling s3 and/or s4. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [rumpuserxen test] 57518: regressions - FAIL
flight 57518 rumpuserxen real [real] http://logs.test-lab.xenproject.org/osstest/logs/57518/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-rumpuserxen 5 rumpuserxen-build fail REGR. vs. 33866 build-i386-rumpuserxen5 rumpuserxen-build fail REGR. vs. 33866 Tests which did not succeed, but are not blocking: test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a version targeted for testing: rumpuserxen 3b91e44996ea6ae1276bce1cc44f38701c53ee6f baseline version: rumpuserxen 30d72f3fc5e35cd53afd82c8179cc0e0b11146ad People who touched revisions under test: Antti Kantee po...@iki.fi Ian Jackson ian.jack...@eu.citrix.com Martin Lucina mar...@lucina.net Wei Liu l...@liuw.name jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-pvopspass build-i386-pvops pass build-amd64-rumpuserxen fail build-i386-rumpuserxen fail test-amd64-amd64-rumpuserxen-amd64 blocked test-amd64-i386-rumpuserxen-i386 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 535 lines long.) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] ACPI shutdown unreliable with win7?
On 29.05.15 at 14:54, ian.campb...@citrix.com wrote: On Fri, 2015-05-22 at 10:08 +0100, Ian Campbell wrote: If win7 doesn't shutdown given a power button request I'd be more inclined to remove the setting in osstest for those flights and let guest-stop go back to being never pass than to start making such changes to the VM config which I think would probably break the preceding suspend and migration tests (which aren't completely reliable, but are far more so than this shutdown one). Does anyone have any ideas here or shall I propose: Unless we have a way to make an adjustment inside the guest for the power button to gain shutdown meaning, I think there's no alternative to the change below. Jan -8-- From 2d1b814e65676c3cf56ce2e569491953607f53f7 Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Fri, 29 May 2015 13:51:33 +0100 Subject: [PATCH] Turn off acpi_shutdown for windows 7 As described in 1432284841.10746.136.ca...@citrix.com / http://lists.xen.org/archives/html/xen-devel/2015-05/msg03016.html Windows 7 does not appear to reliably actually shutdown when asked to via the ACPI power button. Once this patch is applied some force pushes will likely be needed in order for this to not appear as a regression. Signed-off-by: Ian Campbell ian.campb...@citrix.com Cc: Andrew Cooper andrew.coop...@citrix.com Cc: Paul Durrant paul.durr...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com, Cc: Jan Beulich jbeul...@suse.com --- Alternatively could make it an allowed/non-blocking failure? --- make-flight | 1 - 1 file changed, 1 deletion(-) diff --git a/make-flight b/make-flight index 8a1fceb..5120891 100755 --- a/make-flight +++ b/make-flight @@ -206,7 +206,6 @@ do_hvm_win7_x64_tests () { job_create_test test-$xenarch$kern-$dom0arch-xl$qemuu_suffix-win7-amd64 \ test-win xl $xenarch $dom0arch $qemuu_runvar \ win_image=win7-x64.iso \ -win_acpi_shutdown=true \ all_hostflags=$most_hostflags,hvm } -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] ACPI shutdown unreliable with win7?
On Fri, 2015-05-29 at 14:04 +0100, Jan Beulich wrote: On 29.05.15 at 14:54, ian.campb...@citrix.com wrote: On Fri, 2015-05-22 at 10:08 +0100, Ian Campbell wrote: If win7 doesn't shutdown given a power button request I'd be more inclined to remove the setting in osstest for those flights and let guest-stop go back to being never pass than to start making such changes to the VM config which I think would probably break the preceding suspend and migration tests (which aren't completely reliable, but are far more so than this shutdown one). Does anyone have any ideas here or shall I propose: Unless we have a way to make an adjustment inside the guest for the power button to gain shutdown meaning, I think there's no alternative to the change below. The strange this is that it does work _sometimes_, either by complete coincidence or because there is something non-deterministic about how Win7 reacts to this ACPI event. Does anyone know which setting we would want to change? In particular it would be good if I knew what to look for to check the current status... Jan -8-- From 2d1b814e65676c3cf56ce2e569491953607f53f7 Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Fri, 29 May 2015 13:51:33 +0100 Subject: [PATCH] Turn off acpi_shutdown for windows 7 As described in 1432284841.10746.136.ca...@citrix.com / http://lists.xen.org/archives/html/xen-devel/2015-05/msg03016.html Windows 7 does not appear to reliably actually shutdown when asked to via the ACPI power button. Once this patch is applied some force pushes will likely be needed in order for this to not appear as a regression. Signed-off-by: Ian Campbell ian.campb...@citrix.com Cc: Andrew Cooper andrew.coop...@citrix.com Cc: Paul Durrant paul.durr...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com, Cc: Jan Beulich jbeul...@suse.com --- Alternatively could make it an allowed/non-blocking failure? --- make-flight | 1 - 1 file changed, 1 deletion(-) diff --git a/make-flight b/make-flight index 8a1fceb..5120891 100755 --- a/make-flight +++ b/make-flight @@ -206,7 +206,6 @@ do_hvm_win7_x64_tests () { job_create_test test-$xenarch$kern-$dom0arch-xl$qemuu_suffix-win7-amd64 \ test-win xl $xenarch $dom0arch $qemuu_runvar \ win_image=win7-x64.iso \ -win_acpi_shutdown=true \ all_hostflags=$most_hostflags,hvm } -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for Xen 4.6 1/4] xen: enabling XL to set per-VCPU parameters of a domain for RTDS scheduler
On 29.05.15 at 15:01, dario.faggi...@citrix.com wrote: On Wed, 2015-05-27 at 12:41 +0100, Jan Beulich wrote: On 26.05.15 at 19:18, lichong...@gmail.com wrote: On Tue, May 26, 2015 at 4:08 AM, Jan Beulich jbeul...@suse.com wrote: On 26.05.15 at 02:05, lichong...@gmail.com wrote: + +switch ( op-cmd ) +{ +case XEN_DOMCTL_SCHEDOP_getvcpuinfo: +spin_lock_irqsave(prv-lock, flags); +list_for_each( iter, sdom-vcpu ) +{ +svc = list_entry(iter, struct rt_vcpu, sdom_elem); +vcpuid = svc-vcpu-vcpu_id; + +local_sched.budget = svc-budget / MICROSECS(1); +local_sched.period = svc-period / MICROSECS(1); +if ( copy_to_guest_offset(op-u.rtds.vcpus, vcpuid, What makes you think that the caller has provided enough slots? This code works in a similar way as the one for XEN_SYSCTL_numainfo. So if there is no enough slot in guest space, en error code is returned. I don't think I saw any place you returned an error here. The only error being returned is -EFAULT when the copy-out fails. Look again at XEN_SYSCTL_numainfo, in particular, at this part: if ( ni-num_nodes num_nodes ) { ret = -ENOBUFS; i = num_nodes; } else i = 0 Which, as you'll appreciate if you check from where ni-num_nodes and num_nodes come from, if where the boundary checking we're asking for happens. Note how the 'i' variable is used in the rest of the block, and you'll see what we mean, and can do something similar. I think this was targeted at Chong rather than me (while I was listed as To, and Chong only on Cc)? +} +hypercall_preempt_check(); ??? We've talked about this in patch v1 (http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01152.html). When a domain has too many VCPUs, we need to make sure the spin lock does not last for too long. Right, but the call above does nothing like that. Please look at other uses of it to understand what needs to be done here. This being said, would it make sense to put down a threshold, below which we don't need to check for preemption (nor to ask to reissue the hypercall)? Something like, if we're dealing with a request for the parameters of N (== 16 ? 32 ?) vcpus, it's fine to do them all at once. Above that limit, we just do bunches of N vcpus, and then do a preempt check. What do you think Jan? And what do you think a suitable limit would be? I have no problem with that, or with checking for preemption only every so many iterations. In fact, apart from the fact that it's used incorrectly, I don't think that checking for preemption after _each_ step of the loop make much sense... Whether checking at the beginning (but not for the first iteration) or at the end (but not for the last one) doesn't really matter. --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -1104,6 +1104,30 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op) return ret; } +/* Adjust scheduling parameter for the vcpus of a given domain. */ +long sched_adjust_vcpu( +struct domain *d, +struct xen_domctl_scheduler_vcpu_op *op) +{ +long ret; I see no reason for this and the function return type to be long. The reason is stated in the begining. In the beginning of what? I don't see any such, and it would also be odd for there to be an explanation of why a particular type was chosen. As Chong he's saying elsewhere, he's moslty following suit. Should we consider a pre-patch making both sched_adjust() and sched_adjust_global() retunr int? Perhaps. But the main point here is that people copying existing code should sanity check what they copy (so to not spread oddities or even bugs). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] ACPI shutdown unreliable with win7?
On 29.05.15 at 15:14, andrew.coop...@citrix.com wrote: On 29/05/15 14:04, Jan Beulich wrote: On 29.05.15 at 14:54, ian.campb...@citrix.com wrote: On Fri, 2015-05-22 at 10:08 +0100, Ian Campbell wrote: If win7 doesn't shutdown given a power button request I'd be more inclined to remove the setting in osstest for those flights and let guest-stop go back to being never pass than to start making such changes to the VM config which I think would probably break the preceding suspend and migration tests (which aren't completely reliable, but are far more so than this shutdown one). Does anyone have any ideas here or shall I propose: Unless we have a way to make an adjustment inside the guest for the power button to gain shutdown meaning, I think there's no alternative to the change below. You can avoid advertising S3/S4 in the ACPI tables, which iirc causes the same alteration to happen. Hvmloader uses the platform/acpi_s{3,4} booleans to control whether the relevant SSDTs are exposed. Which libxl even has settings for. That would perhaps be a better first try than disabling ACPI shutdown. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2] efi: Reallocate memory map if ExitBootServices() fails
If calling ExitBootServices() fails, the memory map size may increase. Allocate an extra page the first time around so that we hopefully shouldn't have to reallocate the memory map. Allocate a new memory map if needed, although this only a fallback, because some arches may use boot services to allocate memory which may not work after the first call to ExitBootServices(). This was seen on the following machine when using the iscsidxe UEFI driver. The machine would consistently fail the first call to ExitBootServices(). System Information Manufacturer: Supermicro Product Name: X10SLE-F/HF BIOS Information Vendor: American Megatrends Inc. Version: 2.00 Release Date: 04/24/2014 Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com --- xen/common/efi/boot.c | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c index ef8476c..7d09b39 100644 --- a/xen/common/efi/boot.c +++ b/xen/common/efi/boot.c @@ -700,7 +700,7 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable) EFI_STATUS status; unsigned int i, argc; CHAR16 **argv, *file_name, *cfg_file_name = NULL, *options = NULL; -UINTN map_key, info_size, gop_mode = ~0; +UINTN memmap_size, map_key, info_size, gop_mode = ~0; EFI_HANDLE *handles = NULL; EFI_SHIM_LOCK_PROTOCOL *shim_lock; EFI_GRAPHICS_OUTPUT_PROTOCOL *gop = NULL; @@ -1053,14 +1053,23 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable) efi_arch_video_init(gop, info_size, mode_info); } -efi_bs-GetMemoryMap(efi_memmap_size, NULL, map_key, - efi_mdesc_size, mdesc_ver); -efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); -if ( !efi_memmap ) -blexit(LUnable to allocate memory for EFI memory map); - for ( retry = 0; ; retry = 1 ) { +efi_bs-GetMemoryMap(memmap_size, NULL, map_key, + efi_mdesc_size, mdesc_ver); +if ( memmap_size efi_memmap_size ) +{ +efi_memmap_size = memmap_size + EFI_PAGE_SIZE; +efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); +if ( !efi_memmap ) +{ +if ( retry ) +PrintErr(LUnable to allocate memory for EFI memory map); +else +blexit(LUnable to allocate memory for EFI memory map); +} +} + status = efi_bs-GetMemoryMap(efi_memmap_size, efi_memmap, map_key, efi_mdesc_size, mdesc_ver); if ( EFI_ERROR(status) ) -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [osstest test] 57401: tolerable FAIL - PUSHED
On Thu, 2015-05-28 at 11:26 +0100, Ian Campbell wrote: On Thu, 2015-05-28 at 04:49 +, osstest service user wrote: flight 57401 osstest real [real] http://logs.test-lab.xenproject.org/osstest/logs/57401/ Failures :-/ but no regressions. Lontao, Robert, The grub related bits of the nested series are now in the production osstest instance. Yes, I've seen it. Thanks Ian. I'm sorry but I've had a couple of high priority interrupts wrt reviewing the remainder. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 2/4] libxc: print more error messages when failed
On 29/05/15 12:37, Wei Liu wrote: No functional changes introduced. Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com This patch reminds me of a todo item I have wanted to do for a while using setvcpucontext rather than mapping gfn 0 and manually inserting `jmp $0x10` However, as for the change itself, a definite improvement. --- tools/libxc/xc_hvm_build_x86.c | 30 +++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c index 92422bf..df4b7ed 100644 --- a/tools/libxc/xc_hvm_build_x86.c +++ b/tools/libxc/xc_hvm_build_x86.c @@ -259,7 +259,10 @@ static int setup_guest(xc_interface *xch, memset(elf, 0, sizeof(elf)); if ( elf_init(elf, image, image_size) != 0 ) +{ +PERROR(Could not initialise ELF image); goto error_out; +} xc_elf_set_logfile(xch, elf, 1); @@ -522,15 +525,24 @@ static int setup_guest(xc_interface *xch, DPRINTF( 1GB PAGES: 0x%016lx\n, stat_1gb_pages); if ( loadelfimage(xch, elf, dom, page_array) != 0 ) +{ +PERROR(Could not load ELF image); goto error_out; +} if ( loadmodules(xch, args, m_start, m_end, dom, page_array) != 0 ) -goto error_out; +{ +PERROR(Could not load ACPI modules); +goto error_out; +} if ( (hvm_info_page = xc_map_foreign_range( xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, HVM_INFO_PFN)) == NULL ) +{ +PERROR(Could not map hvm info page); goto error_out; +} build_hvm_info(hvm_info_page, args); munmap(hvm_info_page, PAGE_SIZE); @@ -547,7 +559,10 @@ static int setup_guest(xc_interface *xch, } if ( xc_clear_domain_pages(xch, dom, special_pfn(0), NR_SPECIAL_PAGES) ) -goto error_out; +{ +PERROR(Could not clear special pages); +goto error_out; +} xc_hvm_param_set(xch, dom, HVM_PARAM_STORE_PFN, special_pfn(SPECIALPAGE_XENSTORE)); @@ -580,7 +595,10 @@ static int setup_guest(xc_interface *xch, } if ( xc_clear_domain_pages(xch, dom, ioreq_server_pfn(0), NR_IOREQ_SERVER_PAGES) ) -goto error_out; +{ +PERROR(Could not clear ioreq page); +goto error_out; +} /* Tell the domain where the pages are and how many there are */ xc_hvm_param_set(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN, @@ -595,7 +613,10 @@ static int setup_guest(xc_interface *xch, if ( (ident_pt = xc_map_foreign_range( xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, special_pfn(SPECIALPAGE_IDENT_PT))) == NULL ) +{ +PERROR(Could not map special page ident_pt); goto error_out; +} for ( i = 0; i PAGE_SIZE / sizeof(*ident_pt); i++ ) ident_pt[i] = ((i 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); @@ -610,7 +631,10 @@ static int setup_guest(xc_interface *xch, char *page0 = xc_map_foreign_range( xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, 0); if ( page0 == NULL ) +{ +PERROR(Could not map page0); goto error_out; +} page0[0] = 0xe9; *(uint32_t *)page0[1] = entry_eip - 5; munmap(page0, PAGE_SIZE); ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] ACPI shutdown unreliable with win7?
On 29/05/15 14:04, Jan Beulich wrote: On 29.05.15 at 14:54, ian.campb...@citrix.com wrote: On Fri, 2015-05-22 at 10:08 +0100, Ian Campbell wrote: If win7 doesn't shutdown given a power button request I'd be more inclined to remove the setting in osstest for those flights and let guest-stop go back to being never pass than to start making such changes to the VM config which I think would probably break the preceding suspend and migration tests (which aren't completely reliable, but are far more so than this shutdown one). Does anyone have any ideas here or shall I propose: Unless we have a way to make an adjustment inside the guest for the power button to gain shutdown meaning, I think there's no alternative to the change below. You can avoid advertising S3/S4 in the ACPI tables, which iirc causes the same alteration to happen. Hvmloader uses the platform/acpi_s{3,4} booleans to control whether the relevant SSDTs are exposed. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Qemu-devel] [PATCH 1/1] Fix device listener interface for PCI to PCI bridges
On 05/28/15 17:05, Michael S. Tsirkin wrote: On Thu, May 28, 2015 at 11:03:07PM +0200, Michael S. Tsirkin wrote: On Thu, May 28, 2015 at 03:09:48PM -0400, Don Slutz wrote: On 05/28/15 08:28, Michael S. Tsirkin wrote: On Thu, May 28, 2015 at 07:25:50AM -0400, Don Slutz wrote: On 05/28/15 05:30, Michael S. Tsirkin wrote: On Thu, May 28, 2015 at 04:46:37AM -0400, Don Slutz wrote: The commit 707ff80021ccd7a68f4b3d2c44eebf87efbb41c4 assumed that a PCI device has a static address. This is not true for PCI devices that are on the secondary bus of a PCI to PCI bridge. ... Not really because it's not just secondary bus number. Changing subordinate bus numbers can hide/unhide whole buses. You are right. I have no idea what Paul Durrant was thinking about this case. And this would apply to PCI_SUBORDINATE_BUS not PCI_SECONDARY_BUS. Since at QEMU 2.2 Xen sends all pci-config requests to QEMU, things worked. Why worked? The issue is that PCI to PCI bridges were not configured by hvmloader nor by SeaBIOS. I do have a patch to Xen that fixes this (I was rebasing on the latest Xen to post it and my test case stopped working). So PCI devices that exist on the secondary side of a PCI to PCI bridge could only be used (by Linux with at least pci=assign-busses) after booting. It is not clear to me that the complexity of tracking bus visibility make sense. Clearly you do. Well what was the point of the change? To get config cycles that are valid that you do not get today. If you don't care that we get irrelevant config cycles why not just send them all to QEMU? That would need to be answered by Paul Durrant and/or other people (see below) Would it be better to have: void device_listener_change_pci_bus_num(PCIBus *b, PCIDevice *d, void *opaque) { uint8_t oldbus = (uint8_t)opaque; DEVICE_LISTENER_CALL(change_pci_bus_num, Forward, d, oldbus); } So that the above works, or to add a function to convert args? To know whether device is accessible to config cycles, you really need to scan the hierarchy from root downwards. Yes, that is correct. However what I am doing here is not changing how QEMU checks if the device is accessible, but changing what pci config Xen sends to QEMU. If the change to PCI_SECONDARY_BUS hides this PCI to PCI bridge, that is not an issue. -Don Slutz Imagine a bridge with secondary bus number 5 behind another one with subordinate numbers 1-3. You should not send conf cycles for bus number 5 to qemu. That is correct. How ever unless Paul Durrant has an example of more then 1 QEMU where this would make a difference, the cases I am aware of are: 1) Xen does not send it, and returns 0x (or smaller). 2) QEMU returns 0x (or smaller). I will grant that #1 is faster, but it also is only happening during start up and so I do not see the clear win to add more complex code to only do #1. -Don Slutz It's not about faster. I assumed you need to get just the correct stuff out to QEMU to gain the better security as 3996e85c1822e05c50250f8d2d1e57b6bea1229d claims. And if that's not the case, please educate me. I do not know enough about this to answer here. I have added xen-devel list to this in the hope that someone there can answer the better security question. -Don Slutz -- MST ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling
Hi Ian, NIT: You used my Linaro email which I think is de-activated now :). On 27/05/2015 13:48, Ian Campbell wrote: Here follows draft C based on previous feedback. Also at: http://xenbits.xen.org/people/ianc/vits/draftC.{pdf,html} I think I've captured most of the previous discussion, except where explicitly noted by XXX or in other replies, but please do point out places where I've missed something. One area where I am pretty sure I've dropped the ball is on the completion and update of `CREADR`. That conversation ended up bifurcating along the 1:N vs N:N mapping scheme lines, and I didn't manage to get the various proposals straight. Since we've now agreed on N:N hopefully we can reach a conclusion (no pun intended) on the completion aspect too (sorry that this probably means rehasing at least a subset of the previous thread). Ian. % Xen on ARM vITS Handling % Ian Campbell ian.campb...@citrix.com % Draft C # Changelog ## Since Draft B * Details of command translation (thanks to Julien and Vijay) * Added background on LPI Translation and Pending tablesd * Added background on Collections * Settled on `N:N` scheme for vITS:pITS mapping. * Rejigged section nesting a bit. * Since we now thing translation should be cheap, settle on translation at scheduling time. * Lazy `INVALL` and `SYNC` ## Since Draft A * Added discussion of when/where command translation occurs. * Contention on scheduler lock, suggestion to use SOFTIRQ. * Handling of domain shutdown. * More detailed discussion of multiple vs single vits pros/cons. # Introduction ARM systems containing a GIC version 3 or later may contain one or more ITS logical blocks. An ITS is used to route Message Signalled interrupts from devices into an LPI injection on the processor. The following summarises the ITS hardware design and serves as a set of assumptions for the vITS software design. (XXX it is entirely possible I've horribly misunderstood how this stuff fits together). For full details of the ITS see the GIC Architecture Specification. ## Device Identifiers Each device using the ITS is associated with a unique identifier. The device IDs are typically described via system firmware, e.g. the ACPI IORT table or via device tree. The number of device ids is variable and can be discovered via `GITS_TYPER.Devbits`. This field allows an ITS to have up to 2^32 device. ## Interrupt Collections Each interrupt is a member of an Interrupt Collection. This allows software to manage large numbers of physical interrupts with a small number of commands rather than issuing one command per interrupt. On a system with N processors, the ITS must provide at least N+1 collections. ## Target Addresses The Target Address correspond to a specific GIC re-distributor. The format of this field depends on the value of the `GITS_TYPER.PTA` bit: * 1: the base address of the re-distributor target is used * 0: a unique processor number is used. The mapping between the processor affinity value (`MPIDR`) and the processor number is discoverable via `GICR_TYPER.ProcessorNumber`. ## ITS Translation Table Message signalled interrupts are translated into an LPI via an ITS translation table which must be configured for each device which can generate an MSI. I'm not sure what is the ITS Table Table. Did you mean Interrupt Translation Table? The ITS translation table maps the device id of the originating devic s/devic/device/? into an Interrupt Collection and then into a target address. ## ITS Configuration The ITS is configured and managed, including establishing and configuring Translation Table for each device, via an in memory ring shared between the CPU and the ITS controller. The ring is managed via the `GITS_CBASER` register and indexed by `GITS_CWRITER` and `GITS_CREADR` registers. A processor adds commands to the shared ring and then updates `GITS_CWRITER` to make them visible to the ITS controller. The ITS controller processes commands from the ring and then updates `GITS_CREADR` to indicate the the processor that the command has been processed. Commands are processed sequentially. Commands sent on the ring include operational commands: * Routing interrupts to processors; * Generating interrupts; * Clearing the pending state of interrupts; * Synchronising the command queue and maintenance commands: * Map device/collection/processor; * Map virtual interrupt; * Clean interrupts; * Discard interrupts; The field `GITS_CBASER.Size` encodes the number of 4KB pages minus 0 consisting of the command queue. This field is 8 bits which means the maximum size is 2^8 * 4KB = 1MB. Given that each command is 32 bytes, there is a maximum of 32768 commands in the queue. The ITS provides no specific completion notification mechanism. Completion is monitored by a combination of a `SYNC` command and either polling `GITS_CREADR` or notification via an
[Xen-devel] [PATCH V8] xen/vm_event: Clean up control-register-write vm_events and add XCR0 event
As suggested by Andrew Cooper, this patch attempts to remove some redundancy and allow for an easier time when adding vm_events for new control registers in the future, by having a single VM_EVENT_REASON_WRITE_CTRLREG vm_event type, meant to serve CR0, CR3, CR4 and (newly introduced) XCR0. The actual control register will be deduced by the new .index field in vm_event_write_ctrlreg (renamed from vm_event_mov_to_cr). The patch has also modified the xen-access.c test - it is now able to log CR3 events. Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com Acked-by: Jan Beulich jbeul...@suse.com --- Changes since V7: - Explicitly #include asm/hvm/event.h. --- tools/libxc/include/xenctrl.h |9 ++--- tools/libxc/xc_monitor.c| 40 +++--- tools/tests/xen-access/xen-access.c | 30 +++-- xen/arch/x86/hvm/event.c| 55 +- xen/arch/x86/hvm/hvm.c | 11 +++--- xen/arch/x86/hvm/vmx/vmx.c |6 ++-- xen/arch/x86/monitor.c | 63 +-- xen/include/asm-x86/domain.h| 12 ++- xen/include/asm-x86/hvm/event.h |5 ++- xen/include/asm-x86/monitor.h |2 ++ xen/include/public/domctl.h | 12 +++ xen/include/public/vm_event.h | 29 12 files changed, 114 insertions(+), 160 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 09a7450..83fd289 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2375,12 +2375,9 @@ int xc_mem_access_disable_emulate(xc_interface *xch, domid_t domain_id); void *xc_monitor_enable(xc_interface *xch, domid_t domain_id, uint32_t *port); int xc_monitor_disable(xc_interface *xch, domid_t domain_id); int xc_monitor_resume(xc_interface *xch, domid_t domain_id); -int xc_monitor_mov_to_cr0(xc_interface *xch, domid_t domain_id, bool enable, - bool sync, bool onchangeonly); -int xc_monitor_mov_to_cr3(xc_interface *xch, domid_t domain_id, bool enable, - bool sync, bool onchangeonly); -int xc_monitor_mov_to_cr4(xc_interface *xch, domid_t domain_id, bool enable, - bool sync, bool onchangeonly); +int xc_monitor_write_ctrlreg(xc_interface *xch, domid_t domain_id, + uint16_t index, bool enable, bool sync, + bool onchangeonly); int xc_monitor_mov_to_msr(xc_interface *xch, domid_t domain_id, bool enable, bool extended_capture); int xc_monitor_singlestep(xc_interface *xch, domid_t domain_id, bool enable); diff --git a/tools/libxc/xc_monitor.c b/tools/libxc/xc_monitor.c index 87ad968..63013de 100644 --- a/tools/libxc/xc_monitor.c +++ b/tools/libxc/xc_monitor.c @@ -45,8 +45,9 @@ int xc_monitor_resume(xc_interface *xch, domid_t domain_id) NULL); } -int xc_monitor_mov_to_cr0(xc_interface *xch, domid_t domain_id, bool enable, - bool sync, bool onchangeonly) +int xc_monitor_write_ctrlreg(xc_interface *xch, domid_t domain_id, + uint16_t index, bool enable, bool sync, + bool onchangeonly) { DECLARE_DOMCTL; @@ -54,39 +55,8 @@ int xc_monitor_mov_to_cr0(xc_interface *xch, domid_t domain_id, bool enable, domctl.domain = domain_id; domctl.u.monitor_op.op = enable ? XEN_DOMCTL_MONITOR_OP_ENABLE : XEN_DOMCTL_MONITOR_OP_DISABLE; -domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_MOV_TO_CR0; -domctl.u.monitor_op.u.mov_to_cr.sync = sync; -domctl.u.monitor_op.u.mov_to_cr.onchangeonly = onchangeonly; - -return do_domctl(xch, domctl); -} - -int xc_monitor_mov_to_cr3(xc_interface *xch, domid_t domain_id, bool enable, - bool sync, bool onchangeonly) -{ -DECLARE_DOMCTL; - -domctl.cmd = XEN_DOMCTL_monitor_op; -domctl.domain = domain_id; -domctl.u.monitor_op.op = enable ? XEN_DOMCTL_MONITOR_OP_ENABLE -: XEN_DOMCTL_MONITOR_OP_DISABLE; -domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_MOV_TO_CR3; -domctl.u.monitor_op.u.mov_to_cr.sync = sync; -domctl.u.monitor_op.u.mov_to_cr.onchangeonly = onchangeonly; - -return do_domctl(xch, domctl); -} - -int xc_monitor_mov_to_cr4(xc_interface *xch, domid_t domain_id, bool enable, - bool sync, bool onchangeonly) -{ -DECLARE_DOMCTL; - -domctl.cmd = XEN_DOMCTL_monitor_op; -domctl.domain = domain_id; -domctl.u.monitor_op.op = enable ? XEN_DOMCTL_MONITOR_OP_ENABLE -: XEN_DOMCTL_MONITOR_OP_DISABLE; -domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_MOV_TO_CR4; +domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG; +domctl.u.monitor_op.u.mov_to_cr.index = index;
Re: [Xen-devel] [PATCH v2] efi: Reallocate memory map if ExitBootServices() fails
On 29.05.15 at 15:30, ross.lagerw...@citrix.com wrote: @@ -1053,14 +1053,23 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable) efi_arch_video_init(gop, info_size, mode_info); } -efi_bs-GetMemoryMap(efi_memmap_size, NULL, map_key, - efi_mdesc_size, mdesc_ver); -efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); -if ( !efi_memmap ) -blexit(LUnable to allocate memory for EFI memory map); - for ( retry = 0; ; retry = 1 ) { +efi_bs-GetMemoryMap(memmap_size, NULL, map_key, + efi_mdesc_size, mdesc_ver); +if ( memmap_size efi_memmap_size ) +{ +efi_memmap_size = memmap_size + EFI_PAGE_SIZE; I'd really like to make sure the value is a multiple of efi_mdesc_size (i.e. simply adding EFI_PAGE_SIZE will not do). And that's leaving aside whether the increase really needs to be that big. +efi_memmap = efi_arch_allocate_mmap_buffer(efi_memmap_size); So considering the ARM side (which this breaks) - did you try at all the alternative of allocating a slightly larger memory map ahead of the loop? +if ( !efi_memmap ) +{ +if ( retry ) +PrintErr(LUnable to allocate memory for EFI memory map); +else +blexit(LUnable to allocate memory for EFI memory map); +} Looking at this again (and remembering that PrintErrMesg(), as used a few lines down, also calls blexit()), I think we should handle this inside blexit(): If ExitBootServices() was already used enter an infinite loop right after printing the message(s). I wonder whether, to signal this, we shouldn't set efi_bs to NULL right after calling ExitBootServices(), replacing the single remaining valid use with SystemTable-BootServices-GetMemoryMap(). In no case should you allow execution to fall through here. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for Xen 4.6 1/4] xen: enabling XL to set per-VCPU parameters of a domain for RTDS scheduler
On Mon, 2015-05-25 at 19:05 -0500, Chong Li wrote: diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 28aea55..8143c44 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -841,6 +841,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) copyback = 1; break; +case XEN_DOMCTL_scheduler_vcpu_op: +ret = sched_adjust_vcpu(d, op-u.scheduler_vcpu_op); +copyback = 1; +break; + So, thinking more about this, do we really want a full new DOMCTL, or do we want to instrument XEN_DOMCTL_scheduler_op? That would mean adding two operations there, and fiddling with struct xen_domctl_scheduler_op? The problem I see is that, whatever I imagine putting this, there would be some redundancy. Quick--dirty, I put together the below... would it possibly make sense? typedef struct xen_domctl_sched_credit { uint16_t weight; uint16_t cap; } xen_domctl_sched_credit_t; typedef struct xen_domctl_sched_credit2 { uint16_t weight; } xen_domctl_sched_credit2_t; typedef struct xen_domctl_sched_rtds { uint32_t period; uint32_t budget; } xen_domctl_sched_rtds_t; typedef union xen_domctl_schedparam { xen_domctl_sched_credit_t credit; xen_domctl_sched_credit2_t credit2; xen_domctl_sched_rtds_t rtds; } xen_domctl_schedparam_t; typedef struct xen_domctl_schedparam_vcpu { union { xen_domctl_sched_credit_t credit; xen_domctl_sched_credit2_t credit2; xen_domctl_sched_rtds_t rtds; } s; uint16_t vcpuid; } xen_domctl_schedparam_vcpu_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_schedparam_vcpu_t); /* Set or get info? */ #define XEN_DOMCTL_SCHEDOP_putinfo 0 #define XEN_DOMCTL_SCHEDOP_getinfo 1 #define XEN_DOMCTL_SCHEDOP_putvcpuinfo 2 #define XEN_DOMCTL_SCHEDOP_getvcpuinfo 3 struct xen_domctl_scheduler_op { uint32_t sched_id; /* XEN_SCHEDULER_* */ uint32_t cmd; /* XEN_DOMCTL_SCHEDOP_* */ union { xen_domctl_schedparam_t d; struct { XEN_GUEST_HANDLE_64(xen_domctl_schedparam_vcpu_t) vcpus; uint16_t nr_vcpus; } v; } u; }; typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_scheduler_op_t); I'm also attaching a (build-tested only) patch to that effect (I'm killing SEDF in there, so I don't have to care about it, as it's going away anyway). Thoughts? /* - * set/get each vcpu info of each domain + * set/get per-domain params of a domain */ static int rt_dom_cntl( @@ -1115,6 +1115,74 @@ rt_dom_cntl( return rc; } +/* + * set/get per-vcpu params of a domain + */ +static int +rt_vcpu_cntl( +const struct scheduler *ops, +struct domain *d, +struct xen_domctl_scheduler_vcpu_op *op) +{ I commented this function already, while replying to Jan. +/* Adjust scheduling parameter for the vcpus of a given domain. */ +long sched_adjust_vcpu( +struct domain *d, +struct xen_domctl_scheduler_vcpu_op *op) +{ Actually, I don't think you even need this function. Especially if we take the path of doing all this as subops of DOMCTL_scheduler_op, you can just use sched_adjust, do some more multiplexing in there, and the call the proper scheduler hook. Thanks and Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) diff --git a/xen/common/Makefile b/xen/common/Makefile index 1cddebc..3fdf931 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -31,7 +31,6 @@ obj-y += rbtree.o obj-y += rcupdate.o obj-y += sched_credit.o obj-y += sched_credit2.o -obj-y += sched_sedf.o obj-y += sched_arinc653.o obj-y += sched_rt.o obj-y += schedule.o diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index 953ecb0..43b086b 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -1039,25 +1039,25 @@ csched_dom_cntl( if ( op-cmd == XEN_DOMCTL_SCHEDOP_getinfo ) { -op-u.credit.weight = sdom-weight; -op-u.credit.cap = sdom-cap; +op-u.d.credit.weight = sdom-weight; +op-u.d.credit.cap = sdom-cap; } else { ASSERT(op-cmd == XEN_DOMCTL_SCHEDOP_putinfo); -if ( op-u.credit.weight != 0 ) +if ( op-u.d.credit.weight != 0 ) { if ( !list_empty(sdom-active_sdom_elem) ) { prv-weight -= sdom-weight * sdom-active_vcpu_count; -prv-weight += op-u.credit.weight * sdom-active_vcpu_count; +prv-weight += op-u.d.credit.weight * sdom-active_vcpu_count; } -sdom-weight =
Re: [Xen-devel] [PATCH V7] xen/vm_event: Clean up control-register-write vm_events and add XCR0 event
On 05/29/2015 12:12 PM, Jan Beulich wrote: On 28.05.15 at 18:32, rcojoc...@bitdefender.com wrote: the macro will probably go out the window (or the first parameter will need to be changed to VM_EVENT_##what instead of VM_EVENT_X86_##what) as soon as ARM control register write events will come into play. It's in an x86-specific header, so why should it need to be changed for ARM? If ARM will gain a similarly named function, the use sites will still all be architecture specific, and hence both declaration and whether or not to have a wrapper macro can remain a per-arch decision. You're right. Submitted V8 with the proper header included. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] ACPI shutdown unreliable with win7?
On Fri, 2015-05-29 at 14:28 +0100, Jan Beulich wrote: On 29.05.15 at 15:14, andrew.coop...@citrix.com wrote: On 29/05/15 14:04, Jan Beulich wrote: On 29.05.15 at 14:54, ian.campb...@citrix.com wrote: On Fri, 2015-05-22 at 10:08 +0100, Ian Campbell wrote: If win7 doesn't shutdown given a power button request I'd be more inclined to remove the setting in osstest for those flights and let guest-stop go back to being never pass than to start making such changes to the VM config which I think would probably break the preceding suspend and migration tests (which aren't completely reliable, but are far more so than this shutdown one). Does anyone have any ideas here or shall I propose: Unless we have a way to make an adjustment inside the guest for the power button to gain shutdown meaning, I think there's no alternative to the change below. You can avoid advertising S3/S4 in the ACPI tables, which iirc causes the same alteration to happen. Hvmloader uses the platform/acpi_s{3,4} booleans to control whether the relevant SSDTs are exposed. Which libxl even has settings for. That would perhaps be a better first try than disabling ACPI shutdown. I've mentioned it at least twice now but nobody seems to think it is of note that even with the current settings it does the right thing about 1 time in 10? Is Win7 really expected to behave so randomly here? Also, when the test fails the guest is also not hibernating either. Plus I've queried the impact of change the ACPI s3/s4 settings on the save/restore/migrate tests in the flight more than once and nobody has responded to that either. Ian ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v3] run QEMU as non-root
Try to use xen-qemudepriv-$domname first, then xen-qemudepriv-base + domid, finally xen-qemudepriv-shared and root if everything else fails. The uids need to be manually created by the user or, more likely, by the xen package maintainer. To actually secure QEMU when running in Dom0, we need at least to deprivilege the privcmd and xenstore interfaces, this is just the first step in that direction. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- Changes in v3: - clarify doc - handle errno == ERANGE --- docs/misc/qemu-deprivilege | 36 + tools/libxl/libxl_dm.c | 60 ++ tools/libxl/libxl_internal.h |4 +++ 3 files changed, 100 insertions(+) create mode 100644 docs/misc/qemu-deprivilege diff --git a/docs/misc/qemu-deprivilege b/docs/misc/qemu-deprivilege new file mode 100644 index 000..3a61867 --- /dev/null +++ b/docs/misc/qemu-deprivilege @@ -0,0 +1,36 @@ +For security reasons, libxl tries to create QEMU as non-root. +Libxl looks for the following users in this order: + +1) a user named xen-qemudepriv-$domname +Where $domname is the name of the domain being created. +To use this, you just need to create a user with the appropriate name +for each domain. For example, if your virtual machine is named windows: + +adduser --system xen-qemudepriv-windows + + +2) a user named xen-qemudepriv-base, adding domid to its uid +This requires the reservation of 65536 uids from the uid of +xen-qemudepriv-base to uid+65535. For example, if xen-qemudepriv-base +has uid 6000, and the domid is 25, libxl will try to use uid 6025. To +use this mechanism, you might want to create a large number of users at +installation time. For example: + +adduser --system xen-qemudepriv-base +for i in '' $(seq 1 65335) +do +adduser --system xen-qemudepriv-base$i +done + + +3) a user named xen-qemudepriv-shared +As a fall back if both 1) and 2) fail, libxl will use a single user for +all QEMU instances. The user is named xen-qemudepriv-shared. This is +less secure but still better than running QEMU as root. Using this is as +simple as creating just one more user on your host: + +adduser --system xen-qemudepriv-shared + + +4) root +As a last resort, libxl will start QEMU as root. diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 0c6408d..97a921f 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -19,6 +19,8 @@ #include libxl_internal.h #include xen/hvm/e820.h +#include sys/types.h +#include pwd.h static const char *libxl_tapif_script(libxl__gc *gc) { @@ -439,6 +441,9 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, int i, connection, devid; uint64_t ram_size; const char *path, *chardev; +struct passwd pwd, *user = NULL; +char *buf = NULL; +long buf_size; dm_args = flexarray_make(gc, 16, 1); @@ -878,6 +883,61 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, default: break; } + +buf_size = sysconf(_SC_GETPW_R_SIZE_MAX); +if (buf_size 0) { +LOGE(ERROR, sysconf(_SC_GETPW_R_SIZE_MAX) returned error %ld, buf_size); +goto end_search; +} +errno = 0; + +retry: +if (errno == ERANGE) +buf_size += 512; +buf = libxl__realloc(gc, buf, buf_size); +if (c_info-name) { +errno = 0; +getpwnam_r(libxl__sprintf(gc, %s-%s, +LIBXL_QEMU_USER_PREFIX, c_info-name), +pwd, buf, buf_size, user); +if (user != NULL) +goto end_search; +if (errno == ERANGE) +goto retry; +} + +errno = 0; +getpwnam_r(LIBXL_QEMU_USER_BASE, pwd, buf, buf_size, user); +if (user != NULL) { +errno = 0; +getpwuid_r(user-pw_uid + guest_domid, pwd, buf, buf_size, user); +if (user != NULL) +goto end_search; +if (errno == ERANGE) +goto retry; +} +if (errno == ERANGE) +goto retry; + +errno = 0; +getpwnam_r(LIBXL_QEMU_USER_SHARED, pwd, buf, buf_size, user); +if (user != NULL) { +LOG(WARN, Could not find user %s-%s or user %s (+domid %d), falling back to %s, +LIBXL_QEMU_USER_PREFIX, c_info-name, LIBXL_QEMU_USER_BASE, +guest_domid, LIBXL_QEMU_USER_SHARED); +goto end_search; +} +if (errno == ERANGE) +goto retry; + + +LOG(WARN, Could not find user %s, starting QEMU as root, LIBXL_QEMU_USER_SHARED); + +end_search: +if (user) { +flexarray_append(dm_args, -runas); +flexarray_append(dm_args, user-pw_name); +} } flexarray_append(dm_args, NULL); return (char **) flexarray_contents(dm_args); diff --git
Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling
Hi Vijay, On 27/05/15 17:44, Vijay Kilari wrote: ## Command Translation Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are potentially time consuming commands as these commands creates entry in the Xen ITS structures, which are used to validate other ITS commands. `INVALL` and `SYNC` are global and potentially disruptive to other guests and so need consideration. All other ITS command like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR` just validate and generate physical command. ### `MAPC` command translation Format: `MAPC vCID, vTA` - The GITS_TYPER.PAtype is emulated as 0. Hence vTA is always represents vcpu number. Hence vTA is validated against physical Collection IDs by querying ITS driver and corresponding Physical Collection ID is retrieved. - Each vITS will have cid_map (struct cid_mapping) which holds mapping of Why do you speak about each vITS? The emulation is only related to one vITS and not shared... Virtual Collection ID(vCID), Virtual Target address(vTA) and Physical Collection ID (pCID). If vCID entry already exists in cid_map, then that particular mapping is updated with the new pCID and vTA else new entry is made in cid_map When you move a collection, you also have to make sure that all the interrupts associated to it will be delivered to the new target. I'm not sure what you are suggesting for that... - MAPC pCID, pTA physical ITS command is generated We should not send any MAPC command to the physical ITS. The collection is already mapped during Xen boot and the guest should not be able to move the physical collection (they are shared between all the guests and Xen). Here there is no overhead, the cid_map entries are preallocated with size of nr_cpus in the platform. As said the number of collection should be at least nr_cpus + 1. - `MAPC pCID, pTA` physical ITS command is generated ### `MAPD` Command translation Format: `MAPD device, Valid, ITT IPA, ITT Size` `MAPD` is sent with `Valid` bit set if device needs to be added and reset when device is removed. If `Valid` bit is set: - Allocate memory for `its_device` struct - Validate ITT IPA ITT size and update its_device struct - Find number of vectors(nrvecs) for this device by querying PCI helper function - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`? - Allocate memory for `struct vlpi_map` for this device. This `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID. - Find physical ITS node with which this device is associated - Call `p2m_lookup` on ITT IPA addr and get physical ITT address - Validate ITT Size - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size` Here the overhead is with memory allocation for `its_device` and `vlpi_map` XXX Suggestion was to preallocate some of those at device passthrough setup time? If Validation bit is set: - Query its_device tree and get its_device structure for this device. - (XXX: If pci device is hidden from dom0, does this device is added with PHYSDEVOP_pci_device_add hypercall?) - If device does not exists return - If device exists in RB-tree then - Validate ITT IPA ITT size and update its_device struct To validate the ITT size you need to know the number of interrupt ID. - Check if device is already assigned to the domain, if not then - Find number of vectors(nrvecs) for this device. - Allocate nrvecs number of LPI - Fetch vlpi_map for this device (preallocated at the time of adding this device to Xen). This vlpi_map holds mapping of Virtual LPI to Physical LPI and ID. - Call p2m_lookup on ITT IPA addr and get physical ITT address - Assign this device to this domain and mark as enabled - If this device already exists with the domain (Domain is remapping the device) - Validate ITT IPA ITT size and update its_device struct - Call p2m_lookup on ITT IPA addr and get physical ITT address - Disable all the LPIs of this device by searching through vlpi_map and LPI configuration table Disabling all the LPIs associated to a device can be time consuming because you have to unroute them and make sure that the physical ITS effectively disabled it before sending the MAPD command. Given that the software would be buggy if it send a MAPD command without releasing all the associated interrupt we could ignore the command if any interrupt is still enabled. - Generate/format physical ITS command: MAPD, ITT PA, ITT Size If Validation bit is not set: - Validate if the device exits by checking vITS device list - Clear all `vlpis` assigned for this device - Remove this device from vITS list - Free memory XXX If preallocation presumably
Re: [Xen-devel] [PATCH V4 7/8] xen/arm64: increase MAX_VIRT_CPUS to 128 on arm64
Hi Chen, On 28/05/15 11:15, Chen Baozi wrote: From: Chen Baozi baoz...@gmail.com GIC-500 supports up to 128 cores in a single SoC. Increase MAX_VIRT_CPUS to 128 on arm64. This should be the last patch. You still have to modify domain_max_vcpus before being able to increase the number of vCPUS handled. Signed-off-by: Chen Baozi baoz...@gmail.com with the patch at the end of the series: Acked-by: Julien Grall julien.gr...@citrix.com --- xen/arch/arm/vgic-v3.c | 1 - xen/include/asm-arm/config.h | 4 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c index 0da031c..be5fff1 100644 --- a/xen/arch/arm/vgic-v3.c +++ b/xen/arch/arm/vgic-v3.c @@ -889,7 +889,6 @@ static int vgic_v3_distr_mmio_write(struct vcpu *v, mmio_info_t *info) rank = vgic_rank_offset(v, 64, gicd_reg - GICD_IROUTER, DABT_DOUBLE_WORD); if ( rank == NULL ) goto write_ignore; -BUG_ON(v-domain-max_vcpus 8); new_irouter = *r; vgic_lock_rank(v, rank, flags); diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h index 3b23e05..817c216 100644 --- a/xen/include/asm-arm/config.h +++ b/xen/include/asm-arm/config.h @@ -47,7 +47,11 @@ #define NR_CPUS 128 #endif +#ifdef CONFIG_ARM_64 +#define MAX_VIRT_CPUS 128 +#else #define MAX_VIRT_CPUS 8 +#endif #define asmlinkage /* Nothing needed */ Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] ACPI shutdown unreliable with win7?
On Fri, 2015-05-29 at 15:31 +0100, Andrew Cooper wrote: On 29/05/15 15:00, Ian Campbell wrote: On Fri, 2015-05-29 at 14:28 +0100, Jan Beulich wrote: On 29.05.15 at 15:14, andrew.coop...@citrix.com wrote: On 29/05/15 14:04, Jan Beulich wrote: On 29.05.15 at 14:54, ian.campb...@citrix.com wrote: On Fri, 2015-05-22 at 10:08 +0100, Ian Campbell wrote: If win7 doesn't shutdown given a power button request I'd be more inclined to remove the setting in osstest for those flights and let guest-stop go back to being never pass than to start making such changes to the VM config which I think would probably break the preceding suspend and migration tests (which aren't completely reliable, but are far more so than this shutdown one). Does anyone have any ideas here or shall I propose: Unless we have a way to make an adjustment inside the guest for the power button to gain shutdown meaning, I think there's no alternative to the change below. You can avoid advertising S3/S4 in the ACPI tables, which iirc causes the same alteration to happen. Hvmloader uses the platform/acpi_s{3,4} booleans to control whether the relevant SSDTs are exposed. Which libxl even has settings for. That would perhaps be a better first try than disabling ACPI shutdown. I've mentioned it at least twice now but nobody seems to think it is of note that even with the current settings it does the right thing about 1 time in 10? Is Win7 really expected to behave so randomly here? Windows is far from bug free, and also a black box. I really cannot answer whether this is intended behaviour or not. Also, when the test fails the guest is also not hibernating either. Plus I've queried the impact of change the ACPI s3/s4 settings on the save/restore/migrate tests in the flight more than once and nobody has responded to that either. save/restore/migrate necessarily need PV drivers inside the guest, and installing PV drivers should set the sane defaults inside the guest. What does OSSTest do with PV drivers? Nothing AFAIK, unless they are baked into the ISOs we take from the XenRT folks (which I think is not the case). Yet our s/r/m tests of these guests do pass (mostly), I believe because they end up leveraging HVM s3 but I might be wrong. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 05/10] xsm: add XEN_DOMCTL_soft_reset support
On 27.05.15 at 17:25, vkuzn...@redhat.com wrote: +static XSM_INLINE int xsm_soft_reset(XSM_DEFAULT_ARG struct domain *d1, + struct domain *d2) Hard tabs used here ... +static inline int xsm_soft_reset (xsm_default_t def, struct domain *d1, + struct domain *d2) ... and here. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for Xen 4.6 0/4] Enabling XL to set per-VCPU parameters of a domain for RTDS scheduler
2015-05-29 4:15 GMT-07:00 Dario Faggioli dario.faggi...@citrix.com: On Tue, 2015-05-26 at 09:59 +0100, Ian Campbell wrote: On Mon, 2015-05-25 at 18:59 -0500, Chong Li wrote: This series arrived in my mailbox as 5 distinct mails. Please use git send-email such that the mails arrive as a single email thread (i.e. each mail as a reply to the previous or to the 0th mail) or arrange for the same thing by hand (I highly recommend using git send-email though) Indeed. BTW, Chong, v1 was threaded ok, so maybe did something different this time when sending the patches. If yes, just don't! :-) I think that's because Chong wanted to send different patches in this patch set to different maintainers. (For example, we don't want to spam Jan's folder with xl, libxl patch. :-) ) Is it ok to use git send-email --reply-to to attach all four patches to the cover letter (that is this email thread) of the patch set? Thanks, Meng --- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania http://www.cis.upenn.edu/~mengxu/ ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3] run QEMU as non-root
On Fri, 2015-05-29 at 14:47 +0100, Stefano Stabellini wrote: Try to use xen-qemudepriv-$domname first, then xen-qemudepriv-base + domid, finally xen-qemudepriv-shared and root if everything else fails. The uids need to be manually created by the user or, more likely, by the xen package maintainer. To actually secure QEMU when running in Dom0, we need at least to deprivilege the privcmd and xenstore interfaces, this is just the first step in that direction. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- Changes in v3: - clarify doc - handle errno == ERANGE --- docs/misc/qemu-deprivilege | 36 + tools/libxl/libxl_dm.c | 60 ++ tools/libxl/libxl_internal.h |4 +++ 3 files changed, 100 insertions(+) create mode 100644 docs/misc/qemu-deprivilege diff --git a/docs/misc/qemu-deprivilege b/docs/misc/qemu-deprivilege new file mode 100644 index 000..3a61867 --- /dev/null +++ b/docs/misc/qemu-deprivilege Could you name this with a .txt or even a .markdown or .pandoc please. I think you should also add a reference to this to the toplevel INSTALL file, so there is some hope of someone seeing it. And I presume you will update the relevant wikipages (e.g. the installing from source one) once this change lands? +2) a user named xen-qemudepriv-base, adding domid to its uid +This requires the reservation of 65536 uids from the uid of +xen-qemudepriv-base to uid+65535. For example, if xen-qemudepriv-base +has uid 6000, and the domid is 25, libxl will try to use uid 6025. To +use this mechanism, you might want to create a large number of users at +installation time. For example: + +adduser --system xen-qemudepriv-base +for i in '' $(seq 1 65335) +do +adduser --system xen-qemudepriv-base$i +done I'm not sure that adduser is necessarily guaranteed to create users sequentially, even if it might do so most of the time. Did you check this? Perhaps rather than doing base + domid we should just lookup the user xen-qemudepriv-domidNNN and document creating a user for every domid? The advantage with that is we don't need to figure out how to document the creation of 65K _consecutive_ users. @@ -439,6 +441,9 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, int i, connection, devid; uint64_t ram_size; const char *path, *chardev; +struct passwd pwd, *user = NULL; +char *buf = NULL; +long buf_size; dm_args = flexarray_make(gc, 16, 1); @@ -878,6 +883,61 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, default: break; } + +buf_size = sysconf(_SC_GETPW_R_SIZE_MAX); +if (buf_size 0) { +LOGE(ERROR, sysconf(_SC_GETPW_R_SIZE_MAX) returned error %ld, buf_size); +goto end_search; +} +errno = 0; + +retry: +if (errno == ERANGE) +buf_size += 512; +buf = libxl__realloc(gc, buf, buf_size); +if (c_info-name) { +errno = 0; +getpwnam_r(libxl__sprintf(gc, %s-%s, +LIBXL_QEMU_USER_PREFIX, c_info-name), +pwd, buf, buf_size, user); +if (user != NULL) +goto end_search; +if (errno == ERANGE) +goto retry; +} + +errno = 0; +getpwnam_r(LIBXL_QEMU_USER_BASE, pwd, buf, buf_size, user); +if (user != NULL) { +errno = 0; +getpwuid_r(user-pw_uid + guest_domid, pwd, buf, buf_size, user); +if (user != NULL) +goto end_search; +if (errno == ERANGE) +goto retry; +} +if (errno == ERANGE) +goto retry; + +errno = 0; +getpwnam_r(LIBXL_QEMU_USER_SHARED, pwd, buf, buf_size, user); +if (user != NULL) { +LOG(WARN, Could not find user %s-%s or user %s (+domid %d), falling back to %s, +LIBXL_QEMU_USER_PREFIX, c_info-name, LIBXL_QEMU_USER_BASE, +guest_domid, LIBXL_QEMU_USER_SHARED); +goto end_search; +} +if (errno == ERANGE) +goto retry; This is all rather repetitive, please add a helper which takes care of allocating the buffer, retrying and logging on fail. I don't think you need to worry about making all three attempts use the same buffer, so the helper doesn't need to have the complexity of doing that. That would also let you avoid repeating all three searches when the last one returns ERANGE. + + +LOG(WARN, Could not find user %s, starting QEMU as root, LIBXL_QEMU_USER_SHARED); + +end_search: +if (user) { +flexarray_append(dm_args, -runas); +flexarray_append(dm_args, user-pw_name); +} }
Re: [Xen-devel] ACPI shutdown unreliable with win7?
-Original Message- From: Paul Durrant Sent: 29 May 2015 16:07 To: Ian Campbell Cc: Jan Beulich; Andrew Cooper; Ian Jackson; xen-devel Subject: RE: [Xen-devel] ACPI shutdown unreliable with win7? -Original Message- From: Ian Campbell [mailto:ian.campb...@citrix.com] Sent: 29 May 2015 15:36 To: Paul Durrant Cc: Jan Beulich; Andrew Cooper; Ian Jackson; xen-devel Subject: Re: [Xen-devel] ACPI shutdown unreliable with win7? On Fri, 2015-05-29 at 15:25 +0100, Paul Durrant wrote: -Original Message- From: Ian Campbell [mailto:ian.campb...@citrix.com] Sent: 29 May 2015 14:14 To: Jan Beulich Cc: Andrew Cooper; Paul Durrant; Ian Jackson; xen-devel Subject: Re: [Xen-devel] ACPI shutdown unreliable with win7? On Fri, 2015-05-29 at 14:04 +0100, Jan Beulich wrote: On 29.05.15 at 14:54, ian.campb...@citrix.com wrote: On Fri, 2015-05-22 at 10:08 +0100, Ian Campbell wrote: If win7 doesn't shutdown given a power button request I'd be more inclined to remove the setting in osstest for those flights and let guest-stop go back to being never pass than to start making such changes to the VM config which I think would probably break the preceding suspend and migration tests (which aren't completely reliable, but are far more so than this shutdown one). Does anyone have any ideas here or shall I propose: Unless we have a way to make an adjustment inside the guest for the power button to gain shutdown meaning, I think there's no alternative to the change below. The strange this is that it does work _sometimes_, either by complete coincidence or because there is something non-deterministic about how Win7 reacts to this ACPI event. How long is the test waiting for the OS to shut down though? If you get unlucky, Windows will wander off to Windows Update, download a bazillion patches and take more than an hour to shut down. If you’re lucky, it may shut down in 10 seconds or less. The screenshot in e.g. http://logs.test-lab.xenproject.org/osstest/logs/56929/test-amd64- amd64- xl-qemut-win7-amd64/info.html seems to show that the guest isn't even trying to shut down, it's just sat there at the desktop: http://logs.test-lab.xenproject.org/osstest/logs/56929/test-amd64- amd64- xl-qemut-win7-amd64/win.guest.osstest--vnc.jpeg I think if it had hit WU there would be activity on the screen? FWIW We appear to wait 200s, if we were seeing failures due to windows update then I'd be inclined to extend that, but I think right now that would be premature, unless WU happens with no status on the screen. No, you'd see something. Perhaps our ACPI lid/power switch code is just buggy then? I already said this to Ian, but for the record... From my reading of the ACPI spec (v 5.0, page 23 glossary entry) the SCI is an active low, shareable, level-sensitive interrupt, but our code (pmtimer.c:pmt_update_sci) seems to treat it as active high. I'll have a look at the QEMU SCI code as reference, but maybe it is just broken. Paul Paul Ian ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] ACPI shutdown unreliable with win7?
-Original Message- From: Ross Philipson [mailto:ross.philip...@gmail.com] Sent: 29 May 2015 16:35 To: Ian Campbell; Paul Durrant Cc: Andrew Cooper; xen-devel; Jan Beulich; Ian Jackson Subject: Re: [Xen-devel] ACPI shutdown unreliable with win7? On 05/29/2015 11:11 AM, Ian Campbell wrote: On Fri, 2015-05-29 at 16:06 +0100, Paul Durrant wrote: FWIW We appear to wait 200s, if we were seeing failures due to windows update then I'd be inclined to extend that, but I think right now that would be premature, unless WU happens with no status on the screen. No, you'd see something. Perhaps our ACPI lid/power switch code is just buggy then? It seems to work reliably for the WinXP tests, FWIW... Ian. One thing I find confusing is that the firmware code does not even have a power button device (PNP0C0C) or the fixed feature power button that is enabled in the FADT (flag == PWR_BUTTON bit 4). So I don't see how the shutdown is purely an ACPI function. Is there something else to the story? Is it relying on PV tools to do it? Xen (and QEMU seemingly) implement the 'Fixed Power Button' (section 4.8.2.2.1.1 on my spec) and this requires the PWR_BUTTON flag to be clear (according table 4-13). It also does not require a power button device to be implemented (which is presumably why this way of doing it was chosen). Paul Ross ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel -- Ross Philipson ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] ACPI shutdown unreliable with win7?
On 05/29/2015 12:00 PM, Paul Durrant wrote: -Original Message- From: Ross Philipson [mailto:ross.philip...@gmail.com] Sent: 29 May 2015 16:35 To: Ian Campbell; Paul Durrant Cc: Andrew Cooper; xen-devel; Jan Beulich; Ian Jackson Subject: Re: [Xen-devel] ACPI shutdown unreliable with win7? On 05/29/2015 11:11 AM, Ian Campbell wrote: On Fri, 2015-05-29 at 16:06 +0100, Paul Durrant wrote: FWIW We appear to wait 200s, if we were seeing failures due to windows update then I'd be inclined to extend that, but I think right now that would be premature, unless WU happens with no status on the screen. No, you'd see something. Perhaps our ACPI lid/power switch code is just buggy then? It seems to work reliably for the WinXP tests, FWIW... Ian. One thing I find confusing is that the firmware code does not even have a power button device (PNP0C0C) or the fixed feature power button that is enabled in the FADT (flag == PWR_BUTTON bit 4). So I don't see how the shutdown is purely an ACPI function. Is there something else to the story? Is it relying on PV tools to do it? Xen (and QEMU seemingly) implement the 'Fixed Power Button' (section 4.8.2.2.1.1 on my spec) and this requires the PWR_BUTTON flag to be clear (according table 4-13). It also does not require a power button device to be implemented (which is presumably why this way of doing it was chosen). Ah got it. I read the spec backwards - that the bit was set not clear for the fixed feature power button :) Paul Ross ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel -- Ross Philipson -- Ross Philipson ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V4 7/8] xen/arm64: increase MAX_VIRT_CPUS to 128 on arm64
On 29/05/15 17:45, Julien Grall wrote: On 29/05/15 17:41, Andrew Cooper wrote: On 29/05/15 17:20, Julien Grall wrote: On 29/05/15 16:51, Julien Grall wrote: diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h index 3b23e05..817c216 100644 --- a/xen/include/asm-arm/config.h +++ b/xen/include/asm-arm/config.h @@ -47,7 +47,11 @@ #define NR_CPUS 128 #endif +#ifdef CONFIG_ARM_64 +#define MAX_VIRT_CPUS 128 +#else #define MAX_VIRT_CPUS 8 +#endif Looking to the last patch, the usage of MAX_VIRT_CPUS is now minimal. Can't finish to replace MAX_VIRT_CPUS to another corresponding value and drop the define? You cant drop MAX_VIRT_CPUS (I tried this when introducing max_domain_vcpus()). It is used for some conditional compilation in common code. AFAICT only in the event channel code to avoid allocating memory when less than 64 vCPU is used. Correct. Anyway, if we can drop it. I would add a check in domain_max_vcpus for safety. It has a second use currently for auditing the dom0_max_vcpus parameter (for both x86 and ARM), but could easily be replaced with a max_domain_vcpus() call. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V4 5/8] tools/libxl: Set 'reg' of cpu node equal to MPIDR affinity for domU
On 29/05/15 16:55, Ian Campbell wrote: On Fri, 2015-05-29 at 16:44 +0100, Julien Grall wrote: +name = GCSPRINTF(cpu@%lx, mpidr_aff); It's not necessary to change the cpu@. AIUI it is conventional in DT for this to match the first reg entry. Well, it's conventional when the reg field is containing an address. It's not the case of the cpu node as reg doesn't contain an address. There is a mix of both in the different DTS. Although, increment an ID make easier to read the dumped DTS for debugging. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 56759: regressions - FAIL
On Wed, 2015-05-27 at 17:04 +0100, Ian Campbell wrote: Looking at the netback side though it seems like netback_remove is switching to state=Closed _before_ it calls kobject_uevent(..., KOBJ_OFFLINE) and it is this which generates the call to netback_uevent which tries and fails to read script and produces the error message. I've just sent out a patch which fixes this issue, although I am still at a loss to explain why we have only started seeing this now and only under such specific circumstances. I'm still slightly concerned that perhaps the new spinlock stuff has some sort of bad behaviour either on arndale specifically or more generally for ARM systems which has pushed this particular case over the edge. I did run some benchmarks (hackbench+fio on arndale domU and hackbench on midway dom0) with and without the ticket locks and the results were close enough that I'm basically not too worried that there is something wrong with the ticket locks on ARM. It still niggles somewhat not to have a good theory about why this change had this seemingly random effect, but I've not got any good ideas for avenues to explore and I've got other things to do so I think I'll leave it at that. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] ACPI shutdown unreliable with win7?
On 05/29/2015 11:11 AM, Ian Campbell wrote: On Fri, 2015-05-29 at 16:06 +0100, Paul Durrant wrote: FWIW We appear to wait 200s, if we were seeing failures due to windows update then I'd be inclined to extend that, but I think right now that would be premature, unless WU happens with no status on the screen. No, you'd see something. Perhaps our ACPI lid/power switch code is just buggy then? It seems to work reliably for the WinXP tests, FWIW... Ian. One thing I find confusing is that the firmware code does not even have a power button device (PNP0C0C) or the fixed feature power button that is enabled in the FADT (flag == PWR_BUTTON bit 4). So I don't see how the shutdown is purely an ACPI function. Is there something else to the story? Is it relying on PV tools to do it? Ross ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel -- Ross Philipson ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] xen: netback: fix error printf format string.
drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’: drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=] (txreq.offset~PAGE_MASK) + txreq.size); ^ txreq.offset and .size are uint16_t fields. Signed-off-by: Ian Campbell ian.campb...@citrix.com --- drivers/net/xen-netback/netback.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 4de46aa..a3b1cbb 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1248,7 +1248,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, /* No crossing a page as the payload mustn't fragment. */ if (unlikely((txreq.offset + txreq.size) PAGE_SIZE)) { netdev_err(queue-vif-dev, - txreq.offset: %x, size: %u, end: %lu\n, + txreq.offset: %x, size: %u, end: %u\n, txreq.offset, txreq.size, (txreq.offset~PAGE_MASK) + txreq.size); xenvif_fatal_tx_err(queue-vif); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel