Re: [Xen-devel] [OSSTEST Nested PATCH v8 6/7] Compose the main recipe of nested test job
On Thu, 2015-04-23 at 12:05 +0100, Ian Jackson wrote: Pang, LongtaoX writes (RE: [Xen-devel] [OSSTEST Nested PATCH v8 6/7] Compose the main recipe of nested test job): Ian Campbell [mailto:ian.campb...@citrix.com]: The arguments passed to ts-* in sg-run-job should _always_ be just a bare ident. The ident=foo thing is a special case for standalone mode.

Do you mean that the parameter passed to ts-* in sg-run-job should _always_ be just a bare ident?

Yes, that's what Ian means.

If yes, I think it needs to pass the L1 guest's '$gn', which is not a bare ident, to 'ts-debian-hvm-install' for installing the L1 guest VM, since '$gn' defaults to 'debianhvm' ('$gn ||= "debianhvm"') if no other '$gn' is passed to the script, but for the nested job we expect a '$gn' different from 'debianhvm', such as 'nestedl1'. So it needs to pass 'host' and 'nestedl1' to the 'ts-debian-hvm-install' script:

+run-ts . = ts-debian-hvm-install + host + nestedl1

Indeed.

The 'ts-nested-setup' script needs to call 'ts_get_host_guest', which takes the two parameters 'host' and 'nestedl1' to get the $l0 and $l1 scopes, so 'nestedl1' also needs to be passed to that script.

Yes.

'ts-guest-destroy' also needs the two parameters 'host' and 'nestedl1' (the L1 guest's '$gn').

If you're intending to destroy the L1, then yes. If you want to destroy the L2 then you need to pass `nestedl1' and whatever the L2 guest name/ident is.

Here comes a common issue we need to settle: to avoid breaking the principle that we only pass idents in a recipe, we shall either 1. make the ident/name/hostname identical for L1 and L2, and make this a principle as well — this avoids much complex design at the cost of one additional rule to bear in mind; or 2. set up the $ident -> $name mapping somewhere at the beginning (either host allocation or flight construction, you name it, but somewhere).

I prefer Ian C's suggestion, option 1: why should the ident and the name differ for L1 and L2?

Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v2 06/11] vmx: add help functions to support PML
On 04/17/2015 03:37 PM, Jan Beulich wrote: On 17.04.15 at 09:23, kai.hu...@linux.intel.com wrote: On 04/17/2015 02:58 PM, Jan Beulich wrote: On 17.04.15 at 08:51, kai.hu...@linux.intel.com wrote: On 04/17/2015 02:23 PM, Jan Beulich wrote: On 17.04.15 at 05:10, kai.hu...@linux.intel.com wrote: On 04/16/2015 11:42 PM, Jan Beulich wrote: On 15.04.15 at 09:03, kai.hu...@linux.intel.com wrote:

+void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
+{
+    uint64_t *pml_buf;
+    unsigned long pml_idx;
+
+    ASSERT(vmx_vcpu_pml_enabled(v));
+
+    vmx_vmcs_enter(v);
+
+    __vmread(GUEST_PML_INDEX, &pml_idx);

Don't you require the vCPU to be non-running or current when you get here? If so, perhaps add a respective ASSERT()?

Yes, an ASSERT would be better. v->pause_count will be increased if the vcpu is kicked out by domain_pause explicitly, but it looks like the same thing won't be done if the vcpu is kicked out by a PML-buffer-full VMEXIT. So should the ASSERT be done like below?

ASSERT(atomic_read(&v->pause_count) || (v == current));

For one I'd reverse the two parts. And then I think pause count being non-zero is not a sufficient condition - if a non-synchronous pause was issued against the vCPU it may still be running. I'd suggest !vcpu_runnable(v) && !v->is_running, possibly with the pause count check instead of the runnable one if the only permitted case where v != current requires the vCPU to be paused.

vmx_vcpu_flush_pml_buffer is only supposed to be called in the below cases:

- When a PML-buffer-full VMEXIT happens
- In paging_log_dirty_op and hap_track_dirty_vram, before reporting dirty pages to userspace.
- In vmx_vcpu_disable_pml, called from vmx_vcpu_destroy, or when log-dirty mode is disabled.

In the latter two cases, domain_pause is guaranteed to be called before vmx_vcpu_flush_pml_buffer is called, therefore it looks like there's no possibility of a non-synchronous pause of the vcpu. Or are you suggesting we should suppose this function can be called from any caller, and meanwhile be able to act reasonably?

No. All I'm saying is that in order to protect against eventual undue future callers, it should assert that its preconditions are met. I.e. if the vCPU is expected to be paused, check that the pause count is non-zero _and_ that the pause actually took effect.

I see. I will do as you suggested:

ASSERT((v == current) || (!vcpu_runnable(v) && !v->is_running));

And v != current should be the only case that requires the vcpu to be paused.

But if you require (or at least expect) the vCPU to be paused, this isn't what I suggested. Afaict

ASSERT((v == current) || (!v->is_running && atomic_read(&v->pause_count)));

would then be the right check (and, while not being a full guarantee that things wouldn't change behind your back, it would at least increase the chances that the vCPU's runnable state won't change, as the vCPU could have been non-runnable for reasons other than having been paused).

Hi Jan, I tried the ASSERT with atomic_read(&v->pause_count), and it turns out the ASSERT would fail and panic Xen. The reason is that domain_pause only increases d->pause_count; it doesn't internally increase v->pause_count for all vcpus. vmx_vcpu_flush_pml_buffer is only supposed to be called from the PML-buffer-full VMEXIT, and from vmx_domain_flush_pml_buffer, before which domain_pause should be called. Sorry that obviously I had some misunderstanding regarding "require (or at least expect) the vCPU to be paused", and it looks like !vcpu_runnable(v) is the right choice. What's your opinion?

Thanks, -Kai

Jan
Re: [Xen-devel] [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
On 23.04.15 at 18:29, david.vra...@citrix.com wrote: On 23/04/15 17:11, Jan Beulich wrote: On 22.04.15 at 18:00, david.vra...@citrix.com wrote:

-    return handle;
+    v->maptrack_limit = new_mt_limit;
+
+    return __get_maptrack_handle(lgt, v);

With the lock dropped, nothing guarantees this to succeed, which it ought to unless the table size reached its allowed maximum.

We've just added a new page of handles for this VCPU. This will succeed.

Ah, right, this is per-vCPU now.

Jan
Re: [Xen-devel] [PATCH v3 3/5] libxl: use LIBXL_TOOLSTACK_DOMID
On Fri, Mar 20, Wei Liu wrote:

+++ b/tools/libxl/libxl_dm.c
@@ -1438,6 +1438,8 @@ retry_transaction:
     spawn->what = GCSPRINTF("domain %d device model", domid);
     spawn->xspath = GCSPRINTF("/local/domain/0/device-model/%d/state", domid);
+    spawn->xspath = libxl__device_model_xs_path(gc, LIBXL_TOOLSTACK_DOMID,
+                                                domid, "/state");

This assigns xspath twice.

Olaf
[Xen-devel] [PATCH] libxlu: don't crash on empty lists
Prior to 1a09c5113a ("libxlu: rework internal representation of setting") empty lists in config files did get accepted. Restore that behavior.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/tools/libxl/libxlu_cfg.c
+++ b/tools/libxl/libxlu_cfg.c
@@ -397,7 +397,7 @@ XLU_ConfigValue *xlu__cfg_list_mk(CfgPar
     value = malloc(sizeof(*value));
     if (!value) goto xe;
     value->type = XLU_LIST;
-    value->u.list.nvalues = 1;
+    value->u.list.nvalues = !!val;
     value->u.list.avalues = 1;
     value->u.list.values = values;
     memcpy(&value->loc, loc, sizeof(*loc));
Re: [Xen-devel] [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
On 24/04/15 10:50, Jan Beulich wrote: On 24.04.15 at 11:09, malcolm.cross...@citrix.com wrote: On 23/04/15 17:11, Jan Beulich wrote: On 22.04.15 at 18:00, david.vra...@citrix.com wrote:

--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -60,6 +60,8 @@ struct grant_mapping {
     u32 ref;           /* grant ref */
     u16 flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
     domid_t domid;     /* granting domain */
+    u32 vcpu;          /* vcpu which created the grant mapping */
+    u16 pad[2];
 };

What is this pad[] good for?

The pad is to keep the struct power-of-2 sized, because this allows the compiler to optimise these macros to right and left shifts:

#define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping))
#define maptrack_entry(t, e) \
    ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])

Okay, then why u16[2] instead of u32? And please add a brief comment explaining the reason.

Most likely because vcpu was a u16 in the first draft, and got bumped during internal review. Apart from that I wonder whether fitting vcpu in the 10 unused flags bits (not 11, as the comment on the field suggests) would be an option. That would require limiting vCPU count to 4k, which I don't think would really be a problem for anyone.

8k vCPU PV guests do function, and are very good at finding scalability limits. It would be nice not to break this.

~Andrew
[Xen-devel] io.c:164:d25v0 Weird HVM ioemulation status 1. domain_crash called from io.c:165
Hi,

On Xen-unstable I encountered this non-stopping log spam while trying to shut down an HVM domain with pci-passthrough:

(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
etc. etc. etc.

The domain had shut down perfectly well on earlier occasions, but this time it kept running. It did drop from its assigned 3 vcpus to 1 vcpu, but it stalled there. The log spam only stopped by destroying the guest. Since I haven't seen it before, it's probably hard to replicate.
-- Sander
[Xen-devel] [v3 04/10] vmx: add new data structure member to support PML
A new 4K page pointer is added to arch_vmx_struct as the PML buffer for the vcpu. And a new 'status' field is added to vmx_domain to indicate whether PML is enabled for the domain or not.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 xen/include/asm-x86/hvm/vmx/vmcs.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index f831a78..441e974 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -70,8 +70,12 @@ struct ept_data {
     cpumask_var_t synced_mask;
 };

+#define _VMX_DOMAIN_PML_ENABLED    0
+#define VMX_DOMAIN_PML_ENABLED     (1ul << _VMX_DOMAIN_PML_ENABLED)
 struct vmx_domain {
     unsigned long apic_access_mfn;
+    /* VMX_DOMAIN_* */
+    unsigned int status;
 };

 struct pi_desc {
@@ -85,6 +89,8 @@ struct pi_desc {
 #define ept_get_eptp(ept) ((ept)->eptp)
 #define ept_get_synced_mask(ept) ((ept)->synced_mask)

+#define NR_PML_ENTRIES   512
+
 struct arch_vmx_struct {
     /* Virtual address of VMCS. */
     struct vmcs_struct *vmcs;
@@ -142,6 +148,8 @@ struct arch_vmx_struct {
     /* Bitmap to control vmexit policy for Non-root VMREAD/VMWRITE */
     struct page_info *vmread_bitmap;
     struct page_info *vmwrite_bitmap;
+
+    struct page_info *pml_pg;
 };

 int vmx_create_vmcs(struct vcpu *v);
--
2.1.0
[Xen-devel] [v3 02/10] log-dirty: add new paging_mark_gfn_dirty
PML logs GPAs in the PML buffer. The original paging_mark_dirty takes an MFN as parameter, but internally it gets the guest pfn and uses the guest pfn as the index for looking up the radix log-dirty tree. When flushing the PML buffer, calling paging_mark_dirty directly introduces redundant p2m lookups (gfn->mfn->gfn); therefore we introduce paging_mark_gfn_dirty, which is the bulk of paging_mark_dirty but takes a guest pfn as parameter, and when flushing the PML buffer we call paging_mark_gfn_dirty directly. The original paging_mark_dirty then simply becomes a wrapper of paging_mark_gfn_dirty.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 xen/arch/x86/mm/paging.c | 31 +--
 xen/include/asm-x86/paging.h | 2 ++
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index b54d76a..77c929b 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -266,24 +266,17 @@ static int paging_log_dirty_disable(struct domain *d, bool_t resuming)
     return ret;
 }

-/* Mark a page as dirty */
-void paging_mark_dirty(struct domain *d, unsigned long guest_mfn)
+/* Mark a page as dirty, with taking guest pfn as parameter */
+void paging_mark_gfn_dirty(struct domain *d, unsigned long pfn)
 {
-    unsigned long pfn;
-    mfn_t gmfn;
     int changed;
     mfn_t mfn, *l4, *l3, *l2;
     unsigned long *l1;
     int i1, i2, i3, i4;

-    gmfn = _mfn(guest_mfn);
-
-    if ( !paging_mode_log_dirty(d) || !mfn_valid(gmfn) ||
-         page_get_owner(mfn_to_page(gmfn)) != d )
+    if ( !paging_mode_log_dirty(d) )
         return;

-    /* We /really/ mean PFN here, even for non-translated guests. */
-    pfn = get_gpfn_from_mfn(mfn_x(gmfn));

     /* Shared MFNs should NEVER be marked dirty */
     BUG_ON(SHARED_M2P(pfn));
@@ -351,6 +344,24 @@ out:
     return;
 }

+/* Mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn)
+{
+    unsigned long pfn;
+    mfn_t gmfn;
+
+    gmfn = _mfn(guest_mfn);
+
+    if ( !paging_mode_log_dirty(d) || !mfn_valid(gmfn) ||
+         page_get_owner(mfn_to_page(gmfn)) != d )
+        return;
+
+    /* We /really/ mean PFN here, even for non-translated guests. */
+    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
+
+    paging_mark_gfn_dirty(d, pfn);
+}
+
 /* Is this guest page dirty? */
 int paging_mfn_is_dirty(struct domain *d, mfn_t gmfn)

diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
index 53de715..c99324c 100644
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -156,6 +156,8 @@ void paging_log_dirty_init(struct domain *d,
 /* mark a page as dirty */
 void paging_mark_dirty(struct domain *d, unsigned long guest_mfn);
+/* mark a page as dirty with taking guest pfn as parameter */
+void paging_mark_gfn_dirty(struct domain *d, unsigned long pfn);
 /* is this guest page dirty?
  * This is called from inside paging code, with the paging lock held. */
--
2.1.0
[Xen-devel] [v3 09/10] log-dirty: refine common code to support PML
Using PML, it's possible there are dirty GPAs logged in vcpus' PML buffers when userspace peeks at/clears dirty pages, therefore we need to flush them before reporting dirty pages to userspace. This applies to both video ram tracking and paging_log_dirty_op. This patch adds new p2m layer functions to enable/disable PML and flush PML buffers. The new functions are named generically to cover potential further PML-like features for other platforms.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 xen/arch/x86/mm/hap/hap.c | 29 +
 xen/arch/x86/mm/p2m.c | 36
 xen/arch/x86/mm/paging.c | 10 ++
 xen/include/asm-x86/p2m.h | 11 +++
 4 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 4ecb2e2..1099670 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -121,7 +121,10 @@ int hap_track_dirty_vram(struct domain *d,
             p2m_change_type_range(d, ostart, oend,
                                   p2m_ram_logdirty, p2m_ram_rw);

-            /* set l1e entries of range within P2M table to be read-only. */
+            /*
+             * switch vram to log dirty mode, either by setting l1e entries of
+             * P2M table to be read-only, or via hardware-assisted log-dirty.
+             */
             p2m_change_type_range(d, begin_pfn, begin_pfn + nr,
                                   p2m_ram_rw, p2m_ram_logdirty);

@@ -135,6 +138,9 @@ int hap_track_dirty_vram(struct domain *d,
             domain_pause(d);

+            /* flush dirty GFNs potentially cached by hardware */
+            p2m_flush_hardware_cached_dirty(d);
+
             /* get the bitmap */
             paging_log_dirty_range(d, begin_pfn, nr, dirty_bitmap);

@@ -190,9 +196,15 @@ static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
     d->arch.paging.mode |= PG_log_dirty;
     paging_unlock(d);

+    /* enable hardware-assisted log-dirty if it is supported */
+    p2m_enable_hardware_log_dirty(d);
+
     if ( log_global )
     {
-        /* set l1e entries of P2M table to be read-only. */
+        /*
+         * switch to log dirty mode, either by setting l1e entries of P2M table
+         * to be read-only, or via hardware-assisted log-dirty.
+         */
         p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
         flush_tlb_mask(d->domain_dirty_cpumask);
     }
@@ -205,14 +217,23 @@ static int hap_disable_log_dirty(struct domain *d)
     d->arch.paging.mode &= ~PG_log_dirty;
     paging_unlock(d);

-    /* set l1e entries of P2M table with normal mode */
+    /* disable hardware-assisted log-dirty if it is supported */
+    p2m_disable_hardware_log_dirty(d);
+
+    /*
+     * switch to normal mode, either by setting l1e entries of P2M table to
+     * normal mode, or via hardware-assisted log-dirty.
+     */
     p2m_change_entry_type_global(d, p2m_ram_logdirty, p2m_ram_rw);
     return 0;
 }

 static void hap_clean_dirty_bitmap(struct domain *d)
 {
-    /* set l1e entries of P2M table to be read-only. */
+    /*
+     * switch to log-dirty mode, either by setting l1e entries of P2M table to
+     * be read-only, or via hardware-assisted log-dirty.
+     */
     p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
     flush_tlb_mask(d->domain_dirty_cpumask);
 }

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index df4a485..67edf89 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -239,6 +239,42 @@ void p2m_memory_type_changed(struct domain *d)
     }
 }

+void p2m_enable_hardware_log_dirty(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    if ( p2m->enable_hardware_log_dirty )
+    {
+        p2m_lock(p2m);
+        p2m->enable_hardware_log_dirty(p2m);
+        p2m_unlock(p2m);
+    }
+}
+
+void p2m_disable_hardware_log_dirty(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    if ( p2m->disable_hardware_log_dirty )
+    {
+        p2m_lock(p2m);
+        p2m->disable_hardware_log_dirty(p2m);
+        p2m_unlock(p2m);
+    }
+}
+
+void p2m_flush_hardware_cached_dirty(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    if ( p2m->flush_hardware_cached_dirty )
+    {
+        p2m_lock(p2m);
+        p2m->flush_hardware_cached_dirty(p2m);
+        p2m_unlock(p2m);
+    }
+}
+
 mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn,
                             p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
                             unsigned int *page_order, bool_t locked)

diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index 77c929b..59d4720 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -422,7 +422,17 @@ static int paging_log_dirty_op(struct domain *d,
     int i4, i3, i2;

     if ( !resuming )
+    {
         domain_pause(d);
+
+        /*
+         * Flush
[Xen-devel] [v3 01/10] vmx: add new boot parameter to control PML enabling
A top level EPT parameter "ept=<options>" and a sub boolean "opt_pml_enabled" are added to control PML. Other booleans can be further added for any other EPT related features. The document description for the new parameter is also added.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 docs/misc/xen-command-line.markdown | 15 +++
 xen/arch/x86/hvm/vmx/vmcs.c | 30 ++
 2 files changed, 45 insertions(+)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 1dda1f0..4889e27 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -685,6 +685,21 @@ requirement can be relaxed. This option is particularly useful for nested
 virtualization, to allow the L1 hypervisor to use EPT even if the L0 hypervisor
 does not provide VM\_ENTRY\_LOAD\_GUEST\_PAT.

+### ept (Intel)
+> `= List of ( pml<boolean> )`
+
+> Default: `false`
+
+Controls EPT related features. Currently only Page Modification Logging (PML)
+is the controllable feature, as a boolean.
+
+PML is a new hardware feature in Intel's Broadwell Server and further platforms,
+which reduces hypervisor overhead of the log-dirty mechanism by automatically
+recording GPAs (guest physical addresses) when guest memory gets dirty, and
+therefore significantly reducing the number of EPT violations caused by write
+protection of guest memory, which was a necessity to implement the log-dirty
+mechanism before PML.
+
 ### gdb
 > `= <baud>[/<clock_hz>][,DPS[,<io-base>[,<irq>[,<port-bdf>[,<bridge-bdf>]]]] | pci | amt ]`

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 63007a9..79efa42 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -64,6 +64,36 @@ integer_param("ple_gap", ple_gap);
 static unsigned int __read_mostly ple_window = 4096;
 integer_param("ple_window", ple_window);

+static bool_t __read_mostly opt_pml_enabled = 0;
+
+/*
+ * The 'ept' parameter controls functionalities that depend on, or impact the
+ * EPT mechanism. Optional comma separated value may contain:
+ *
+ *  pml                 Enable PML
+ */
+static void __init parse_ept_param(char *s)
+{
+    char *ss;
+
+    do {
+        bool_t val = !!strncmp(s, "no-", 3);
+        if ( !val )
+            s += 3;
+
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        if ( !strcmp(s, "pml") )
+            opt_pml_enabled = val;
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+custom_param("ept", parse_ept_param);
+
 /* Dynamic (run-time adjusted) execution control flags. */
 u32 vmx_pin_based_exec_control __read_mostly;
 u32 vmx_cpu_based_exec_control __read_mostly;
--
2.1.0
[Xen-devel] [v3 03/10] vmx: add PML definition and feature detection
The patch adds the PML definitions and feature detection. Note PML won't be detected if it is disabled by the boot parameter. PML is also disabled in construct_vmcs, as it will only be enabled when the domain is switched to log-dirty mode.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 xen/arch/x86/hvm/vmx/vmcs.c | 18 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h | 6 ++
 xen/include/asm-x86/hvm/vmx/vmx.h | 1 +
 3 files changed, 25 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 79efa42..04fdca3 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -140,6 +140,7 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt Delivery");
     P(cpu_has_vmx_posted_intr_processing, "Posted Interrupt Processing");
     P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
+    P(cpu_has_vmx_pml, "Page Modification Logging");
 #undef P

     if ( !printed )
@@ -237,6 +238,8 @@ static int vmx_init_vmcs_config(void)
             opt |= SECONDARY_EXEC_ENABLE_VPID;
         if ( opt_unrestricted_guest_enabled )
             opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
+        if ( opt_pml_enabled )
+            opt |= SECONDARY_EXEC_ENABLE_PML;

         /*
          * APIC Register Virtualization and Virtual Interrupt Delivery
@@ -283,6 +286,10 @@ static int vmx_init_vmcs_config(void)
          */
         if ( !(_vmx_ept_vpid_cap & VMX_VPID_INVVPID_ALL_CONTEXT) )
             _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
+
+        /* EPT A/D bits is required for PML */
+        if ( !(_vmx_ept_vpid_cap & VMX_EPT_AD_BIT) )
+            _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
     }

     if ( _vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT )
@@ -303,6 +310,14 @@ static int vmx_init_vmcs_config(void)
                  SECONDARY_EXEC_UNRESTRICTED_GUEST);
     }

+    /* PML cannot be supported if EPT is not used */
+    if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT) )
+        _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+
+    /* Turn off opt_pml_enabled if PML feature is not present */
+    if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML) )
+        opt_pml_enabled = 0;
+
     if ( (_vmx_secondary_exec_control & SECONDARY_EXEC_PAUSE_LOOP_EXITING) &&
          ple_gap == 0 )
     {
@@ -1038,6 +1053,9 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
     }

+    /* Disable PML anyway here as it will only be enabled in log dirty mode */
+    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+
     /* Host data selectors. */
     __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS);
     __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS);

diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 6fce6aa..f831a78 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -215,6 +215,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000
 #define SECONDARY_EXEC_ENABLE_VMFUNC            0x00002000
 #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
+#define SECONDARY_EXEC_ENABLE_PML               0x00020000
 extern u32 vmx_secondary_exec_control;

 #define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
@@ -226,6 +227,7 @@ extern u32 vmx_secondary_exec_control;
 #define VMX_EPT_INVEPT_INSTRUCTION              0x00100000
 #define VMX_EPT_INVEPT_SINGLE_CONTEXT           0x02000000
 #define VMX_EPT_INVEPT_ALL_CONTEXT              0x04000000
+#define VMX_EPT_AD_BIT                          0x00200000

 #define VMX_MISC_VMWRITE_ALL                    0x20000000

@@ -274,6 +276,8 @@ extern u32 vmx_secondary_exec_control;
     (vmx_pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT)
 #define cpu_has_vmx_vmcs_shadowing \
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING)
+#define cpu_has_vmx_pml \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)

 #define VMCS_RID_TYPE_MASK                      0x80000000

@@ -318,6 +322,7 @@ enum vmcs_field {
     GUEST_LDTR_SELECTOR = 0x080c,
     GUEST_TR_SELECTOR = 0x080e,
     GUEST_INTR_STATUS = 0x0810,
+    GUEST_PML_INDEX = 0x0812,
     HOST_ES_SELECTOR = 0x0c00,
     HOST_CS_SELECTOR = 0x0c02,
     HOST_SS_SELECTOR = 0x0c04,
@@ -331,6 +336,7 @@ enum vmcs_field {
     VM_EXIT_MSR_STORE_ADDR = 0x2006,
     VM_EXIT_MSR_LOAD_ADDR = 0x2008,
     VM_ENTRY_MSR_LOAD_ADDR = 0x200a,
+    PML_ADDRESS = 0x200e,
     TSC_OFFSET = 0x2010,
     VIRTUAL_APIC_PAGE_ADDR = 0x2012,
     APIC_ACCESS_ADDR = 0x2014,
diff --git
[Xen-devel] [v3 08/10] vmx: disable PML in vmx_vcpu_destroy
It's possible the domain still remains in log-dirty mode when it is about to be destroyed, in which case we should manually disable PML for it.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 xen/arch/x86/hvm/vmx/vmx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e5471b8..e189424 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -153,6 +153,14 @@ static int vmx_vcpu_initialise(struct vcpu *v)

 static void vmx_vcpu_destroy(struct vcpu *v)
 {
+    /*
+     * There are cases that domain still remains in log-dirty mode when it is
+     * about to be destroyed (ex, user types 'xl destroy <dom>'), in which case
+     * we should disable PML manually here. Note that vmx_vcpu_destroy is called
+     * prior to vmx_domain_destroy so we need to disable PML for each vcpu
+     * separately here.
+     */
+    vmx_vcpu_disable_pml(v);
     vmx_destroy_vmcs(v);
     vpmu_destroy(v);
     passive_domain_destroy(v);
--
2.1.0
[Xen-devel] [v3 00/10] PML (Page Modification Logging) support
v2->v3:

- Merged v2 patch 02 (document change) into patch 01 as a single patch, and changed the new parameter description as suggested by Andrew.
- Changed vmx_vcpu_flush_pml_buffer to call mark_dirty for all logged GFNs, and call p2m_change_type_one regardless of return value.
- Added an ASSERT for the vcpu (being current, or being non-running and unrunnable) to vmx_vcpu_flush_pml_buffer.
- Other refinements in coding style, comments, description, etc.

Live migration has been sanity-tested both with and without PML.

Kai Huang (10):
  vmx: add new boot parameter to control PML enabling
  log-dirty: add new paging_mark_gfn_dirty
  vmx: add PML definition and feature detection
  vmx: add new data structure member to support PML
  vmx: add help functions to support PML
  vmx: handle PML buffer full VMEXIT
  vmx: handle PML enabling in vmx_vcpu_initialise
  vmx: disable PML in vmx_vcpu_destroy
  log-dirty: refine common code to support PML
  p2m/ept: enable PML in p2m-ept for log-dirty

 docs/misc/xen-command-line.markdown | 15 +++
 xen/arch/x86/hvm/vmx/vmcs.c | 227
 xen/arch/x86/hvm/vmx/vmx.c | 35 ++
 xen/arch/x86/mm/hap/hap.c | 29 -
 xen/arch/x86/mm/p2m-ept.c | 79 +++--
 xen/arch/x86/mm/p2m.c | 36 ++
 xen/arch/x86/mm/paging.c | 41 +--
 xen/include/asm-x86/hvm/vmx/vmcs.h | 26 -
 xen/include/asm-x86/hvm/vmx/vmx.h | 4 +-
 xen/include/asm-x86/p2m.h | 11 ++
 xen/include/asm-x86/paging.h | 2 +
 11 files changed, 482 insertions(+), 23 deletions(-)

--
2.1.0
[Xen-devel] [v3 07/10] vmx: handle PML enabling in vmx_vcpu_initialise
It's possible the domain has already been in log-dirty mode when a vcpu is being created, in which case we should enable PML for this vcpu if PML has been enabled for the domain.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 xen/arch/x86/hvm/vmx/vmx.c | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index f11ac46..e5471b8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -117,6 +117,29 @@ static int vmx_vcpu_initialise(struct vcpu *v)
         return rc;
     }

+    /*
+     * It's rare but still possible that the domain has already been in
+     * log-dirty mode when the vcpu is being created (commented by Tim), in
+     * which case we should enable PML for this vcpu if PML has been enabled
+     * for the domain, and failure to enable results in failure of creating
+     * this vcpu.
+     *
+     * Note even if there's no vcpu created for the domain,
+     * vmx_domain_enable_pml will return success, in which case
+     * vmx_domain_pml_enabled will also return true. And even if this is the
+     * first vcpu to be created with vmx_domain_pml_enabled being true,
+     * failure of enabling PML still results in failure of creating the vcpu,
+     * to avoid complicated logic to revert a PML-style EPT table to a
+     * non-PML-style EPT table.
+     */
+    if ( vmx_domain_pml_enabled(v->domain) )
+    {
+        if ( (rc = vmx_vcpu_enable_pml(v)) != 0 )
+        {
+            dprintk(XENLOG_ERR, "%pv: Failed to enable PML.\n", v);
+            vmx_destroy_vmcs(v);
+            return rc;
+        }
+    }
+
     vpmu_initialise(v);

     vmx_install_vlapic_mapping(v);
--
2.1.0
[Xen-devel] [v3 05/10] vmx: add help functions to support PML
This patch adds help functions to enable/disable PML, and flush the PML buffer for a single vcpu and a particular domain, for further use.

Signed-off-by: Kai Huang <kai.hu...@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 179 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   9 ++
 2 files changed, 188 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 04fdca3..f797fde 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1323,6 +1323,185 @@ void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector)
                          &v->arch.hvm_vmx.eoi_exitmap_changed);
 }
 
+bool_t vmx_vcpu_pml_enabled(const struct vcpu *v)
+{
+    return !!(v->arch.hvm_vmx.secondary_exec_control &
+              SECONDARY_EXEC_ENABLE_PML);
+}
+
+int vmx_vcpu_enable_pml(struct vcpu *v)
+{
+    if ( vmx_vcpu_pml_enabled(v) )
+        return 0;
+
+    v->arch.hvm_vmx.pml_pg = v->domain->arch.paging.alloc_page(v->domain);
+    if ( !v->arch.hvm_vmx.pml_pg )
+        return -ENOMEM;
+
+    vmx_vmcs_enter(v);
+
+    __vmwrite(PML_ADDRESS, page_to_mfn(v->arch.hvm_vmx.pml_pg) << PAGE_SHIFT);
+    __vmwrite(GUEST_PML_INDEX, NR_PML_ENTRIES - 1);
+
+    v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_ENABLE_PML;
+
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
+              v->arch.hvm_vmx.secondary_exec_control);
+
+    vmx_vmcs_exit(v);
+
+    return 0;
+}
+
+void vmx_vcpu_disable_pml(struct vcpu *v)
+{
+    if ( !vmx_vcpu_pml_enabled(v) )
+        return;
+
+    /* Make sure we don't lose any logged GPAs */
+    vmx_vcpu_flush_pml_buffer(v);
+
+    vmx_vmcs_enter(v);
+
+    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
+              v->arch.hvm_vmx.secondary_exec_control);
+
+    vmx_vmcs_exit(v);
+
+    v->domain->arch.paging.free_page(v->domain, v->arch.hvm_vmx.pml_pg);
+    v->arch.hvm_vmx.pml_pg = NULL;
+}
+
+void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
+{
+    uint64_t *pml_buf;
+    unsigned long pml_idx;
+
+    ASSERT((v == current) || (!vcpu_runnable(v) && !v->is_running));
+    ASSERT(vmx_vcpu_pml_enabled(v));
+
+    vmx_vmcs_enter(v);
+
+    __vmread(GUEST_PML_INDEX, &pml_idx);
+
+    /* Do nothing if PML buffer is empty */
+    if ( pml_idx == (NR_PML_ENTRIES - 1) )
+        goto out;
+
+    pml_buf = __map_domain_page(v->arch.hvm_vmx.pml_pg);
+
+    /*
+     * PML index can be either 2^16-1 (buffer is full), or 0 ~
+     * NR_PML_ENTRIES-1 (buffer is not full), and in latter case PML index
+     * always points to next available entity.
+     */
+    if ( pml_idx >= NR_PML_ENTRIES )
+        pml_idx = 0;
+    else
+        pml_idx++;
+
+    for ( ; pml_idx < NR_PML_ENTRIES; pml_idx++ )
+    {
+        unsigned long gfn = pml_buf[pml_idx] >> PAGE_SHIFT;
+
+        /*
+         * Need to change type from log-dirty to normal memory for logged
+         * GFN. hap_track_dirty_vram depends on it to work. And we mark all
+         * logged GFNs to be dirty, as we cannot be sure whether it's safe
+         * to ignore GFNs on which p2m_change_type_one returns failure. The
+         * failure cases are very rare, and additional cost is negligible,
+         * but a missing mark is extremely difficult to debug.
+         */
+        p2m_change_type_one(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
+        paging_mark_gfn_dirty(v->domain, gfn);
+    }
+
+    unmap_domain_page(pml_buf);
+
+    /* Reset PML index */
+    __vmwrite(GUEST_PML_INDEX, NR_PML_ENTRIES - 1);
+
+ out:
+    vmx_vmcs_exit(v);
+}
+
+bool_t vmx_domain_pml_enabled(const struct domain *d)
+{
+    return !!(d->arch.hvm_domain.vmx.status & VMX_DOMAIN_PML_ENABLED);
+}
+
+/*
+ * This function enables PML for particular domain. It should be called when
+ * domain is paused.
+ *
+ * PML needs to be enabled globally for all vcpus of the domain, as PML
+ * buffer and PML index are per-vcpu, but EPT table is shared by vcpus,
+ * therefore enabling PML on partial vcpus won't work.
+ */
+int vmx_domain_enable_pml(struct domain *d)
+{
+    struct vcpu *v;
+    int rc;
+
+    ASSERT(atomic_read(&d->pause_count));
+
+    if ( vmx_domain_pml_enabled(d) )
+        return 0;
+
+    for_each_vcpu( d, v )
+        if ( (rc = vmx_vcpu_enable_pml(v)) != 0 )
+            goto error;
+
+    d->arch.hvm_domain.vmx.status |= VMX_DOMAIN_PML_ENABLED;
+
+    return 0;
+
+ error:
+    for_each_vcpu( d, v )
+        if ( vmx_vcpu_pml_enabled(v) )
+            vmx_vcpu_disable_pml(v);
+    return rc;
+}
+
+/*
+ * Disable PML for particular domain. Called when domain is paused.
+ *
+ * The same as enabling PML for domain, disabling PML should be done for all
+ * vcpus at once.
+ */
+void vmx_domain_disable_pml(struct domain *d)
+{
+    struct vcpu *v;
+
+    ASSERT(atomic_read(&d->pause_count));
+
+    if ( !vmx_domain_pml_enabled(d) )
+        return;
+
+    for_each_vcpu( d, v )
+        vmx_vcpu_disable_pml(v);
+
+    d->arch.hvm_domain.vmx.status &= ~VMX_DOMAIN_PML_ENABLED;
+}
[Xen-devel] [PATCH 6/7] tools: replace private LIBDIR with automake libdir
The result of this command: git grep -wnl LIBDIR | xargs sed -i 's@LIBDIR@libdir@g' Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com --- Config.mk| 4 ++-- config/Paths.mk.in | 1 - tools/Makefile | 10 +- tools/blktap2/control/Makefile | 8 tools/blktap2/vhd/lib/Makefile | 2 +- tools/hotplug/FreeBSD/rc.d/xencommons.in | 2 +- tools/hotplug/Linux/xen-hotplug-common.sh.in | 2 +- tools/hotplug/NetBSD/rc.d/xen-watchdog | 2 +- tools/hotplug/NetBSD/rc.d/xencommons.in | 2 +- tools/hotplug/NetBSD/rc.d/xendomains | 2 +- tools/libfsimage/Rules.mk| 2 +- tools/libfsimage/common/Makefile | 8 tools/libvchan/Makefile | 10 +- tools/libxc/Makefile | 18 +- tools/libxl/Makefile | 18 +- tools/xenstat/libxenstat/Makefile| 8 tools/xenstore/Makefile | 10 +- 17 files changed, 54 insertions(+), 55 deletions(-) diff --git a/Config.mk b/Config.mk index ec16961..46928ca 100644 --- a/Config.mk +++ b/Config.mk @@ -157,7 +157,7 @@ define move-if-changed if ! 
cmp -s $(1) $(2); then mv -f $(1) $(2); else rm -f $(1); fi endef -BUILD_MAKE_VARS := sbindir bindir LIBEXEC LIBEXEC_BIN LIBDIR SHAREDIR \ +BUILD_MAKE_VARS := sbindir bindir LIBEXEC LIBEXEC_BIN libdir SHAREDIR \ XENFIRMWAREDIR XEN_CONFIG_DIR XEN_SCRIPT_DIR XEN_LOCK_DIR \ XEN_RUN_DIR XEN_PAGING_DIR @@ -204,7 +204,7 @@ CFLAGS += $(foreach i, $(EXTRA_INCLUDES), -I$(i)) LDFLAGS += $(foreach i, $(PREPEND_LIB), -L$(i)) CFLAGS += $(foreach i, $(PREPEND_INCLUDES), -I$(i)) ifeq ($(XEN_TOOLS_RPATH),y) -LDFLAGS += -Wl,-rpath,$(LIBDIR) +LDFLAGS += -Wl,-rpath,$(libdir) endif APPEND_LDFLAGS += $(foreach i, $(APPEND_LIB), -L$(i)) APPEND_CFLAGS += $(foreach i, $(APPEND_INCLUDES), -I$(i)) diff --git a/config/Paths.mk.in b/config/Paths.mk.in index 8ca02bb..6db108a 100644 --- a/config/Paths.mk.in +++ b/config/Paths.mk.in @@ -37,7 +37,6 @@ LIBEXEC_INC := $(LIBEXEC)/include SHAREDIR := @SHAREDIR@ MAN1DIR := $(mandir)/man1 MAN8DIR := $(mandir)/man8 -LIBDIR := $(libdir) DOCDIR := $(docdir) XEN_RUN_DIR := @XEN_RUN_DIR@ diff --git a/tools/Makefile b/tools/Makefile index 1e0c0a7..383d4ca 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -89,9 +89,9 @@ uninstall: rm -rf $(D)$(includedir)/xen rm -rf $(D)$(includedir)/_libxl* $(D)$(includedir)/libxl* rm -rf $(D)$(includedir)/xenstat.h $(D)$(includedir)/xentoollog.h - rm -rf $(D)$(LIBDIR)/libxenctrl* $(D)$(LIBDIR)/libxenguest* - rm -rf $(D)$(LIBDIR)/libxenstore* $(D)$(LIBDIR)/libxlutil* - rm -rf $(D)$(LIBDIR)/python/xen $(D)$(LIBDIR)/python/grub + rm -rf $(D)$(libdir)/libxenctrl* $(D)$(libdir)/libxenguest* + rm -rf $(D)$(libdir)/libxenstore* $(D)$(libdir)/libxlutil* + rm -rf $(D)$(libdir)/python/xen $(D)$(libdir)/python/grub rm -rf $(D)$(LIBEXEC) rm -rf $(D)$(sbindir)/setmask rm -rf $(D)$(sbindir)/xen* $(D)$(sbindir)/netfix $(D)$(sbindir)/xm @@ -117,8 +117,8 @@ IOEMU_CONFIGURE_CROSS ?= --cross-prefix=$(CROSS_COMPILE) \ endif ifeq ($(XEN_TOOLS_RPATH),y) -QEMU_UPSTREAM_RPATH := -Wl,-rpath,$(LIBEXEC_LIB):$(LIBDIR) -IOEMU_EXTRA_LDFLAGS := 
--extra-ldflags=-Wl,-rpath,$(LIBDIR) +QEMU_UPSTREAM_RPATH := -Wl,-rpath,$(LIBEXEC_LIB):$(libdir) +IOEMU_EXTRA_LDFLAGS := --extra-ldflags=-Wl,-rpath,$(libdir) else QEMU_UPSTREAM_RPATH := -Wl,-rpath,$(LIBEXEC_LIB) IOEMU_EXTRA_LDFLAGS := diff --git a/tools/blktap2/control/Makefile b/tools/blktap2/control/Makefile index 9ae2e51..767f52a 100644 --- a/tools/blktap2/control/Makefile +++ b/tools/blktap2/control/Makefile @@ -63,10 +63,10 @@ $(LIB_SHARED): $(CTL_PICS) install: $(IBIN) $(LIB_STATIC) $(LIB_SHARED) $(INSTALL_DIR) -p $(DESTDIR)$(sbindir) $(INSTALL_PROG) $(IBIN) $(DESTDIR)$(sbindir) - $(INSTALL_DATA) $(LIB_STATIC) $(DESTDIR)$(LIBDIR) - $(INSTALL_PROG) $(LIB_SHARED) $(DESTDIR)$(LIBDIR) - ln -sf $(LIBSONAME) $(DESTDIR)$(LIBDIR)/$(LIBNAME).so - ln -sf $(LIB_SHARED) $(DESTDIR)$(LIBDIR)/$(LIBSONAME) + $(INSTALL_DATA) $(LIB_STATIC) $(DESTDIR)$(libdir) + $(INSTALL_PROG) $(LIB_SHARED) $(DESTDIR)$(libdir) + ln -sf $(LIBSONAME) $(DESTDIR)$(libdir)/$(LIBNAME).so + ln -sf $(LIB_SHARED) $(DESTDIR)$(libdir)/$(LIBSONAME) clean: rm -f $(OBJS) $(PICS) $(DEPS) $(IBIN) $(LIB_STATIC) $(LIB_SHARED) diff --git a/tools/blktap2/vhd/lib/Makefile
[Xen-devel] [PATCH 1/7] tools: replace private SBINDIR with automake sbindir
The result of this command: git grep -wnl SBINDIR | xargs sed -i 's@SBINDIR@sbindir@g' Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com Cc: Mukesh Rathor mukesh.rat...@oracle.com Cc: Tim Deegan t...@xen.org Cc: Daniel De Graaf dgde...@tycho.nsa.gov Cc: David Scott dave.sc...@eu.citrix.com --- Config.mk| 2 +- config/Paths.mk.in | 1 - tools/Makefile | 6 +++--- tools/blktap2/control/Makefile | 4 ++-- tools/blktap2/drivers/Makefile | 2 +- tools/blktap2/vhd/Makefile | 2 +- tools/console/Makefile | 4 ++-- tools/debugger/gdbsx/Makefile| 4 ++-- tools/debugger/kdd/Makefile | 4 ++-- tools/flask/utils/Makefile | 4 ++-- tools/hotplug/FreeBSD/rc.d/xencommons.in | 12 ++-- tools/hotplug/FreeBSD/vif-bridge | 2 +- tools/hotplug/Linux/init.d/xen-watchdog.in | 2 +- tools/hotplug/Linux/init.d/xencommons.in | 2 +- tools/hotplug/Linux/xen-hotplug-common.sh.in | 2 +- tools/hotplug/Linux/xendomains.in| 2 +- tools/hotplug/NetBSD/block | 2 +- tools/hotplug/NetBSD/rc.d/xen-watchdog | 4 ++-- tools/hotplug/NetBSD/rc.d/xencommons.in | 12 ++-- tools/hotplug/NetBSD/rc.d/xendomains | 2 +- tools/hotplug/NetBSD/vif-bridge | 2 +- tools/hotplug/NetBSD/vif-ip | 2 +- tools/libxl/Makefile | 4 ++-- tools/misc/Makefile | 4 ++-- tools/ocaml/xenstored/Makefile | 4 ++-- tools/tests/mce-test/tools/Makefile | 2 +- tools/xenbackendd/Makefile | 4 ++-- tools/xenmon/Makefile| 8 tools/xenpmd/Makefile| 4 ++-- tools/xenstat/xentop/Makefile| 4 ++-- tools/xenstore/Makefile | 4 ++-- 31 files changed, 58 insertions(+), 59 deletions(-) diff --git a/Config.mk b/Config.mk index 28b77d6..abb1e44 100644 --- a/Config.mk +++ b/Config.mk @@ -157,7 +157,7 @@ define move-if-changed if ! 
cmp -s $(1) $(2); then mv -f $(1) $(2); else rm -f $(1); fi endef -BUILD_MAKE_VARS := SBINDIR BINDIR LIBEXEC LIBEXEC_BIN LIBDIR SHAREDIR \ +BUILD_MAKE_VARS := sbindir BINDIR LIBEXEC LIBEXEC_BIN LIBDIR SHAREDIR \ XENFIRMWAREDIR XEN_CONFIG_DIR XEN_SCRIPT_DIR XEN_LOCK_DIR \ XEN_RUN_DIR XEN_PAGING_DIR diff --git a/config/Paths.mk.in b/config/Paths.mk.in index 150bae7..26674b1 100644 --- a/config/Paths.mk.in +++ b/config/Paths.mk.in @@ -31,7 +31,6 @@ sysconfdir := @sysconfdir@ PREFIX := $(prefix) -SBINDIR := $(sbindir) BINDIR := $(bindir) LIBEXEC := $(libexecdir)/$(PACKAGE_TARNAME) LIBEXEC_BIN := @LIBEXEC_BIN@ diff --git a/tools/Makefile b/tools/Makefile index 966354a..77625cb 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -71,7 +71,7 @@ uninstall: rm -f $(D)$(CONFIG_DIR)/udev/rules.d/xen-backend.rules rm -f $(D)$(CONFIG_DIR)/udev/rules.d/xend.rules rm -f $(D)$(SYSCONFIG_DIR)/xendomains - rm -f $(D)$(SBINDIR)/xendomains + rm -f $(D)$(sbindir)/xendomains rm -f $(D)$(SYSCONFIG_DIR)/xencommons rm -rf $(D)/var/lib/xen* rm -rf $(D)$(BINDIR)/cpuperf-perfcntr $(D)$(BINDIR)/cpuperf-xen @@ -93,8 +93,8 @@ uninstall: rm -rf $(D)$(LIBDIR)/libxenstore* $(D)$(LIBDIR)/libxlutil* rm -rf $(D)$(LIBDIR)/python/xen $(D)$(LIBDIR)/python/grub rm -rf $(D)$(LIBEXEC) - rm -rf $(D)$(SBINDIR)/setmask - rm -rf $(D)$(SBINDIR)/xen* $(D)$(SBINDIR)/netfix $(D)$(SBINDIR)/xm + rm -rf $(D)$(sbindir)/setmask + rm -rf $(D)$(sbindir)/xen* $(D)$(sbindir)/netfix $(D)$(sbindir)/xm rm -rf $(D)$(SHAREDIR)/doc/xen rm -rf $(D)$(SHAREDIR)/xen rm -rf $(D)$(SHAREDIR)/qemu-xen diff --git a/tools/blktap2/control/Makefile b/tools/blktap2/control/Makefile index 1ae6902..9ae2e51 100644 --- a/tools/blktap2/control/Makefile +++ b/tools/blktap2/control/Makefile @@ -61,8 +61,8 @@ $(LIB_SHARED): $(CTL_PICS) $(CC) $(LDFLAGS) -fPIC -Wl,$(SONAME_LDFLAG) -Wl,$(LIBSONAME) $(SHLIB_LDFLAGS) -rdynamic $^ -o $@ $(APPEND_LDFLAGS) install: $(IBIN) $(LIB_STATIC) $(LIB_SHARED) - $(INSTALL_DIR) -p $(DESTDIR)$(SBINDIR) - $(INSTALL_PROG) 
$(IBIN) $(DESTDIR)$(SBINDIR) + $(INSTALL_DIR) -p $(DESTDIR)$(sbindir) + $(INSTALL_PROG) $(IBIN) $(DESTDIR)$(sbindir) $(INSTALL_DATA) $(LIB_STATIC) $(DESTDIR)$(LIBDIR) $(INSTALL_PROG) $(LIB_SHARED) $(DESTDIR)$(LIBDIR) ln -sf $(LIBSONAME)
[Xen-devel] [PATCH 0/7] tools: remove private Makefile variables
Replace all private variables in Makefiles with automake variables. This series is based on 92ff75384bce7a11e27fbfaf0c531e88dd1ab4c7. Olaf Olaf Hering (7): tools: replace private SBINDIR with automake sbindir tools: replace private BINDIR with automake bindir tools: replace private PREFIX with automake prefix tools: replace private INCLUDEDIR with automake includedir tools: replace private MANDIR with automake mandir tools: replace private LIBDIR with automake libdir tools: replace private DOCDIR with automake docdir Config.mk| 10 +++--- config/Paths.mk.in | 12 ++-- docs/Makefile| 10 +++--- stubdom/Makefile | 4 +-- tools/Makefile | 46 ++-- tools/blktap2/control/Makefile | 12 tools/blktap2/drivers/Makefile | 2 +- tools/blktap2/include/Makefile | 2 +- tools/blktap2/vhd/Makefile | 2 +- tools/blktap2/vhd/lib/Makefile | 2 +- tools/console/Makefile | 4 +-- tools/debugger/gdbsx/Makefile| 4 +-- tools/debugger/kdd/Makefile | 4 +-- tools/flask/utils/Makefile | 4 +-- tools/hotplug/FreeBSD/rc.d/xencommons.in | 16 +- tools/hotplug/FreeBSD/vif-bridge | 2 +- tools/hotplug/Linux/init.d/xen-watchdog.in | 2 +- tools/hotplug/Linux/init.d/xencommons.in | 8 ++--- tools/hotplug/Linux/xen-hotplug-common.sh.in | 4 +-- tools/hotplug/Linux/xendomains.in| 2 +- tools/hotplug/NetBSD/block | 2 +- tools/hotplug/NetBSD/rc.d/xen-watchdog | 6 ++-- tools/hotplug/NetBSD/rc.d/xencommons.in | 16 +- tools/hotplug/NetBSD/rc.d/xendomains | 4 +-- tools/hotplug/NetBSD/vif-bridge | 2 +- tools/hotplug/NetBSD/vif-ip | 2 +- tools/include/Makefile | 40 tools/libfsimage/Rules.mk| 2 +- tools/libfsimage/common/Makefile | 16 +- tools/libvchan/Makefile | 14 - tools/libxc/Makefile | 24 +++ tools/libxl/Makefile | 26 tools/misc/Makefile | 8 ++--- tools/ocaml/xenstored/Makefile | 4 +-- tools/pygrub/Makefile| 6 ++-- tools/tests/mce-test/tools/Makefile | 2 +- tools/xenbackendd/Makefile | 4 +-- tools/xenmon/Makefile| 12 tools/xenpmd/Makefile| 4 +-- tools/xenstat/libxenstat/Makefile| 10 +++--- tools/xenstat/xentop/Makefile| 
4 +-- tools/xenstore/Makefile | 44 +- tools/xentrace/Makefile | 6 ++-- 43 files changed, 201 insertions(+), 209 deletions(-) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 4/7] tools: replace private INCLUDEDIR with automake includedir
The result of this command: git grep -wnl INCLUDEDIR | xargs sed -i 's@INCLUDEDIR@includedir@g' Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com --- config/Paths.mk.in| 1 - tools/Makefile| 14 +++--- tools/blktap2/include/Makefile| 2 +- tools/include/Makefile| 40 +++ tools/libfsimage/common/Makefile | 8 tools/libvchan/Makefile | 4 ++-- tools/libxc/Makefile | 6 +++--- tools/libxl/Makefile | 4 ++-- tools/xenstat/libxenstat/Makefile | 2 +- tools/xenstore/Makefile | 16 10 files changed, 48 insertions(+), 49 deletions(-) diff --git a/config/Paths.mk.in b/config/Paths.mk.in index bf9bd51..32f7c78 100644 --- a/config/Paths.mk.in +++ b/config/Paths.mk.in @@ -34,7 +34,6 @@ LIBEXEC_BIN := @LIBEXEC_BIN@ LIBEXEC_LIB := $(LIBEXEC)/lib LIBEXEC_INC := $(LIBEXEC)/include -INCLUDEDIR := $(includedir) SHAREDIR := @SHAREDIR@ MANDIR := $(mandir) MAN1DIR := $(MANDIR)/man1 diff --git a/tools/Makefile b/tools/Makefile index b05d12a..1e0c0a7 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -82,13 +82,13 @@ uninstall: rm -rf $(D)$(bindir)/xenstore* $(D)$(bindir)/xentrace* rm -rf $(D)$(bindir)/xen-detect $(D)$(bindir)/xencons rm -rf $(D)$(bindir)/xenpvnetboot $(D)$(bindir)/qemu-*-xen - rm -rf $(D)$(INCLUDEDIR)/xenctrl* $(D)$(INCLUDEDIR)/xenguest.h - rm -rf $(D)$(INCLUDEDIR)/xs_lib.h $(D)$(INCLUDEDIR)/xs.h - rm -rf $(D)$(INCLUDEDIR)/xenstore-compat/xs_lib.h $(D)$(INCLUDEDIR)/xenstore-compat/xs.h - rm -rf $(D)$(INCLUDEDIR)/xenstore_lib.h $(D)$(INCLUDEDIR)/xenstore.h - rm -rf $(D)$(INCLUDEDIR)/xen - rm -rf $(D)$(INCLUDEDIR)/_libxl* $(D)$(INCLUDEDIR)/libxl* - rm -rf $(D)$(INCLUDEDIR)/xenstat.h $(D)$(INCLUDEDIR)/xentoollog.h + rm -rf $(D)$(includedir)/xenctrl* $(D)$(includedir)/xenguest.h + rm -rf $(D)$(includedir)/xs_lib.h $(D)$(includedir)/xs.h + rm -rf $(D)$(includedir)/xenstore-compat/xs_lib.h 
$(D)$(includedir)/xenstore-compat/xs.h + rm -rf $(D)$(includedir)/xenstore_lib.h $(D)$(includedir)/xenstore.h + rm -rf $(D)$(includedir)/xen + rm -rf $(D)$(includedir)/_libxl* $(D)$(includedir)/libxl* + rm -rf $(D)$(includedir)/xenstat.h $(D)$(includedir)/xentoollog.h rm -rf $(D)$(LIBDIR)/libxenctrl* $(D)$(LIBDIR)/libxenguest* rm -rf $(D)$(LIBDIR)/libxenstore* $(D)$(LIBDIR)/libxlutil* rm -rf $(D)$(LIBDIR)/python/xen $(D)$(LIBDIR)/python/grub diff --git a/tools/blktap2/include/Makefile b/tools/blktap2/include/Makefile index c238486..66e8a1e 100644 --- a/tools/blktap2/include/Makefile +++ b/tools/blktap2/include/Makefile @@ -6,7 +6,7 @@ all: .PHONY: install install: - $(INSTALL_DIR) -p $(DESTDIR)$(INCLUDEDIR) + $(INSTALL_DIR) -p $(DESTDIR)$(includedir) .PHONY: clean diff --git a/tools/include/Makefile b/tools/include/Makefile index 601f79c..3cd157a 100644 --- a/tools/include/Makefile +++ b/tools/include/Makefile @@ -21,29 +21,29 @@ xen/.dir: .PHONY: install install: all - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/arch-x86 - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/arch-x86/hvm - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/arch-arm - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/arch-arm/hvm - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/foreign - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/hvm - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/io - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/sys - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/xsm - $(INSTALL_DATA) xen/COPYING $(DESTDIR)$(INCLUDEDIR)/xen - $(INSTALL_DATA) xen/*.h $(DESTDIR)$(INCLUDEDIR)/xen - $(INSTALL_DATA) xen/arch-x86/*.h $(DESTDIR)$(INCLUDEDIR)/xen/arch-x86 - $(INSTALL_DATA) xen/arch-x86/hvm/*.h $(DESTDIR)$(INCLUDEDIR)/xen/arch-x86/hvm + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/arch-x86 + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/arch-x86/hvm + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/arch-arm + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/arch-arm/hvm + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/foreign + 
$(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/hvm + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/io + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/sys + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/xsm + $(INSTALL_DATA) xen/COPYING $(DESTDIR)$(includedir)/xen + $(INSTALL_DATA) xen/*.h $(DESTDIR)$(includedir)/xen + $(INSTALL_DATA) xen/arch-x86/*.h $(DESTDIR)$(includedir)/xen/arch-x86 + $(INSTALL_DATA) xen/arch-x86/hvm/*.h
[Xen-devel] [PATCH 2/7] tools: replace private BINDIR with automake bindir
The result of this command: git grep -wnl BINDIR | xargs sed -i 's@BINDIR@bindir@g' Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com Cc: George Dunlap george.dun...@eu.citrix.com --- Config.mk| 2 +- config/Paths.mk.in | 1 - tools/Makefile | 16 tools/hotplug/FreeBSD/rc.d/xencommons.in | 2 +- tools/hotplug/FreeBSD/vif-bridge | 2 +- tools/hotplug/Linux/init.d/xencommons.in | 6 +++--- tools/hotplug/Linux/xen-hotplug-common.sh.in | 2 +- tools/hotplug/NetBSD/block | 2 +- tools/hotplug/NetBSD/rc.d/xencommons.in | 2 +- tools/hotplug/NetBSD/vif-bridge | 2 +- tools/hotplug/NetBSD/vif-ip | 2 +- tools/misc/Makefile | 4 ++-- tools/pygrub/Makefile| 6 +++--- tools/xenstore/Makefile | 14 +++--- tools/xentrace/Makefile | 6 +++--- 15 files changed, 34 insertions(+), 35 deletions(-) diff --git a/Config.mk b/Config.mk index abb1e44..1312a80 100644 --- a/Config.mk +++ b/Config.mk @@ -157,7 +157,7 @@ define move-if-changed if ! 
cmp -s $(1) $(2); then mv -f $(1) $(2); else rm -f $(1); fi endef -BUILD_MAKE_VARS := sbindir BINDIR LIBEXEC LIBEXEC_BIN LIBDIR SHAREDIR \ +BUILD_MAKE_VARS := sbindir bindir LIBEXEC LIBEXEC_BIN LIBDIR SHAREDIR \ XENFIRMWAREDIR XEN_CONFIG_DIR XEN_SCRIPT_DIR XEN_LOCK_DIR \ XEN_RUN_DIR XEN_PAGING_DIR diff --git a/config/Paths.mk.in b/config/Paths.mk.in index 26674b1..1653f26 100644 --- a/config/Paths.mk.in +++ b/config/Paths.mk.in @@ -31,7 +31,6 @@ sysconfdir := @sysconfdir@ PREFIX := $(prefix) -BINDIR := $(bindir) LIBEXEC := $(libexecdir)/$(PACKAGE_TARNAME) LIBEXEC_BIN := @LIBEXEC_BIN@ LIBEXEC_LIB := $(LIBEXEC)/lib diff --git a/tools/Makefile b/tools/Makefile index 77625cb..b05d12a 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -74,14 +74,14 @@ uninstall: rm -f $(D)$(sbindir)/xendomains rm -f $(D)$(SYSCONFIG_DIR)/xencommons rm -rf $(D)/var/lib/xen* - rm -rf $(D)$(BINDIR)/cpuperf-perfcntr $(D)$(BINDIR)/cpuperf-xen - rm -rf $(D)$(BINDIR)/xc_shadow - rm -rf $(D)$(BINDIR)/pygrub - rm -rf $(D)$(BINDIR)/setsize $(D)$(BINDIR)/tbctl - rm -rf $(D)$(BINDIR)/xsls - rm -rf $(D)$(BINDIR)/xenstore* $(D)$(BINDIR)/xentrace* - rm -rf $(D)$(BINDIR)/xen-detect $(D)$(BINDIR)/xencons - rm -rf $(D)$(BINDIR)/xenpvnetboot $(D)$(BINDIR)/qemu-*-xen + rm -rf $(D)$(bindir)/cpuperf-perfcntr $(D)$(bindir)/cpuperf-xen + rm -rf $(D)$(bindir)/xc_shadow + rm -rf $(D)$(bindir)/pygrub + rm -rf $(D)$(bindir)/setsize $(D)$(bindir)/tbctl + rm -rf $(D)$(bindir)/xsls + rm -rf $(D)$(bindir)/xenstore* $(D)$(bindir)/xentrace* + rm -rf $(D)$(bindir)/xen-detect $(D)$(bindir)/xencons + rm -rf $(D)$(bindir)/xenpvnetboot $(D)$(bindir)/qemu-*-xen rm -rf $(D)$(INCLUDEDIR)/xenctrl* $(D)$(INCLUDEDIR)/xenguest.h rm -rf $(D)$(INCLUDEDIR)/xs_lib.h $(D)$(INCLUDEDIR)/xs.h rm -rf $(D)$(INCLUDEDIR)/xenstore-compat/xs_lib.h $(D)$(INCLUDEDIR)/xenstore-compat/xs.h diff --git a/tools/hotplug/FreeBSD/rc.d/xencommons.in b/tools/hotplug/FreeBSD/rc.d/xencommons.in index 3e204d6..0ef9af9 100644 --- 
a/tools/hotplug/FreeBSD/rc.d/xencommons.in +++ b/tools/hotplug/FreeBSD/rc.d/xencommons.in @@ -46,7 +46,7 @@ xen_startcmd() XENSTORED_ARGS=${XENSTORED_ARGS} -T /var/log/xen/xenstored-trace.log fi ${sbindir}/xenstored ${XENSTORED_ARGS} - while [ $time -lt $timeout ] ! `${BINDIR}/xenstore-read -s / /dev/null 21` ; do + while [ $time -lt $timeout ] ! `${bindir}/xenstore-read -s / /dev/null 21` ; do printf . time=$(($time+1)) sleep 1 diff --git a/tools/hotplug/FreeBSD/vif-bridge b/tools/hotplug/FreeBSD/vif-bridge index 3319710..428c653 100644 --- a/tools/hotplug/FreeBSD/vif-bridge +++ b/tools/hotplug/FreeBSD/vif-bridge @@ -13,7 +13,7 @@ DIR=$(dirname $0) . ${DIR}/hotplugpath.sh -PATH=${BINDIR}:${sbindir}:${LIBEXEC_BIN}:/bin:/usr/bin:/sbin:/usr/sbin +PATH=${bindir}:${sbindir}:${LIBEXEC_BIN}:/bin:/usr/bin:/sbin:/usr/sbin export PATH path=$1 diff --git a/tools/hotplug/Linux/init.d/xencommons.in b/tools/hotplug/Linux/init.d/xencommons.in index 82b3fcb..00d0f55 100644 --- a/tools/hotplug/Linux/init.d/xencommons.in +++ b/tools/hotplug/Linux/init.d/xencommons.in @@ -62,7 +62,7 @@ do_start () { mkdir -p ${XEN_RUN_DIR} mkdir -p
[Xen-devel] [PATCH 5/7] tools: replace private MANDIR with automake mandir
The result of this command: git grep -wnl MANDIR | xargs sed -i 's@MANDIR@mandir@g'

Signed-off-by: Olaf Hering <o...@aepfle.de>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabell...@eu.citrix.com>
Cc: Ian Campbell <ian.campb...@citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 config/Paths.mk.in | 5 ++---
 docs/Makefile      | 6 +++---
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/config/Paths.mk.in b/config/Paths.mk.in
index 32f7c78..8ca02bb 100644
--- a/config/Paths.mk.in
+++ b/config/Paths.mk.in
@@ -35,9 +35,8 @@ LIBEXEC_LIB := $(LIBEXEC)/lib
 LIBEXEC_INC := $(LIBEXEC)/include
 
 SHAREDIR := @SHAREDIR@
-MANDIR := $(mandir)
-MAN1DIR := $(MANDIR)/man1
-MAN8DIR := $(MANDIR)/man8
+MAN1DIR := $(mandir)/man1
+MAN8DIR := $(mandir)/man8
 LIBDIR := $(libdir)
 DOCDIR := $(docdir)
 
diff --git a/docs/Makefile b/docs/Makefile
index 9183252..94965c7 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -79,9 +79,9 @@ distclean: clean
 # Top level install targets
 .PHONY: install-man-pages
 install-man-pages: man-pages
-	$(INSTALL_DIR) $(DESTDIR)$(MANDIR)
-	cp -R man1 $(DESTDIR)$(MANDIR)
-	cp -R man5 $(DESTDIR)$(MANDIR)
+	$(INSTALL_DIR) $(DESTDIR)$(mandir)
+	cp -R man1 $(DESTDIR)$(mandir)
+	cp -R man5 $(DESTDIR)$(mandir)
 
 .PHONY: install-html
 install-html: html txt figs

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/2] x86/hvm: actually release ioreq server pages
On 23.04.15 at 17:46, paul.durr...@citrix.com wrote:

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -496,7 +496,7 @@ static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn)
 {
     unsigned int i = gmfn - d->arch.hvm_domain.ioreq_gmfn.base;
 
-    clear_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask);
+    set_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask);
 }

Double checking this I came to wonder whether HVM_PARAM_NR_IOREQ_SERVER_PAGES handling isn't broken when invoked a second time with a.value smaller than what was used the first time, or when any server pages are already in use.

Jan
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [v3 06/10] vmx: handle PML buffer full VMEXIT
We need to flush PML buffer when it's full.

Signed-off-by: Kai Huang <kai.hu...@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 6c4f78c..f11ac46 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3178,6 +3178,10 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         vmx_handle_apic_write();
         break;
 
+    case EXIT_REASON_PML_FULL:
+        vmx_vcpu_flush_pml_buffer(v);
+        break;
+
     case EXIT_REASON_ACCESS_GDTR_OR_IDTR:
     case EXIT_REASON_ACCESS_LDTR_OR_TR:
     case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
-- 
2.1.0

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [v3 10/10] p2m/ept: enable PML in p2m-ept for log-dirty
This patch firstly enables EPT A/D bits if PML is used, as PML depends on EPT A/D bits to work. The A bit is set for all present p2m types in middle and leaf EPT entries, and the D bit is set for all writable types in leaf EPT entries, except for the log-dirty type with PML. With PML, for 4K pages, instead of setting the EPT entry to read-only, we just need to clear the D bit in order to log that GFN. For superpages, we still need to set them to read-only, as we need to split a superpage into 4K pages on EPT violation.

Signed-off-by: Kai Huang <kai.hu...@linux.intel.com>
---
 xen/arch/x86/mm/p2m-ept.c          | 79 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  3 +-
 xen/include/asm-x86/hvm/vmx/vmx.h  |  3 +-
 3 files changed, 76 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 5e95a83..a1b9eaf 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -102,9 +102,20 @@ static int atomic_write_ept_entry(ept_entry_t *entryptr, ept_entry_t new,
     return rc;
 }
 
-static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_access_t access)
+static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
+                                  p2m_type_t type, p2m_access_t access)
 {
-    /* First apply type permissions */
+    /*
+     * First apply type permissions.
+     *
+     * A/D bits are also manually set to avoid overhead of MMU having to set
+     * them later. Both A/D bits are safe to be updated directly as they are
+     * ignored by processor if EPT A/D bits is not turned on.
+     *
+     * A bit is set for all present p2m types in middle and leaf EPT entries.
+     * D bit is set for all writable types in EPT leaf entry, except for
+     * log-dirty type with PML.
+     */
     switch(type)
     {
     case p2m_invalid:
@@ -118,27 +129,51 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
         break;
     case p2m_ram_rw:
         entry->r = entry->w = entry->x = 1;
+        entry->a = entry->d = 1;
         break;
     case p2m_mmio_direct:
         entry->r = entry->x = 1;
         entry->w = !rangeset_contains_singleton(mmio_ro_ranges, entry->mfn);
+        entry->a = 1;
+        entry->d = entry->w;
         break;
     case p2m_ram_logdirty:
+        entry->r = entry->x = 1;
+        /*
+         * In case of PML, we don't have to write protect 4K page, but
+         * only need to clear D-bit for it, but we still need to write
+         * protect super page in order to split it to 4K pages in EPT
+         * violation.
+         */
+        if ( vmx_domain_pml_enabled(p2m->domain) &&
+             !is_epte_superpage(entry) )
+            entry->w = 1;
+        else
+            entry->w = 0;
+        entry->a = 1;
+        /* For both PML or non-PML cases we clear D bit anyway */
+        entry->d = 0;
+        break;
     case p2m_ram_ro:
     case p2m_ram_shared:
         entry->r = entry->x = 1;
         entry->w = 0;
+        entry->a = 1;
+        entry->d = 0;
         break;
     case p2m_grant_map_rw:
     case p2m_map_foreign:
         entry->r = entry->w = 1;
         entry->x = 0;
+        entry->a = entry->d = 1;
         break;
     case p2m_grant_map_ro:
     case p2m_mmio_write_dm:
         entry->r = 1;
         entry->w = entry->x = 0;
+        entry->a = 1;
+        entry->d = 0;
         break;
     }
 
@@ -194,6 +229,8 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
     ept_entry->access = p2m->default_access;
 
     ept_entry->r = ept_entry->w = ept_entry->x = 1;
+    /* Manually set A bit to avoid overhead of MMU having to write it later. */
+    ept_entry->a = 1;
 
     return 1;
 }
@@ -244,10 +281,9 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
         epte->sp = (level > 1);
         epte->mfn += i * trunk;
         epte->snp = (iommu_enabled && iommu_snoop);
-        ASSERT(!epte->rsvd1);
         ASSERT(!epte->avail3);
 
-        ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
+        ept_p2m_type_to_flags(p2m, epte, epte->sa_p2mt, epte->access);
 
         if ( (level - 1) == target )
             continue;
@@ -489,7 +525,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                 {
                     e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
                                 ? p2m_ram_logdirty : p2m_ram_rw;
-                    ept_p2m_type_to_flags(&e, e.sa_p2mt, e.access);
+                    ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
                 }
                 e.recalc = 0;
                 wrc = atomic_write_ept_entry(&epte[i], e, level);
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: On 23/04/2015 15:27, Jan Beulich wrote: On 24.04.15 at 07:12, wei.w.w...@intel.com wrote: On 23/04/2015 22:09, Jan Beulich wrote: On 23.04.16 at 15:31, wei.w.w...@intel.com wrote:

The intel_pstate.c file under xen/arch/x86/acpi/cpufreq/ contains all the logic for selecting the current P-state. It follows its implementation in the kernel. Instead of using the traditional cpufreq governors, intel_pstate implements its internal governor in the setpolicy().

And this internal governor behaves how? Like ondemand, powersave, performance, or yet something else? And how would its behavior be changed?

In the kernel intel_pstate implementation, they have two internal governors: Powersave and Performance. Powersave is similar to the old (cpufreq) ondemand governor. A timer function is periodically invoked to sample the CPU busy info (e.g. it will get increased due to the running of a CPU-intensive workload). However, the final calculated target value is clamped into the [min_pct, max_pct] limit interval. The Performance governor is actually a special case of Powersave, when min_pct = max_pct = 100%. This is the same as the old performance governor.

So a true powersave one would then be accomplished by setting min_pct = max_pct = some value smaller than 100%. Is there a limit on the valid percentages to be specified here?

In the old driver, a powersave governor just sets the CPU to run with the lowest possible performance state. This one does not exist in the intel_pstate driver. The intel_pstate driver changes the terminology by using powersave to refer to the previous ondemand case. This does make people feel confused. But we may think of it this way: it only has two modes, the max performance mode and the ondemand mode. ondemand is the one that saves power (actually in a more reasonable way compared to the previous powersave, which simply sets the CPU to run with the lowest performance state). Anyway, we can surely change the name if it sounds uncomfortable.

I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible.

The valid pct value range is 0 to 100.

So what does 0% mean then? I.e. (wrt powersave) what does min_pct = max_pct = 0 result in? Also, you calling powersave what supposedly is ondemand makes me nervous about it not immediately raising the CPU freq when load increases, yet imo that's a fundamental requirement for server kind loads where you don't want to run in performance mode. Can you clarify the behavior here?

The timer fires every 10ms to update the CPU P-state according to the sampled workload info.

But that doesn't tell what the action is that the timer initiates. I.e. under what conditions it would effect a frequency change.

Jan
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] Xen on CentOS 7
Hello,

I'm a student in computer science; during my internship I have to explore multipathing solutions for several hypervisors. I'm looking into MPTCP (Multipath TCP), an extension of the TCP protocol that allows aggregating the bandwidth of several connections. I'd like to use MPTCP to get good bandwidth between the hypervisor and the NAS. That would be a nice software solution to avoid buying expensive 10 Gig or 100 Gig NICs.

In order to make some tests I'd like to install MPTCP on XenServer. There is an .rpm for CentOS 7, but XenServer is on CentOS 5 and some dependencies are missing. I'm not confident enough to build my own supplemental pack for XenServer 6.5.

I'd like to know if there is an easy way to add MPTCP to XenServer? Is there a CentOS 7 version that I can get?

Thanks for your help.

Ben

PS: If you feel like helping me a bit more, here is a link to my support topic where I explain the problem I have when I try to boot my recompiled kernel: http://discussions.citrix.com/topic/364033-change-dom0-kernel/#entry1870428

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] libxl: remove redundant assignment of spawn-xspath
Reported-by: Olaf Hering o...@aepfle.de Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- tools/libxl/libxl_dm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 30c1578..a1011e3 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1489,7 +1489,6 @@ retry_transaction: LIBXL__LOG(CTX, XTL_DEBUG, "%s", *arg); spawn->what = GCSPRINTF("domain %d device model", domid); -spawn->xspath = GCSPRINTF("/local/domain/0/device-model/%d/state", domid); spawn->xspath = libxl__device_model_xs_path(gc, LIBXL_TOOLSTACK_DOMID, domid, "/state"); spawn->timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000; -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] libxl, flexarray_append_pair usage
On Fri, Apr 24, Olaf Hering wrote: flexarray_append(back, "state"); -flexarray_append(back, GCSPRINTF("%d", 1)); +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); Should all such code be converted to flexarray_append_pair? To reduce line length a short macro should be added, like FLXAP or FLXPAIR or something else. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxl: convert strings and ints to xenbus_state
On Fri, Apr 24, 2015 at 09:07:14AM +, Olaf Hering wrote: Convert all plain ints and strings which are used for xenbus state files to xenbus_state. This makes it easier to find code which deals with backend/frontend state changes. Convert usage of libxl__sprintf to GCSPRINTF. No change in behaviour is expected by this change, beside a small increase of runtime memory usage in places that used a string constant. Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2] xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
Make sure that xen_swiotlb_init allocates buffers that are DMA capable when at least one memblock is available below 4G. Otherwise we assume that all devices on the SoC can cope with >4G addresses. We do this on ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case. No functional changes on x86. From: Chen Baozi baoz...@gmail.com Signed-off-by: Chen Baozi baoz...@gmail.com Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Tested-by: Chen Baozi baoz...@gmail.com diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h index 2f7e6ff..0b579b2 100644 --- a/arch/arm/include/asm/xen/page.h +++ b/arch/arm/include/asm/xen/page.h @@ -110,5 +110,6 @@ static inline bool set_phys_to_machine(unsigned long pfn, unsigned long mfn) bool xen_arch_need_swiotlb(struct device *dev, unsigned long pfn, unsigned long mfn); +unsigned long xen_get_swiotlb_free_pages(unsigned int order); #endif /* _ASM_ARM_XEN_PAGE_H */ diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c index 793551d..4983250 100644 --- a/arch/arm/xen/mm.c +++ b/arch/arm/xen/mm.c @@ -4,6 +4,7 @@ #include <linux/gfp.h> #include <linux/highmem.h> #include <linux/export.h> +#include <linux/memblock.h> #include <linux/of_address.h> #include <linux/slab.h> #include <linux/types.h> @@ -21,6 +22,20 @@ #include <asm/xen/hypercall.h> #include <asm/xen/interface.h> +unsigned long xen_get_swiotlb_free_pages(unsigned int order) +{ + struct memblock_region *reg; + gfp_t flags = __GFP_NOWARN; + + for_each_memblock(memory, reg) { + if (reg->base < (phys_addr_t)0xffffffff) { + flags |= __GFP_DMA; + break; + } + } + return __get_free_pages(flags, order); +} + enum dma_cache_op { DMA_UNMAP, DMA_MAP, diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 358dcd3..c44a5d5 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -269,4 +269,9 @@ static inline bool xen_arch_need_swiotlb(struct device *dev, return false; } +static inline unsigned long xen_get_swiotlb_free_pages(unsigned int order) +{ + return __get_free_pages(__GFP_NOWARN, order); +} + #endif /* _ASM_X86_XEN_PAGE_H */ diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 810ad41..4c54932 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -235,7 +235,7 @@ retry: #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT)) #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT) while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) { - xen_io_tlb_start = (void *)xen_get_swiotlb_free_pages(order); + xen_io_tlb_start = (void *)xen_get_swiotlb_free_pages(order); if (xen_io_tlb_start) break; order--; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3 01/10] vmx: add new boot parameter to control PML enabling
On 24/04/15 09:19, Kai Huang wrote: A top level EPT parameter ept=options and a sub boolean opt_pml_enabled are added to control PML. Other booleans can be further added for any other EPT related features. The document description for the new parameter is also added. Signed-off-by: Kai Huang kai.hu...@linux.intel.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- docs/misc/xen-command-line.markdown | 15 +++ xen/arch/x86/hvm/vmx/vmcs.c | 30 ++ 2 files changed, 45 insertions(+) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 1dda1f0..4889e27 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -685,6 +685,21 @@ requirement can be relaxed. This option is particularly useful for nested virtualization, to allow the L1 hypervisor to use EPT even if the L0 hypervisor does not provide VM\_ENTRY\_LOAD\_GUEST\_PAT. +### ept (Intel) + `= List of ( pml<boolean> )` + + Default: `false` +Controls EPT related features. Currently only Page Modification Logging (PML) is +the controllable feature as boolean type. + +PML is a new hardware feature in Intel's Broadwell Server and further platforms +which reduces hypervisor overhead of log-dirty mechanism by automatically +recording GPAs (guest physical addresses) when guest memory gets dirty, and +therefore significantly reducing number of EPT violation caused by write +protection of guest memory, which is a necessity to implement log-dirty +mechanism before PML. 
+ ### gdb `= <baud>[/<clock_hz>][,DPS[,<io-base>[,<irq>[,<port-bdf>[,<bridge-bdf>]]]] | pci | amt ]` diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 63007a9..79efa42 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -64,6 +64,36 @@ integer_param("ple_gap", ple_gap); static unsigned int __read_mostly ple_window = 4096; integer_param("ple_window", ple_window); +static bool_t __read_mostly opt_pml_enabled = 0; + +/* + * The 'ept' parameter controls functionalities that depend on, or impact the + * EPT mechanism. Optional comma separated value may contain: + * + * pml Enable PML + */ +static void __init parse_ept_param(char *s) +{ +char *ss; + +do { +bool_t val = !!strncmp(s, "no-", 3); +if ( !val ) +s += 3; + +ss = strchr(s, ','); +if ( ss ) +*ss = '\0'; + +if ( !strcmp(s, "pml") ) +opt_pml_enabled = val; + +s = ss + 1; +} while ( ss ); +} + +custom_param("ept", parse_ept_param); + /* Dynamic (run-time adjusted) execution control flags. */ u32 vmx_pin_based_exec_control __read_mostly; u32 vmx_cpu_based_exec_control __read_mostly; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 5/7] tools: replace private MANDIR with automake mandir
On 24/04/15 11:25, Olaf Hering wrote: The result of this command: git grep -wnl MANDIR | xargs sed -i 's@MANDIR@mandir@g' Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com In some copious free time during a docs day, we should see about moving the xenstat/xentrace manpages into docs/ (and writing them in something slightly nicer than raw troff), which would allow all $(mandir) information to disappear from Paths.mk ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V2] xen: arm: X-Gene Storm check GIC DIST address for EOI quirk
In old X-Gene Storm firmware and DT, secure mode addresses have been mentioned in GICv2 node. In this case maintenance interrupt is used instead of EOI HW method. This patch checks the GIC Distributor Base Address to enable EOI quirk for old firmware. Ref: http://lists.xen.org/archives/html/xen-devel/2014-07/msg01263.html ChangeLog: V2: - Fine tune interrupt controller node search as per comments on V1 patch - Incorporating other misc comments on V1. V1: - Initial patch. Tested-by: Christoffer Dall christoffer.d...@linaro.org Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org --- xen/arch/arm/platforms/xgene-storm.c | 42 +- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/platforms/xgene-storm.c b/xen/arch/arm/platforms/xgene-storm.c index 1812e5b..c9a6dfc 100644 --- a/xen/arch/arm/platforms/xgene-storm.c +++ b/xen/arch/arm/platforms/xgene-storm.c @@ -22,6 +22,7 @@ #include <asm/platform.h> #include <xen/stdbool.h> #include <xen/vmap.h> +#include <xen/device_tree.h> #include <asm/io.h> #include <asm/gic.h> @@ -35,9 +36,46 @@ static u64 reset_addr, reset_size; static u32 reset_mask; static bool reset_vals_valid = false; +#define XGENE_SEC_GICV2_DIST_ADDR 0x78010000 +static u32 __read_mostly xgene_quirks = PLATFORM_QUIRK_GIC_64K_STRIDE; + +static void __init xgene_check_pirq_eoi(void) +{ +struct dt_device_node *node; +int res; +paddr_t dbase; +static const struct dt_device_match xgene_dt_int_ctrl_match[] = +{ +DT_MATCH_COMPATIBLE("arm,cortex-a15-gic"), +{ /*sentinel*/ }, +}; + +node = dt_find_interrupt_controller(xgene_dt_int_ctrl_match); +if ( !node ) +panic("%s: Can not find interrupt controller node\n", __func__); + +res = dt_device_get_address(node, 0, &dbase, NULL); +if ( !dbase ) +panic("%s: Cannot find a valid address for the " +"distributor", __func__); + +/* + * In old X-Gene Storm firmware and DT, secure mode addresses have + * been mentioned in GICv2 node. We have to use maintenance interrupt + * instead of EOI HW in this case. We check the GIC Distributor Base + * Address to maintain compatibility with older firmware. + */ +if ( dbase == XGENE_SEC_GICV2_DIST_ADDR ) +{ +xgene_quirks |= PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI; +printk("Xen: warning: Using OLD X-Gene Firmware, " +"disabling PIRQ EOI mode ...\n"); +} +} + static uint32_t xgene_storm_quirks(void) { -return PLATFORM_QUIRK_GIC_64K_STRIDE|PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI; +return xgene_quirks; } static int map_one_mmio(struct domain *d, const char *what, @@ -216,6 +254,8 @@ static int xgene_storm_init(void) reset_mask = XGENE_RESET_MASK; reset_vals_valid = true; +xgene_check_pirq_eoi(); + return 0; } -- 1.7.9.5 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 23/04/2015 15:27, Jan Beulich wrote: On 24.04.15 at 07:12, wei.w.w...@intel.com wrote: On 23/04/2015 22:09, Jan Beulich wrote: On 23.04.15 at 15:31, wei.w.w...@intel.com wrote: The intel_pstate.c file under xen/arch/x86/acpi/cpufreq/ contains all the logic for selecting the current P-state. It follows its implementation in the kernel. Instead of using the traditional cpufreq governors, intel_pstate implements its internal governor in setpolicy(). And this internal governor behaves how? Like ondemand, powersave, performance, or yet something else? And how would its behavior be changed? In the kernel intel_pstate implementation, they have two internal governors: Powersave and Performance. Powersave is similar to the old (cpufreq) ondemand governor. A timer function is periodically invoked to sample the CPU busy info (e.g. it will get increased due to the running of a CPU-intensive workload). However, the final calculated target value is clamped into the [min_pct, max_pct] limit interval. The Performance governor is actually a special case of Powersave, when min_pct = max_pct = 100%. This is the same as the old performance governor. So a true powersave one would then be accomplished by setting min_pct = max_pct = some value smaller than 100%. Is there a limit on the valid percentages to be specified here? In the old driver, a powersave governor just sets the CPU to run with the lowest possible performance state. This one does not exist in the intel_pstate driver. The intel_pstate driver changes the terminology by using powersave to refer to the previous ondemand case. This does make people feel confused. But we may think of it this way: it only has two modes, the max performance mode and the ondemand mode. ondemand is the one that saves power (actually in a more reasonable way compared to the previous powersave, which simply sets the CPU to run with the lowest performance state). 
The valid pct value range is 0 to 100. Also, your calling powersave what supposedly is ondemand makes me nervous about it not immediately raising the CPU freq when load increases, yet imo that's a fundamental requirement for server-kind loads where you don't want to run in performance mode. Can you clarify the behavior here? The timer fires every 10ms to update the CPU P-state according to the sampled workload info. Best, Wei Here in the ported version, the limit interval can be set via the newly added interfaces in xenpm. I think we can make use of only the Powersave governor, and the Performance governor can actually be achieved simply by setting min_pct = max_pct = 100%. If you all agree, I will remove the Performance governor related code in the next version of the patchset. Yes, this certainly seems to make sense with the above. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
On 23/04/15 17:11, Jan Beulich wrote: On 22.04.15 at 18:00, david.vra...@citrix.com wrote: static inline int get_maptrack_handle( struct grant_table *lgt) { +struct vcpu *v = current; int i; grant_handle_t handle; struct grant_mapping *new_mt; unsigned int new_mt_limit, nr_frames; -spin_lock(&lgt->maptrack_lock); - -while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) ) -{ -nr_frames = nr_maptrack_frames(lgt); -if ( nr_frames >= max_maptrack_frames ) -break; - -new_mt = alloc_xenheap_page(); -if ( !new_mt ) -break; +handle = __get_maptrack_handle(lgt, v); +if ( likely(handle != -1) ) +return handle; -clear_page(new_mt); +nr_frames = nr_vcpu_maptrack_frames(v); +if ( nr_frames >= max_maptrack_frames ) +return -1; -new_mt_limit = lgt->maptrack_limit + MAPTRACK_PER_PAGE; +new_mt = alloc_xenheap_page(); +if ( !new_mt ) +return -1; -for ( i = 1; i < MAPTRACK_PER_PAGE; i++ ) -new_mt[i - 1].ref = lgt->maptrack_limit + i; -new_mt[i - 1].ref = lgt->maptrack_head; -lgt->maptrack_head = lgt->maptrack_limit; +clear_page(new_mt); -lgt->maptrack[nr_frames] = new_mt; -smp_wmb(); -lgt->maptrack_limit = new_mt_limit; +new_mt_limit = v->maptrack_limit + MAPTRACK_PER_PAGE; -gdprintk(XENLOG_INFO, "Increased maptrack size to %u frames\n", - nr_frames + 1); +for ( i = 1; i < MAPTRACK_PER_PAGE; i++ ){ +new_mt[i - 1].ref = (lgt->maptrack_pages * MAPTRACK_PER_PAGE) + i; +new_mt[i - 1].vcpu = v->vcpu_id; +} +/* Set last entry vcpu and ref */ +new_mt[i - 1].ref = v->maptrack_head; +new_mt[i - 1].vcpu = v->vcpu_id; +v->maptrack_head = lgt->maptrack_pages * MAPTRACK_PER_PAGE; +if (v->maptrack_tail == MAPTRACK_TAIL) +{ +v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE) ++ MAPTRACK_PER_PAGE - 1; +new_mt[i - 1].ref = MAPTRACK_TAIL; } +spin_lock(&lgt->maptrack_lock); +lgt->maptrack[lgt->maptrack_pages++] = new_mt; spin_unlock(&lgt->maptrack_lock); The uses of ->maptrack_pages ahead of taking the lock can race with updates inside the lock. 
And with locking elsewhere dropped by the previous patch it looks like you can't update ->maptrack[] like you do (you'd need a barrier between the pointer store and the increment, and with that I think the lock would become superfluous if it was needed only for this update). Also note the coding style issues in the changes above (and also in code further down). This was a last minute optimisation; this isn't on the hot path, so we'll expand the spin_lock to cover all users of maptrack_pages. Sorry about the coding style problems. -return handle; +v->maptrack_limit = new_mt_limit; + +return __get_maptrack_handle(lgt, v); With the lock dropped, nothing guarantees this to succeed, which it ought to unless the table size reached its allowed maximum. @@ -1422,6 +1448,17 @@ gnttab_setup_table( } gt->maptrack_pages = 0; +/* Tracking of mapped foreign frames table */ +if ( (gt->maptrack = xzalloc_array(struct grant_mapping *, + max_maptrack_frames * d->max_vcpus)) == NULL ) +goto out2; See the comments on the similar misplaced hunk in the previous patch. --- a/xen/include/xen/grant_table.h +++ b/xen/include/xen/grant_table.h @@ -60,6 +60,8 @@ struct grant_mapping { u32 ref; /* grant ref */ u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */ domid_t domid; /* granting domain */ +u32 vcpu; /* vcpu which created the grant mapping */ +u16 pad[2]; }; What is this pad[] good for? The pad is to keep the struct power-of-2 sized, because this allows the compiler to optimise these macros to right and left shifts: #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) #define maptrack_entry(t, e) \ ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) Malcolm Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxlu: don't crash on empty lists
On Fri, Apr 24, 2015 at 10:04:31AM +0100, Jan Beulich wrote: Prior to 1a09c5113a (libxlu: rework internal representation of setting) empty lists in config files did get accepted. Restore that behavior. Signed-off-by: Jan Beulich jbeul...@suse.com Thanks for the fix. Acked-by: Wei Liu wei.l...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] libxl, flexarray_append_pair usage
On Fri, Apr 24, 2015 at 11:16:00AM +0200, Olaf Hering wrote: On Fri, Apr 24, Olaf Hering wrote: flexarray_append(back, "state"); -flexarray_append(back, GCSPRINTF("%d", 1)); +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); Should all such code be converted to flexarray_append_pair? To reduce line length a short macro should be added, like FLXAP or FLXPAIR or something else. I don't have a very strong opinion on this. Wei. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: On 23/04/2015 15:27, Jan Beulich wrote: On 24.04.15 at 07:12, wei.w.w...@intel.com wrote: On 23/04/2015 22:09, Jan Beulich wrote: On 23.04.15 at 15:31, wei.w.w...@intel.com wrote: The intel_pstate.c file under xen/arch/x86/acpi/cpufreq/ contains all the logic for selecting the current P-state. It follows its implementation in the kernel. Instead of using the traditional cpufreq governors, intel_pstate implements its internal governor in setpolicy(). And this internal governor behaves how? Like ondemand, powersave, performance, or yet something else? And how would its behavior be changed? In the kernel intel_pstate implementation, they have two internal governors: Powersave and Performance. Powersave is similar to the old (cpufreq) ondemand governor. A timer function is periodically invoked to sample the CPU busy info (e.g. it will get increased due to the running of a CPU-intensive workload). However, the final calculated target value is clamped into the [min_pct, max_pct] limit interval. The Performance governor is actually a special case of Powersave, when min_pct = max_pct = 100%. This is the same as the old performance governor. So a true powersave one would then be accomplished by setting min_pct = max_pct = some value smaller than 100%. Is there a limit on the valid percentages to be specified here? In the old driver, a powersave governor just sets the CPU to run with the lowest possible performance state. This one does not exist in the intel_pstate driver. The intel_pstate driver changes the terminology by using powersave to refer to the previous ondemand case. This does make people feel confused. But we may think of it this way: it only has two modes, the max performance mode and the ondemand mode. ondemand is the one that saves power (actually in a more reasonable way compared to the previous powersave, which simply sets the CPU to run with the lowest performance state). 
Anyway, we can surely change the name if it sounds uncomfortable. I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name internal for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal The valid pct value range is 0 to 100. So what does 0% mean then? I.e. (wrt powersave) what does min_pct = max_pct = 0 result in? Probably I misunderstood that question. The CPU actually has its own valid frequency range. On my machine, for example, the minimum is 1.2GHz (got from an MSR), which corresponds to 32%. Then the valid range is 32 to 100. Any input pct value less than 32 will be set to 32. Also, your calling powersave what supposedly is ondemand makes me nervous about it not immediately raising the CPU freq when load increases, yet imo that's a fundamental requirement for server-kind loads where you don't want to run in performance mode. Can you clarify the behavior here? The timer fires every 10ms to update the CPU P-state according to the sampled workload info. But that doesn't tell what the action is that the timer initiates. I.e. under what conditions it would effect a frequency change. Each time the timer function is invoked, it gets the CPU utilization statistics using the APERF and MPERF MSR registers. The current CPU utilization is compared to the previous one to generate a scale factor, which scales the frequency up or down. Best, Wei Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24.04.15 at 12:07, wei.w.w...@intel.com wrote: On 24/04/2015 17:56, Jan Beulich wrote: On 24.04.15 at 11:46, wei.w.w...@intel.com wrote: On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: In the old driver, a powersave governor just sets the CPU to run with the lowest possible performance state. This one does not exist in the intel_pstate driver. The intel_pstate driver changes the terminology by using powersave to refer to the previous ondemand case. This does make people feel confused. But we may think it this way: it only has two modes, the max performance mode and the ondemand mode. ondemand is the one who saves power (actually in a more reasonable way compared to the previous powersave which simply sets the CPU to run with the lowest performance state). Anyway, we can surely change the name if it sounds uncomfortable. I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name internal for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal But xenpm's set-scaling-governor command should still do something useful for the currently available governor options. And with that, showing internal in its output may also end up being confusing (at least I am already being confused). The case is similar to that in the kernel. Xen has two pstate drivers, but only one can be loaded. When the intel_pstate driver is used (scaling_driver : intel_pstate), xenpm's set-scaling-governor will not take effect, since intel_pstate only implements its internal governor. But that's precisely what I'm asking to be changed: even if internally it uses its own governor, the user interface of the tool should still be usable to achieve similar effects. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [OSSTEST Nested PATCH v8 5/7] Add new script to customize nested test configuration
-----Original Message----- From: Ian Campbell [mailto:ian.campb...@citrix.com] Sent: Thursday, April 23, 2015 7:30 PM To: Hu, Robert Cc: Pang, LongtaoX; xen-devel@lists.xen.org; ian.jack...@eu.citrix.com; wei.l...@citrix.com Subject: Re: [OSSTEST Nested PATCH v8 5/7] Add new script to customize nested test configuration On Thu, 2015-04-23 at 17:38 +0800, Robert Hu wrote: As mentioned in [ Patch], 'nested_l1' is the ident for the L1 guest VM, 'nestedl1' is the hostname for the L1 guest VM. 'store_runvar('nested_l1', $l1->{Guest});' is used to store L1's ident and L1's hostname to the database, which will be used by the selecthost() function in a later script. Having a runvar with the bare ident as a name doesn't seem right. Perhaps you want to be writing to $r{${ident}_hostname} or some such? What do you actually need the hostname for anyway? Look at selecthost() sub selecthost ($) { my ($ident) = @_; # must be run outside transaction my $name; if ($ident =~ m/=/) { $ident= $`; $name= $'; #' $r{$ident}= $name; } else { $name= $r{$ident}; die "no specified $ident" unless defined $name; } ... When in the L1 iteration invocation of ts-debian-hvm-install(), the ident passed in is the L1 ident (nested_l1). Here the 'else' branch will need $r{$ident}, whose value shall be L1's name. Therefore we prepare this runvar in advance. Ah I see, today usually the ident is host or dst_host or src_host so I got confused by it just being nested_l1 here. In the normal case today the ident runvars are set by ts-hosts-allocate. That can't apply to the nested case since it is allocating a guest and not a host, so your ts-nested-setup seems like a reasonable enough place to do it. However, I don't think there is any reason for the ident, hostname and guest name to differ, and I think perhaps it would be clearer to write this as: our $l1_ident = $gho->{Guest}; store_runvar($l1_ident, $gho->{Guest}); So that it is clear to the reader that this runvar is an ident. I suppose a code comment would work as well (or in addition perhaps). 
Most places you seem to use nestedl1, not nested_l1. I think you ended up with this due to not using guest_var and the various hardcoding you've done elsewhere. I suspect with all that fixed and make-flight updated accordingly you won't need this var any more. I think I should define the L1 ident with my $l1_ident = 'nested_l1' in 'ts-nested-setup' Hardcoding anything like that in ts-nested-setup seems wrong to me. The ident should come from the script's parameters (i.e. be part of the sg-run-job invocation of the script). Imagine a hypothetical nested-virt migration test, in that scenario you would want to run ts-nested-setup on two hosts, both nestedl1src and nestedl2dst in order to configure things. That's why we don't hardcode such things. So to summarize, ts-nested-setup will be invoked with parameters of $l0_ident, $l1_ident and $l1_name? Or only parameters of $l0_ident and $l1_ident, if we have set up the $l1_ident -> $l1_name mapping in runvar when in l0's iteration of ts-debian-hvm-install. Which would you prefer? $l1_ident and $l1_name should be the same, I think. Semantically the script should be taking $l0_ident and $l1_ident, where $l0 here is a host and $l1 here is a guest, so in fact isn't $l1_name just $l1->{Guest}? (Things get confusing because $l1 can be a Guest or a Host depending on which test script we are in) +store_runvar("$l1->{Guest}_ip", $l1->{Ip}); + +my $l2_disk_mb = 2; +my $l2_lv_name = 'nestedl2-disk'; +my $vgname = $l0->{Name}; +$vgname .= ".$c{TestHostDomain}" if ($l0->{Suite} =~ m/lenny/); +target_cmd_root($l0, <<END); +lvremove -f /dev/${vgname}/${l2_lv_name} ||: +lvcreate -L ${l2_disk_mb}M -n $l2_lv_name $vgname +dd if=/dev/zero of=/dev/${vgname}/${l2_lv_name} count=10 I think you can do all of this by passing a suitable l2 $gho to prepareguest_part_lvmdisk, can't you? I think you can get that by my $l2 = selectguest($ARGV[], $l1). I think I could try it, that means I will add more parameters for the 'ts-nested-setup' script, not just 'host' and 'nestedl1'. 
I think we can pass in 3 parameters: $l0_ident, $l1_ident, and $l2_ident, as long as we set up their $ident -> $name mappings in runvar in advance. Here we have 2 options: 1. set up such mapping in the first iteration (l0 iteration) of ts-debian-hvm-install 2. set up this in make flight I prefer 2, since such a mapping can be fixed from the beginning and shall not be changed at run time. The issue we are coming across here is that we are trying to talk about the l2 guest before it really exists, since that happens later when prepareguest is called in the l1 context and here we are
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24.04.15 at 11:46, wei.w.w...@intel.com wrote: On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: In the old driver, a powersave governor just sets the CPU to run with the lowest possible performance state. This one does not exist in the intel_pstate driver. The intel_pstate driver changes the terminology by using powersave to refer to the previous ondemand case. This does make people feel confused. But we may think it this way: it only has two modes, the max performance mode and the ondemand mode. ondemand is the one who saves power (actually in a more reasonable way compared to the previous powersave which simply sets the CPU to run with the lowest performance state). Anyway, we can surely change the name if it sounds uncomfortable. I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name internal for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal But xenpm's set-scaling-governor command should still do something useful for the currently available governor options. And with that, showing internal in its output may also end up being confusing (at least I am already being confused). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH OSSTEST] standalone: noreinstall -> reinstall in help string
Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- standalone | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standalone b/standalone index d8113e4..17fa40c 100755 --- a/standalone +++ b/standalone @@ -41,7 +41,7 @@ Options: -f FLIGHT, --flight=FLIGHT Operate on FLIGHT -h HOST, --host=HOST Test host -r, --reuse Do not wipe test host (default) --R, --noreuse, --noreinstall Wipe the test host (if job or test does so) +-R, --noreuse, --reinstall Wipe the test host (if job or test does so) -s, --simulate, --dry-run Dry run, printing what would be run --lvextendmax=GB Limit the size of the build logical volume --baseline Create a baseline flight instead of a tip flight -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24/04/2015 17:56, Jan Beulich wrote: On 24.04.15 at 11:46, wei.w.w...@intel.com wrote: On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: In the old driver, a "powersave" governor just sets the CPU to run with the lowest possible performance state. This one does not exist in the intel_pstate driver. The intel_pstate driver changes the terminology by using "powersave" to refer to the previous "ondemand" case. This can be confusing. But we can think of it this way: it only has two modes, the max performance mode and the ondemand mode. "ondemand" is the one that saves power (actually in a more reasonable way than the previous "powersave", which simply set the CPU to run with the lowest performance state). Anyway, we can surely change the name if it causes confusion. I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name "internal" for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal But xenpm's set-scaling-governor command should still do something useful for the currently available governor options. And with that, showing "internal" in its output may also end up being confusing (at least I am already confused). The case is similar to that in the kernel. Xen has two pstate drivers, but only one can be loaded. When the intel_pstate driver is used (scaling_driver : intel_pstate), xenpm's set-scaling-governor will not take effect, since intel_pstate only implements its internal governor. Jan
Re: [Xen-devel] [PATCH v3 3/5] libxl: use LIBXL_TOOLSTACK_DOMID
On Fri, Apr 24, 2015 at 10:49:36AM +0200, Olaf Hering wrote: On Fri, Mar 20, Wei Liu wrote: +++ b/tools/libxl/libxl_dm.c @@ -1438,6 +1438,8 @@ retry_transaction: spawn->what = GCSPRINTF("domain %d device model", domid); spawn->xspath = GCSPRINTF("/local/domain/0/device-model/%d/state", domid); +spawn->xspath = libxl__device_model_xs_path(gc, LIBXL_TOOLSTACK_DOMID, +domid, "/state"); This assigns xspath twice. Good spot. I will send a patch to fix this. Luckily this doesn't cause a practical problem except for wasting some memory. :-) Wei. Olaf
Re: [Xen-devel] [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
On 24.04.15 at 11:09, malcolm.cross...@citrix.com wrote: On 23/04/15 17:11, Jan Beulich wrote: On 22.04.15 at 18:00, david.vra...@citrix.com wrote: --- a/xen/include/xen/grant_table.h +++ b/xen/include/xen/grant_table.h @@ -60,6 +60,8 @@ struct grant_mapping { u32 ref; /* grant ref */ u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */ domid_t domid; /* granting domain */ +u32 vcpu; /* vcpu which created the grant mapping */ +u16 pad[2]; }; What is this pad[] good for? The pad is to keep the struct power-of-2 sized, because this allows the compiler to optimise these macros to right and left shifts: #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) #define maptrack_entry(t, e) \ ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) Okay, then why u16[2] instead of u32? And please add a brief comment explaining the reason. Apart from that I wonder whether fitting vcpu in the 10 unused flags bits (not 11, as the comment on the field suggests) would be an option. That would require limiting vCPU count to 4k, which I don't think would really be a problem for anyone. Jan
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24/04/2015 18:20, Jan Beulich wrote: On 24.04.15 at 12:07, wei.w.w...@intel.com wrote: On 24/04/2015 17:56, Jan Beulich wrote: On 24.04.15 at 11:46, wei.w.w...@intel.com wrote: On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name "internal" for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal But xenpm's set-scaling-governor command should still do something useful for the currently available governor options. And with that, showing "internal" in its output may also end up being confusing (at least I am already confused). The case is similar to that in the kernel. Xen has two pstate drivers, but only one can be loaded. When the intel_pstate driver is used (scaling_driver : intel_pstate), xenpm's set-scaling-governor will not take effect, since intel_pstate only implements its internal governor. But that's precisely what I'm asking to be changed: Even if internally it uses its own governor, the user interface of the tool should still be usable to achieve similar effects. Yes, we should change that if we retain two governors in the intel_pstate driver. But as we discussed before, the internal Performance governor seems redundant and can be removed. In that case, there is no other governor option that can be selected by the user via set-scaling-governor. Best, Wei
[Xen-devel] [PATCH] libxl: convert strings and ints to xenbus_state
Convert all plain ints and strings which are used for xenbus state files to xenbus_state. This makes it easier to find code which deals with backend/frontend state changes. Convert usage of libxl__sprintf to GCSPRINTF. No change in behaviour is expected by this change, aside from a small increase of runtime memory usage in places that used a string constant. Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com --- tools/libxl/libxl.c | 26 +- tools/libxl/libxl_device.c | 4 ++-- tools/libxl/libxl_pci.c | 18 +- 3 files changed, 24 insertions(+), 24 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index feb3aa9..516713e 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -2080,7 +2080,7 @@ void libxl__device_vtpm_add(libxl__egc *egc, uint32_t domid, flexarray_append(back, "online"); flexarray_append(back, "1"); flexarray_append(back, "state"); -flexarray_append(back, GCSPRINTF("%d", 1)); +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(back, "handle"); flexarray_append(back, GCSPRINTF("%d", vtpm->devid)); @@ -2092,7 +2092,7 @@ void libxl__device_vtpm_add(libxl__egc *egc, uint32_t domid, flexarray_append(front, "backend-id"); flexarray_append(front, GCSPRINTF("%d", vtpm->backend_domid)); flexarray_append(front, "state"); -flexarray_append(front, GCSPRINTF("%d", 1)); +flexarray_append(front, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(front, "handle"); flexarray_append(front, GCSPRINTF("%d", vtpm->devid)); @@ -2524,7 +2524,7 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid, flexarray_append(back, "bootable"); flexarray_append(back, libxl__sprintf(gc, "%d", 1)); flexarray_append(back, "state"); -flexarray_append(back, libxl__sprintf(gc, "%d", 1)); +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(back, "dev"); flexarray_append(back,
disk->vdev); flexarray_append(back, "type"); @@ -2544,7 +2544,7 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid, flexarray_append(front, "backend-id"); flexarray_append(front, libxl__sprintf(gc, "%d", disk->backend_domid)); flexarray_append(front, "state"); -flexarray_append(front, libxl__sprintf(gc, "%d", 1)); +flexarray_append(front, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(front, "virtual-device"); flexarray_append(front, libxl__sprintf(gc, "%d", device->devid)); flexarray_append(front, "device-type"); @@ -3148,7 +3148,7 @@ static void local_device_attach_cb(libxl__egc *egc, libxl__ao_device *aodev) if (rc < 0) goto out; be_path = libxl__device_backend_path(gc, device); -rc = libxl__wait_for_backend(gc, be_path, "4"); +rc = libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected)); if (rc < 0) goto out; @@ -3358,7 +3358,7 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t domid, flexarray_append(back, "online"); flexarray_append(back, "1"); flexarray_append(back, "state"); -flexarray_append(back, libxl__sprintf(gc, "%d", 1)); +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); if (nic->script) flexarray_append_pair(back, "script", libxl__abs_path(gc, nic->script, @@ -3399,7 +3399,7 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t domid, flexarray_append(front, "backend-id"); flexarray_append(front, libxl__sprintf(gc, "%d", nic->backend_domid)); flexarray_append(front, "state"); -flexarray_append(front, libxl__sprintf(gc, "%d", 1)); +flexarray_append(front, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(front, "handle"); flexarray_append(front, libxl__sprintf(gc, "%d", nic->devid)); flexarray_append(front, "mac"); @@ -3685,7 +3685,7 @@ int libxl__device_console_add(libxl__gc *gc, uint32_t domid, flexarray_append(back, "online"); flexarray_append(back, "1"); flexarray_append(back, "state"); -flexarray_append(back, libxl__sprintf(gc, "%d", 1)); +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(back, "protocol");
flexarray_append(back, LIBXL_XENCONSOLE_PROTOCOL); @@ -3724,7 +3724,7 @@ int libxl__device_console_add(libxl__gc *gc, uint32_t domid, flexarray_append(ro_front, libxl__sprintf(gc, "%lu", state->console_mfn)); } else { flexarray_append(front, "state"); -flexarray_append(front, libxl__sprintf(gc, "%d", 1)); +flexarray_append(front, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(front, "protocol"); flexarray_append(front, LIBXL_XENCONSOLE_PROTOCOL); } @@ -4021,12
Re: [Xen-devel] [PATCH 1/2] x86/hvm: actually release ioreq server pages
-----Original Message----- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 24 April 2015 09:01 To: Paul Durrant Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: Re: [PATCH 1/2] x86/hvm: actually release ioreq server pages On 23.04.15 at 17:46, paul.durr...@citrix.com wrote: --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -496,7 +496,7 @@ static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn) { unsigned int i = gmfn - d->arch.hvm_domain.ioreq_gmfn.base; -clear_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask); +set_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask); } Double checking this I came to wonder whether HVM_PARAM_NR_IOREQ_SERVER_PAGES handling isn't broken when invoked a second time with a.value smaller than what was used the first time, or when any server pages are already in use. Yes, the param is only intended to be a set-once param, the same as HVM_PARAM_IOREQ_SERVER_PFN. If the latter were changed after ioreq servers had been created then it could also lead to strangeness. I'll put together a patch to add checks for both. Paul Jan
[Xen-devel] [rumpuserxen test] 52054: regressions - FAIL
flight 52054 rumpuserxen real [real] http://logs.test-lab.xenproject.org/osstest/logs/52054/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-rumpuserxen 5 rumpuserxen-build fail REGR. vs. 33866 build-i386-rumpuserxen 5 rumpuserxen-build fail REGR. vs. 33866 Tests which did not succeed, but are not blocking: test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a version targeted for testing: rumpuserxen 3b91e44996ea6ae1276bce1cc44f38701c53ee6f baseline version: rumpuserxen 30d72f3fc5e35cd53afd82c8179cc0e0b11146ad People who touched revisions under test: Antti Kantee po...@iki.fi Ian Jackson ian.jack...@eu.citrix.com Martin Lucina mar...@lucina.net Wei Liu l...@liuw.name jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-pvops pass build-i386-pvops pass build-amd64-rumpuserxen fail build-i386-rumpuserxen fail test-amd64-amd64-rumpuserxen-amd64 blocked test-amd64-i386-rumpuserxen-i386 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/osstest/pub/logs images: /home/osstest/pub/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 535 lines long.)
Re: [Xen-devel] [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
On 24/04/15 10:50, Jan Beulich wrote: On 24.04.15 at 11:09, malcolm.cross...@citrix.com wrote: On 23/04/15 17:11, Jan Beulich wrote: On 22.04.15 at 18:00, david.vra...@citrix.com wrote: --- a/xen/include/xen/grant_table.h +++ b/xen/include/xen/grant_table.h @@ -60,6 +60,8 @@ struct grant_mapping { u32 ref; /* grant ref */ u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */ domid_t domid; /* granting domain */ +u32 vcpu; /* vcpu which created the grant mapping */ +u16 pad[2]; }; What is this pad[] good for? The pad is to keep the struct power-of-2 sized, because this allows the compiler to optimise these macros to right and left shifts: #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) #define maptrack_entry(t, e) \ ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) Okay, then why u16[2] instead of u32? And please add a brief comment explaining the reason. Apart from that I wonder whether fitting vcpu in the 10 unused flags bits (not 11, as the comment on the field suggests) would be an option. That would require limiting vCPU count to 4k, which I don't think would really be a problem for anyone. I'd like to not have the overhead of bit operations on a hot path. We're going from 512 grant entries per page to 256 grant entries per page. This is taking the Xen memory overhead from 0.2% to 0.4% for grant mapped pages. Is the extra 0.2% memory overhead that concerning? Malcolm Jan
Re: [Xen-devel] [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
On 24.04.15 at 12:21, malcolm.cross...@citrix.com wrote: On 24/04/15 10:50, Jan Beulich wrote: On 24.04.15 at 11:09, malcolm.cross...@citrix.com wrote: On 23/04/15 17:11, Jan Beulich wrote: On 22.04.15 at 18:00, david.vra...@citrix.com wrote: --- a/xen/include/xen/grant_table.h +++ b/xen/include/xen/grant_table.h @@ -60,6 +60,8 @@ struct grant_mapping { u32 ref; /* grant ref */ u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */ domid_t domid; /* granting domain */ +u32 vcpu; /* vcpu which created the grant mapping */ +u16 pad[2]; }; What is this pad[] good for? The pad is to keep the struct power-of-2 sized, because this allows the compiler to optimise these macros to right and left shifts: #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) #define maptrack_entry(t, e) \ ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) Okay, then why u16[2] instead of u32? And please add a brief comment explaining the reason. Apart from that I wonder whether fitting vcpu in the 10 unused flags bits (not 11, as the comment on the field suggests) would be an option. That would require limiting vCPU count to 4k, which I don't think would really be a problem for anyone. I'd like to not have the overhead of bit operations on a hot path. We're going from 512 grant entries per page to 256 grant entries per page. This is taking the Xen memory overhead from 0.2% to 0.4% for grant mapped pages. Is the extra 0.2% memory overhead that concerning? Perhaps not so much the memory overhead but the doubled cache bandwidth needed. But anyway, this was only a question, and with both you and Andrew being opposed let's just stay with what you submitted. Jan
Re: [Xen-devel] [PATCH] AMD IOMMU: only translate remapped IO-APIC RTEs
On 24.04.15 at 16:05, li...@eikelenboom.it wrote: I see you committed "AMD IOMMU: only translate remapped IO-APIC RTEs" to staging; any reason why your patch in http://lists.xen.org/archives/html/xen-devel/2015-04/msg02253.html isn't committed yet? (still waiting for a formal ack from someone? as Suravee also seems to have tested it) This wasn't even submitted as a patch for inclusion yet (hence it necessarily lacks required acks). And that is because, irrespective of it being confirmed to work in some cases, I'm still concerned about Tim's warning regarding there being possible races left. Jan
Re: [Xen-devel] [PATCH] AMD IOMMU: only translate remapped IO-APIC RTEs
Friday, April 24, 2015, 4:12:42 PM, you wrote: On 24.04.15 at 16:05, li...@eikelenboom.it wrote: I see you committed "AMD IOMMU: only translate remapped IO-APIC RTEs" to staging; any reason why your patch in http://lists.xen.org/archives/html/xen-devel/2015-04/msg02253.html isn't committed yet? (still waiting for a formal ack from someone? as Suravee also seems to have tested it) This wasn't even submitted as a patch for inclusion yet (hence it necessarily lacks required acks). And that is because, irrespective of it being confirmed to work in some cases, I'm still concerned about Tim's warning regarding there being possible races left. Jan Ah OK, that's clear, I wasn't aware you had some concerns left. Thanks for the explanation! -- Sander
Re: [Xen-devel] [PATCH v2] xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
On Fri, Apr 24, 2015 at 10:16:40AM +0100, Stefano Stabellini wrote: Make sure that xen_swiotlb_init allocates buffers that are DMA capable when at least one memblock is available below 4G. Otherwise we assume that all devices on the SoC can cope with >4G addresses. We do this on ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case. No functional changes on x86. From: Chen Baozi baoz...@gmail.com Signed-off-by: Chen Baozi baoz...@gmail.com Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Tested-by: Chen Baozi baoz...@gmail.com Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h index 2f7e6ff..0b579b2 100644 --- a/arch/arm/include/asm/xen/page.h +++ b/arch/arm/include/asm/xen/page.h @@ -110,5 +110,6 @@ static inline bool set_phys_to_machine(unsigned long pfn, unsigned long mfn) bool xen_arch_need_swiotlb(struct device *dev, unsigned long pfn, unsigned long mfn); +unsigned long xen_get_swiotlb_free_pages(unsigned int order); #endif /* _ASM_ARM_XEN_PAGE_H */ diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c index 793551d..4983250 100644 --- a/arch/arm/xen/mm.c +++ b/arch/arm/xen/mm.c @@ -4,6 +4,7 @@ #include <linux/gfp.h> #include <linux/highmem.h> #include <linux/export.h> +#include <linux/memblock.h> #include <linux/of_address.h> #include <linux/slab.h> #include <linux/types.h> @@ -21,6 +22,20 @@ #include <asm/xen/hypercall.h> #include <asm/xen/interface.h> +unsigned long xen_get_swiotlb_free_pages(unsigned int order) +{ + struct memblock_region *reg; + gfp_t flags = __GFP_NOWARN; + + for_each_memblock(memory, reg) { + if (reg->base < (phys_addr_t)0x) { + flags |= __GFP_DMA; + break; + } + } + return __get_free_pages(flags, order); +} + enum dma_cache_op { DMA_UNMAP, DMA_MAP, diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 358dcd3..c44a5d5 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -269,4 +269,9 @@ static inline bool xen_arch_need_swiotlb(struct device *dev, return false; } +static inline unsigned long xen_get_swiotlb_free_pages(unsigned int order) +{ + return __get_free_pages(__GFP_NOWARN, order); +} + #endif /* _ASM_X86_XEN_PAGE_H */ diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 810ad41..4c54932 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -235,7 +235,7 @@ retry: #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT)) #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT) while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) { - xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN, order); + xen_io_tlb_start = (void *)xen_get_swiotlb_free_pages(order); if (xen_io_tlb_start) break; order--;
Re: [Xen-devel] io.c:164:d25v0 Weird HVM ioemulation status 1. domain_crash called from io.c:165
On Fri, Apr 24, 2015 at 02:02:36PM +0200, Sander Eikelenboom wrote: Hi, On Xen-unstable I encountered this non stopping logspam while trying to shutdown an HVM domain with pci-passthrough: (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. And I saw as well with the latest unstable. Hadn't tried to narrow it down. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. etc. etc. etc. The domain had shutdown perfectly well on earlier occasions, but this time it kept running. It did drop from it's assigned 3 vpcu's to 1 vpcu, but it stalled there. 
The log spam stopped only after destroying the guest. Since I haven't seen it before, it's probably hard to replicate. -- Sander
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24.04.15 at 13:53, wei.w.w...@intel.com wrote: On 24/04/2015 18:20, Jan Beulich wrote On 24.04.15 at 12:07, wei.w.w...@intel.com wrote: On 24/04/2015 17:56, Jan Beulich wrote: On 24.04.15 at 11:46, wei.w.w...@intel.com wrote: On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name internal for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal But xenpm's set-scaling-governor command should still do something useful for the currently available governor options. And with that, showing internal in its output may also end up being confusing (at least I am already being confused). The case is similar to that in the kernel. Xen has two pstate driver, but only one can be loaded. When the intel_pstate driver is used (scaling_driver : intel_pstate ), xenpm's set-scaling-governor will not take effect, since the intel_pstate only implements its internal governor. But that's precisely what I'm asking to be changed: Even if internally is uses its own governor, the user interface of the tool should still be usable to achieve similar effects. Yes, we should change that if we remain two governors in the intel_pstate driver. But as we discussed before, the internal Performance governor seems redundant and can be removed. In that case, there is no other option of governors that be selected by the user via set-scaling-governor. I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). 
xenpm set-scaling-governor powersave
xenpm set-scaling-governor ondemand
xenpm set-scaling-governor performance
Each of these should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not "internal"), again no matter how this is being achieved internally to the driver. Jan
Re: [Xen-devel] [PATCH] AMD IOMMU: only translate remapped IO-APIC RTEs
Friday, April 24, 2015, 12:12:32 AM, you wrote: On 4/23/15, 12:59, Sander Eikelenboom li...@eikelenboom.it wrote: On 4/23/15, 08:47, Jan Beulich jbeul...@suse.com wrote: On 23.04.15 at 15:31, suravee.suthikulpa...@amd.com wrote: On 4/17/15, 10:27, Jan Beulich jbeul...@suse.com wrote: 1aeb1156fa ("x86: don't change affinity with interrupt unmasked") introducing RTE reads prior to the respective interrupt having got enabled for the first time uncovered a bug in 2ca9fbd739 ("AMD IOMMU: allocate IRTE entries instead of using a static mapping"): We obviously shouldn't be translating RTEs for which remapping didn't get set up yet. Reported-by: Sander Eikelenboom li...@eikelenboom.it Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/drivers/passthrough/amd/iommu_intr.c +++ b/xen/drivers/passthrough/amd/iommu_intr.c @@ -365,15 +365,17 @@ unsigned int amd_iommu_read_ioapic_from_ unsigned int apic, unsigned int reg) { unsigned int val = __io_apic_read(apic, reg); +unsigned int pin = (reg - 0x10) / 2; +unsigned int offset = ioapic_sbdf[IO_APIC_ID(apic)].pin_2_idx[pin]; -if ( !(reg & 1) ) +if ( !(reg & 1) && offset < INTREMAP_ENTRIES ) { -unsigned int offset = val & (INTREMAP_ENTRIES - 1); u16 bdf = ioapic_sbdf[IO_APIC_ID(apic)].bdf; u16 seg = ioapic_sbdf[IO_APIC_ID(apic)].seg; u16 req_id = get_intremap_requestor_id(seg, bdf); const u32 *entry = get_intremap_entry(seg, req_id, offset); +ASSERT(offset == (val & (INTREMAP_ENTRIES - 1))); Jan, could you please explain why the ASSERT is needed here? The previous value offset got assigned was calculated using the right side expression. I.e. the assert makes sure that what we used before and what we use now is the same. Jan Jan, I have tested this patch w/ staging branch booting Dom0, and this patch got rid of the following error from xl dmesg: (XEN) APIC error on CPU0: 00(40) (XEN) APIC error on CPU2: 00(40) However, when I tried starting a guest w/ PCI device passthrough, the system crashed and rebooted.
Although, I don't think this is caused by the patch. Acked-by: Suravee Suthikulpanit suravee.suthikulpa...@amd.com I'm looking into the issue. I also went back and tested it with 4.5 on the same setup and didn't see the issue. Thanks, Suravee If it crashes with: That problem is likely solved by another patch: http://lists.xen.org/archives/html/xen-devel/2015-04/msg02253.html Yes, this has fixed the crash. Thanks, Suravee Hi Jan, I see you committed "AMD IOMMU: only translate remapped IO-APIC RTEs" to staging; any reason why your patch in http://lists.xen.org/archives/html/xen-devel/2015-04/msg02253.html isn't committed yet? (still waiting for a formal ack from someone? as Suravee also seems to have tested it) -- Sander
Re: [Xen-devel] io.c:164:d25v0 Weird HVM ioemulation status 1. domain_crash called from io.c:165
On Fri, Apr 24, Sander Eikelenboom wrote: (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 Since I haven't seen it before, it's probably hard to replicate. I see it all the time on my beloved ProBook. With an ordinary domU.cfg. name=fv-13.1-pvscsi uuid=4ff34139-8381-4f3b-8c3a-0078a794b152 memory=512 builder=hvm boot=cd disk=[ 'vdev=hda, discard, target=/work/xen/pvscsi/fv-13.1-pvscsi/vdisk-fv-13.1-pvscsi.raw', 'vdev=hdc, discard, cdrom, target=/dist/iso/13.2/openSUSE-13.2-DVD-x86_64.iso', ] vif=[ 'mac=00:16:3e:13:01:00,ip=1.1.1.2,type=vif,gatewaydev=xentap,script=vif-route' ] keymap=de vnc=0 sdl=1 vncunused=1 serial=pty Olaf
Re: [Xen-devel] [PATCH v7 2/5] sysctl: Add sysctl interface for querying PCI topology
On 04/24/2015 03:19 AM, Jan Beulich wrote: On 24.04.15 at 00:20, boris.ostrov...@oracle.com wrote: On 04/21/2015 03:01 AM, Jan Beulich wrote: + ((++dev_cnt & 0x3f) && hypercall_preempt_check()) ) +break; +} + +if ( (!ret || (ret == -ENODEV)) && + __copy_field_to_guest(u_sysctl, op, u.pcitopoinfo.first_dev) ) +ret = -EFAULT; +} +break; +#endif With the continuation-less model now used I don't think it makes sense to have first_dev and num_devs - for re-invocation all the caller needs to do is increment the buffer pointer suitably. I.e. you can get away with just a count of devices afaict. This would require walking xc_hypercall_buffer_t->hbuf. Would something like set_xen_guest_handle_raw(sysctl..., (void *)HYPERCALL_BUFFER_AS_ARG(foo) + offset) be acceptable? I don't think I see anything better. I thought of adding set_xen_guest_handle_offset() that would look similar to set_xen_guest_handle(), but then I felt that having this in the API may not be a good idea, since xc_hypercall_buffer_t->hbuf would end up pointing to memory that is not allocated for the full xc_hypercall_buffer_t->sz. There ought to be a way to create a guest handle from other than the start of an allocated hypercall buffer, but that's really more a question for the tool stack folks. Yes, this was a question for toolstack people. (And my second paragraph was not stated correctly, now that I re-read it. I meant to say that my understanding is that the API is expected to make all safety checks on buffers, and with the set_xen_guest_handle_offset() that I was picturing in my head we could pass in pretty much any pointer. I suppose we could check 'hbuf+offset < sz'.) -boris
[Xen-devel] [PATCH v3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
Xen L4 entries being uniformly installed into any L4 table and 64-bit PV kernels running in ring 3 means that user mode was able to see the read-only M2P presented by Xen to the guests. While apparently not really representing an exploitable information leak, this still very certainly was never meant to be that way. Building on the fact that these guests already have separate kernel and user mode page tables we can allow guest kernels to tell Xen that they don't want user mode to see this table. We can't, however, do this by default: There is no ABI requirement that kernel and user mode page tables be separate. Therefore introduce a new VM-assist flag allowing the guest to control respective hypervisor behavior: - when not set, L4 tables get created with the respective slot blank, and whenever the L4 table gets used as a kernel one the missing mapping gets inserted, - when set, L4 tables get created with the respective slot initialized as before, and whenever the L4 table gets used as a user one the mapping gets zapped. Signed-off-by: Jan Beulich jbeul...@suse.com --- v3: Shadow mode changes re-written (to address the migration regression in v2). 
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -339,7 +339,7 @@ static int setup_compat_l4(struct vcpu *
     l4tab = __map_domain_page(pg);
     clear_page(l4tab);
-    init_guest_l4_table(l4tab, v->domain);
+    init_guest_l4_table(l4tab, v->domain, 1);
 
     unmap_domain_page(l4tab);
     v->arch.guest_table = pagetable_from_page(pg);
@@ -971,7 +971,11 @@ int arch_set_info_guest(
         case -EINTR:
             rc = -ERESTART;
         case -ERESTART:
+            break;
         case 0:
+            if ( !compat && !VM_ASSIST(d, m2p_strict) &&
+                 !paging_mode_refcounts(d) )
+                fill_ro_mpt(cr3_gfn);
             break;
         default:
             if ( cr3_page == current->arch.old_guest_table )
@@ -1006,7 +1010,10 @@ int arch_set_info_guest(
         default:
             if ( cr3_page == current->arch.old_guest_table )
                 cr3_page = NULL;
+            break;
         case 0:
+            if ( VM_ASSIST(d, m2p_strict) )
+                zap_ro_mpt(cr3_gfn);
             break;
         }
     }
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -1205,7 +1205,7 @@ int __init construct_dom0(
             l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
         }
         clear_page(l4tab);
-        init_guest_l4_table(l4tab, d);
+        init_guest_l4_table(l4tab, d, 0);
         v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
         if ( is_pv_32on64_domain(d) )
             v->arch.guest_table_user = v->arch.guest_table;
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1380,7 +1380,8 @@ static int alloc_l3_table(struct page_in
     return rc > 0 ? 0 : rc;
 }
 
-void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d)
+void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
+                         bool_t zap_ro_mpt)
 {
     /* Xen private mappings. */
     memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT],
@@ -1395,6 +1396,25 @@ void init_guest_l4_table(l4_pgentry_t l4
         l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR);
     l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] =
         l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR);
+    if ( zap_ro_mpt || is_pv_32on64_domain(d) || paging_mode_refcounts(d) )
+        l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
+}
+
+void fill_ro_mpt(unsigned long mfn)
+{
+    l4_pgentry_t *l4tab = map_domain_page(mfn);
+
+    l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
+        idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
+    unmap_domain_page(l4tab);
+}
+
+void zap_ro_mpt(unsigned long mfn)
+{
+    l4_pgentry_t *l4tab = map_domain_page(mfn);
+
+    l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
+    unmap_domain_page(l4tab);
 }
 
 static int alloc_l4_table(struct page_info *page)
@@ -1444,7 +1464,7 @@ static int alloc_l4_table(struct page_in
             adjust_guest_l4e(pl4e[i], d);
     }
 
-    init_guest_l4_table(pl4e, d);
+    init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
     unmap_domain_page(pl4e);
 
     return rc > 0 ? 0 : rc;
@@ -2754,6 +2774,8 @@ int new_guest_cr3(unsigned long mfn)
     invalidate_shadow_ldt(curr, 0);
 
+    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+        fill_ro_mpt(mfn);
     curr->arch.guest_table = pagetable_from_pfn(mfn);
     update_cr3(curr);
@@ -3111,6 +3133,8 @@ long do_mmuext_op(
                 op.arg1.mfn);
             break;
         }
+        if ( VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+            zap_ro_mpt(op.arg1.mfn);
     }
 
     curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn);
--- a/xen/arch/x86/mm/shadow/multi.c
+++
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24.04.15 at 16:56, wei.w.w...@intel.com wrote: On 24/04/2015 20:57, Jan Beulich wrote I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not internal), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct=max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running with 60%. So I must be misunderstanding something then: How can the driver do anything at all when told to run the system at 60%? Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention. They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). Adjusting the values individually to me very much looks like the userspace governor. I think we can follow the kernel implementation regarding this point, what do you think? Not sure - I'm not always convinced that what Linux does is the one and only and best way. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] libxl: remove duplicate check for pci subsystem type
Both attach and detach functions get called only if the type matches. Signed-off-by: Olaf Hering o...@aepfle.de Cc: Jim Fehlig jfeh...@suse.com
---
 src/libxl/libxl_driver.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index d76efda..f915f91 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -2901,9 +2901,6 @@ libxlDomainAttachHostPCIDevice(libxlDriverPrivatePtr driver,
     libxl_device_pci_init(&pcidev);
 
-    if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
-        goto cleanup;
-
     if (virDomainHostdevFind(vm->def, hostdev, &found) >= 0) {
         virReportError(VIR_ERR_OPERATION_FAILED,
                        _("target pci device %.4x:%.2x:%.2x.%.1x already exists"),
@@ -3239,9 +3236,6 @@ libxlDomainDetachHostPCIDevice(libxlDriverPrivatePtr driver,
     libxl_device_pci_init(&pcidev);
 
-    if (subsys->type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
-        goto cleanup;
-
     idx = virDomainHostdevFind(vm->def, hostdev, &detach);
     if (idx < 0) {
         virReportError(VIR_ERR_OPERATION_FAILED,
Re: [Xen-devel] [PATCH v3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
On 24.04.15 at 17:54, andrew.coop...@citrix.com wrote: On 24/04/15 16:07, Jan Beulich wrote: On 24.04.15 at 16:57, andrew.coop...@citrix.com wrote: On 24/04/15 15:31, Jan Beulich wrote: Xen L4 entries being uniformly installed into any L4 table and 64-bit PV kernels running in ring 3 means that user mode was able to see the read-only M2P presented by Xen to the guests. While apparently not really representing an exploitable information leak, this still very certainly was never meant to be that way. Building on the fact that these guests already have separate kernel and user mode page tables we can allow guest kernels to tell Xen that they don't want user mode to see this table. We can't, however, do this by default: There is no ABI requirement that kernel and user mode page tables be separate. Therefore introduce a new VM-assist flag allowing the guest to control respective hypervisor behavior: - when not set, L4 tables get created with the respective slot blank, and whenever the L4 table gets used as a kernel one the missing mapping gets inserted, - when set, L4 tables get created with the respective slot initialized as before, and whenever the L4 table gets used as a user one the mapping gets zapped. Is this complete? Yes. For backwards compatibility, older kernels will not have m2p_strict set, and the m2p should unconditionally appear in all L4s. No. L4s _only_ used for user mode have no need for this entry to be non-zero. There is only ever a single L4 in a particular virtual address space. The M2P is part of the Xen ABI range. Previously, the M2P was present in all L4s, which is why they leaked into user context. If we do not wish to break backwards compatibility with this change, then in relaxed mode the M2P must remain in all tables, or userspace which is actually making use of the mappings will suddenly start faulting because of a change in Xen, not a kernel change (and an unknowing kernel might not be prepared to handle this case).
Userspace was never supposed to access this table, and hence revoking access to the table cannot be a problem there. By your description, in the relaxed case a newly created L4 which is first used as user table will have the mapping clear. The ordering doesn't matter. In relaxed mode, the first time a table gets used for kernel purposes the entry will be inserted (and hence potentially become visible to user mode). But anyway I'm confused that you're starting this discussion now, when the patch had already gone in and only needed to be reverted because of breaking migration. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24/04/2015 23:54, Jan Beulich wrote: On 24.04.15 at 17:42, wei.w.w...@intel.com wrote: On 24/04/2015 23:04, Jan Beulich wrote: On 24.04.15 at 16:56, wei.w.w...@intel.com wrote: On 24/04/2015 20:57, Jan Beulich wrote: I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not "internal"), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct = max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running at 60%. So I must be misunderstanding something then: How can the driver do anything at all when told to run the system at 60%? The {min,max}_pct is a limit. That's clear. The question is whether these are hardware-imposed limits or set by the user. In the latter case ... The hardware CPU has its own minimal performance limit, let's call it hw_min_pct. For example, on my machine, hw_min_pct=32% (corresponding to 1.2GHz, the minimal frequency). The {min,max}_pct in our discussion is set by the user via xenpm. If min_pct is set to a value less than 32%, then it will be adjusted to hw_min_pct. Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention.
They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). Adjusting the values individually to me very much looks like the userspace governor. Yeah, that example was like userspace. Please take a look at this example: [min_pct=60%, max_pct=80%], the timer generates 45%, 55%, 65%, 70%, 75%, 90%, then the final target values will not be constant. (The 45%, 55%, and 90% here shouldn't happen in any case.) Yes. The final values will be 60%, 60%, 65%, 70%, 75%, 80%. This is neither a userspace governor nor an ondemand governor. The ones (65%, 70%, 75%) falling into the limit interval behave like ondemand; the others do not. ... restricting to these would be the job of the driver, and - ondemand would be 0...100 - powersave would be 0 - performance would be 100 Everything else would be userspace. Please see the example above; the internal governor in the new driver is different from these conventional governors. Best, Wei
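Jan's proposed mapping from the classic governor names to {min,max}_pct limits (and back, for reporting) can be sketched in plain C. This is purely illustrative of the discussion — the struct and function names are hypothetical, not Xen or xenpm code:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical representation of the user-visible percentage limits. */
struct pct_range { unsigned int min_pct, max_pct; };

/* Map a classic governor name onto limits, per Jan's suggestion:
 * powersave -> pinned at 0, performance -> pinned at 100,
 * ondemand -> free to roam 0...100.  Returns -1 for unknown names. */
static int governor_to_range(const char *name, struct pct_range *r)
{
    if ( strcmp(name, "powersave") == 0 )
        *r = (struct pct_range){ 0, 0 };
    else if ( strcmp(name, "performance") == 0 )
        *r = (struct pct_range){ 100, 100 };
    else if ( strcmp(name, "ondemand") == 0 )
        *r = (struct pct_range){ 0, 100 };
    else
        return -1;
    return 0;
}

/* Reverse direction: report which governor a given range corresponds
 * to; anything else would show up as "userspace". */
static const char *range_to_governor(struct pct_range r)
{
    if ( r.min_pct == 0 && r.max_pct == 0 )
        return "powersave";
    if ( r.min_pct == 100 && r.max_pct == 100 )
        return "performance";
    if ( r.min_pct == 0 && r.max_pct == 100 )
        return "ondemand";
    return "userspace";
}
```

With this shape, set-scaling-governor followed by a query round-trips to the same name, which is exactly the user-interface property Jan is asking for.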
Re: [Xen-devel] [PATCH v7 2/5] sysctl: Add sysctl interface for querying PCI topology
On 04/24/2015 10:09 AM, Boris Ostrovsky wrote: On 04/24/2015 03:19 AM, Jan Beulich wrote: On 24.04.15 at 00:20, boris.ostrov...@oracle.com wrote: On 04/21/2015 03:01 AM, Jan Beulich wrote:

+             ((++dev_cnt & 0x3f) && hypercall_preempt_check()) )
+            break;
+    }
+
+    if ( (!ret || (ret == -ENODEV)) &&
+         __copy_field_to_guest(u_sysctl, op, u.pcitopoinfo.first_dev) )
+        ret = -EFAULT;
+}
+break;
+#endif

With the continuation-less model now used I don't think it makes sense to have first_dev and num_devs - for re-invocation all the caller needs to do is increment the buffer pointer suitably. I.e. you can get away with just a count of devices afaict. This would require walking xc_hypercall_buffer_t->hbuf. Would something like set_xen_guest_handle_raw(sysctl..., (void *)HYPERCALL_BUFFER_AS_ARG(foo) + offset) be acceptable? I don't think I see anything better. I thought of adding set_xen_guest_handle_offset() that would look similar to set_xen_guest_handle() but then I felt that having this in the API may not be a good idea since xc_hypercall_buffer_t->hbuf would end up pointing to memory that is not allocated for the full xc_hypercall_buffer_t->sz. There ought to be a way to create a guest handle from other than the start of an allocated hypercall buffer, but that's really more a question for the tool stack folks. Yes, this was a question for toolstack people. (Adjusted TO/CC to reflect this). (And my second paragraph was not stated correctly, now that I re-read it. I meant to say that my understanding is that the API is expected to make all safety checks on buffers, and with the set_xen_guest_handle_offset() that I was picturing in my head we could pass in pretty much any pointer. I suppose we could check 'hbuf+offset < sz'.) -boris
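The continuation-less model Jan suggests — drop first_dev/num_devs and have the caller re-invoke with the buffer pointer advanced by the count already returned — can be sketched in plain C. This is an illustration only; fake_query and fetch_all are hypothetical stand-ins, not the real sysctl or libxc interfaces under discussion:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOTAL_DEVS 5  /* pretend the host has 5 PCI devices */

/* Stand-in for the hypercall: fills at most 'batch' entries for devices
 * starting at index 'done', and returns how many it actually filled
 * (as if it had been preempted after a partial batch). */
static size_t fake_query(uint32_t *buf, size_t batch, size_t done)
{
    size_t n = 0;

    while ( n < batch && done + n < TOTAL_DEVS )
    {
        buf[n] = (uint32_t)(done + n); /* pretend NUMA node of device */
        n++;
    }
    return n;
}

/* Caller side of the model: no first_dev runvar, just advance the
 * buffer pointer by the count each invocation reports. */
static size_t fetch_all(uint32_t *buf)
{
    size_t done = 0, got;

    while ( (got = fake_query(buf + done, 2 /* small batch */, done)) > 0 )
        done += got;
    return done;
}
```

The open question in the thread is only how to express `buf + done` as a guest handle in libxc; the loop structure itself is this simple.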
Re: [Xen-devel] [RFC v1 00/15] Add VT-d Posted-Interrupts support
Ping.. Thanks, Feng -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Monday, April 13, 2015 5:13 AM To: Wu, Feng Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org Subject: RE: [RFC v1 00/15] Add VT-d Posted-Interrupts support On 01.04.15 at 15:21, feng...@intel.com wrote: Hi Jan, Any more comments about this series? Thanks a lot! I think it makes little sense to wait for my review before posting v2 with issues addressed which others have pointed out. I just returned from vacation, and would have time to look at v1 next week the earliest (with the week after being occupied by travel). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 03/14] x86: detect and initialize Intel CAT feature
On 23.04.15 at 11:55, chao.p.p...@linux.intel.com wrote:

 struct psr_cmt *__read_mostly psr_cmt;
+
+static unsigned long __read_mostly * cat_socket_init_bitmap;
+static unsigned long __read_mostly * cat_socket_enable_bitmap;
+static struct psr_cat_socket_info *__read_mostly cat_socket_info;

So the last line looks right, and the line of context left in place is well formed too. Why did you not follow these good examples for the two newly added lines? And what is cat_socket_enable_bitmap needed for anyway?

@@ -194,8 +210,67 @@ void psr_ctxt_switch_to(struct domain *d)
     }
 }
 
+static void cat_cpu_init(void)
+{
+    unsigned int eax, ebx, ecx, edx;
+    struct psr_cat_socket_info *info;
+    unsigned int socket;
+    unsigned int cpu = smp_processor_id();
+    const struct cpuinfo_x86 *c = cpu_data + cpu;
+
+    if ( !cpu_has(c, X86_FEATURE_CAT) )
+        return;
+
+    socket = cpu_to_socket(cpu);
+    ASSERT(socket < nr_sockets);
+
+    info = cat_socket_info + socket;
+
+    /* Avoid initializing more than one times for the same socket. */
+    if ( test_and_set_bit(socket, cat_socket_init_bitmap) )
+        return;

So what if a package got offlined and then is being brought back online (perhaps after having got replaced)?

+static void __init init_psr_cat(void)
+{
+    size_t nlongs = BITS_TO_LONGS(nr_sockets);
+
+    cat_socket_init_bitmap = xzalloc_array(unsigned long, nlongs);
+    if ( !cat_socket_init_bitmap )
+        return;
+
+    cat_socket_enable_bitmap = xzalloc_array(unsigned long, nlongs);
+    if ( !cat_socket_enable_bitmap )
+    {
+        xfree(cat_socket_init_bitmap);
+        return;
+    }
+
+    cat_socket_info = xzalloc_array(struct psr_cat_socket_info, nr_sockets);
+    if ( !cat_socket_info )
+    {
+        xfree(cat_socket_init_bitmap);
+        xfree(cat_socket_enable_bitmap);
+    }
+}

Please do all three allocs in a row, then check if any of them failed, and on the error path free and clear all three pointers. This will especially pay off if later further allocations get added here. Jan
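The pattern Jan asks for ("do all three allocs in a row, then check if any of them failed, and on the error path free and clear all three pointers") can be shown standalone. This sketch uses calloc in place of Xen's xzalloc_array and loosely mirrors the names in the patch; nothing here is actual Xen code:

```c
#include <assert.h>
#include <stdlib.h>

/* Three module-scope allocations, loosely mirroring the patch. */
static unsigned long *init_bm, *enable_bm;
static int *socket_info;

/* Allocate everything first, check once, and on failure free and clear
 * all pointers on a single error path. */
static int init_all(size_t nlongs, size_t nsockets)
{
    init_bm     = calloc(nlongs, sizeof(*init_bm));
    enable_bm   = calloc(nlongs, sizeof(*enable_bm));
    socket_info = calloc(nsockets, sizeof(*socket_info));

    if ( init_bm && enable_bm && socket_info )
        return 0;

    /* free(NULL) is a no-op, so no per-pointer checks are needed. */
    free(init_bm);     init_bm = NULL;
    free(enable_bm);   enable_bm = NULL;
    free(socket_info); socket_info = NULL;
    return -1;
}
```

The payoff Jan mentions is that adding a fourth allocation later only means one more line in each of the two places, rather than another nested cleanup branch.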
Re: [Xen-devel] [PATCH v3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
On 24.04.15 at 16:57, andrew.coop...@citrix.com wrote: On 24/04/15 15:31, Jan Beulich wrote: Xen L4 entries being uniformly installed into any L4 table and 64-bit PV kernels running in ring 3 means that user mode was able to see the read-only M2P presented by Xen to the guests. While apparently not really representing an exploitable information leak, this still very certainly was never meant to be that way. Building on the fact that these guests already have separate kernel and user mode page tables we can allow guest kernels to tell Xen that they don't want user mode to see this table. We can't, however, do this by default: There is no ABI requirement that kernel and user mode page tables be separate. Therefore introduce a new VM-assist flag allowing the guest to control respective hypervisor behavior: - when not set, L4 tables get created with the respective slot blank, and whenever the L4 table gets used as a kernel one the missing mapping gets inserted, - when set, L4 tables get created with the respective slot initialized as before, and whenever the L4 table gets used as a user one the mapping gets zapped. Is this complete? Yes. For backwards compatibility, older kernels will not have m2p_strict set, and the m2p should unconditionally appear in all L4s. No. L4s _only_ used for user mode have no need for this entry to be non-zero. The difference between strict and relaxed mode is as described - in strict mode, L4s default to have the entry populated and tables used for user mode get it stripped, while in relaxed mode L4s default to the empty and get the entry populated when used for kernel mode. This guarantees that in non-strict mode all kernel page tables have the entry filled, while in strict mode it guarantees that all user page tables have it cleared. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
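The strict/relaxed behaviour Jan spells out can be modelled as a tiny state machine. This is a hypothetical model of the rules only — not the hypervisor's actual fill_ro_mpt()/zap_ro_mpt() plumbing:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the RO M2P slot in a 64-bit PV guest's L4, per the
 * VM-assist semantics described in the thread. */
typedef enum { L4_USER, L4_KERNEL } l4_use;

/* Slot state when the L4 is created:
 * strict mode -> created populated; relaxed mode -> created blank. */
static bool slot_at_creation(bool m2p_strict)
{
    return m2p_strict;
}

/* Slot state after the L4 has been installed for the given mode. */
static bool slot_after_use(bool m2p_strict, bool present, l4_use use)
{
    if ( !m2p_strict && use == L4_KERNEL )
        present = true;   /* relaxed: missing mapping gets inserted */
    if ( m2p_strict && use == L4_USER )
        present = false;  /* strict: mapping gets zapped */
    return present;
}
```

This encodes exactly the two guarantees Jan states: in relaxed mode every kernel page table ends up with the entry filled, and in strict mode every user page table ends up with it cleared — while a user-only L4 in relaxed mode simply stays blank.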
Re: [Xen-devel] [PATCH v6 08/14] x86: add scheduling support for Intel CAT
On 23.04.15 at 11:55, chao.p.p...@linux.intel.com wrote: On context switch, write the domain's Class of Service (COS) to MSR IA32_PQR_ASSOC, to notify hardware to use the new COS. For performance reasons, the COS mask for the current cpu is also cached in the local per-CPU variable. Signed-off-by: Chao Peng chao.p.p...@linux.intel.com Acked-by: Jan Beulich jbeul...@suse.com
Re: [Xen-devel] [PATCH v3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
On 24/04/15 15:31, Jan Beulich wrote: Xen L4 entries being uniformly installed into any L4 table and 64-bit PV kernels running in ring 3 means that user mode was able to see the read-only M2P presented by Xen to the guests. While apparently not really representing an exploitable information leak, this still very certainly was never meant to be that way. Building on the fact that these guests already have separate kernel and user mode page tables we can allow guest kernels to tell Xen that they don't want user mode to see this table. We can't, however, do this by default: There is no ABI requirement that kernel and user mode page tables be separate. Therefore introduce a new VM-assist flag allowing the guest to control respective hypervisor behavior: - when not set, L4 tables get created with the respective slot blank, and whenever the L4 table gets used as a kernel one the missing mapping gets inserted, - when set, L4 tables get created with the respective slot initialized as before, and whenever the L4 table gets used as a user one the mapping gets zapped. Is this complete? For backwards compatibility, older kernels will not have m2p_strict set, and the m2p should unconditionally appear in all L4s. If m2p_strict is set then the mapping should be zapped for user L4s. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen on CentoOS7
On 24/04/15 12:26, Benoit L'hoest wrote: Hello, I'm a student in computer science; during my internship I have to explore multipathing solutions for several hypervisors. I'm looking into MPTCP (multipath TCP), an extension of the TCP protocol that allows aggregating the bandwidth of several connections. I'd like to use MPTCP to get good bandwidth between the hypervisor and the NAS. That would be a nice software solution to avoid buying expensive 10 Gig or 100 Gig NICs. In order to make some tests I'd like to install MPTCP on XenServer. There is an .rpm for CentOS 7, but XenServer is on CentOS 5 and some dependencies are missing. I'm not confident enough to build my own supplemental pack for XenServer 6.5. I'd like to know if there is an easy way to add MPTCP to XenServer? Is there a CentOS 7 version that I can get? Thanks for your help. Ben. PS: If you feel like helping me a bit more, here is a link to my support topic where I explain the problem I have when I try to boot my recompiled kernel: http://discussions.citrix.com/topic/364033-change-dom0-kernel/#entry1870428 Technical discussions specifically related to XenServer are best posted to xs-de...@lists.xenserver.org. There will be a XenServer alpha release based on CentOS7 coming out shortly, Simon
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24/04/2015 23:04, Jan Beulich wrote On 24.04.15 at 16:56, wei.w.w...@intel.com wrote: On 24/04/2015 20:57, Jan Beulich wrote I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not internal), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct=max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running with 60%. So I must be misunderstanding something then: How can the driver do anything at all when told to run the system at 60%? The {min,max}_pct is a limit. The timer function figures out a proper value based on the sampled statistics, then this value is clamped into [min_pct, max_pct]. When we have [60%, 60%], whatever the value from the timer function is, it will be finally adjusted to 60%, and set to the perf_ctl register. Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention. They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). Adjusting the values individually to me very much looks like the userspace governor. Yeah, that example was like userspace. 
Please take a look at this example: [min_pct=60%, max_pct=80%], the timer generates 45%, 55%, 65%, 70%, 75%, 90%, then the final target values will not be constant. The ones (65%, 70%, 75%) falling into the limit interval behaves like ondemand, others are not. I think we can follow the kernel implementation regarding this point, what do you think? Not sure - I'm not always convinced that what Linux does is the one and only and best way. Understand it. But I think that usage is good, in terms of supporting future intel processors (e.g. the hardware controlled P-states on Skylake+). The {min,max}_pct needs to be exposed to users to set the limits. Best, Wei ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
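The clamping behaviour Wei describes — the sampled target is bounded by the user-set min/max percentages, which are themselves bounded below by the hardware minimum — is easy to show as a standalone function. A sketch only, with hypothetical names (Wei's hw_min_pct example value of 32% is used in the comments):

```c
#include <assert.h>

/* Clamp a sampled target percentage into the user-set [min,max] window,
 * after raising min_pct to the hardware floor (e.g. 32% on Wei's box). */
static unsigned int clamp_pct(unsigned int target,
                              unsigned int min_pct, unsigned int max_pct,
                              unsigned int hw_min_pct)
{
    if ( min_pct < hw_min_pct )
        min_pct = hw_min_pct;
    if ( target < min_pct )
        target = min_pct;
    if ( target > max_pct )
        target = max_pct;
    return target;
}
```

With Wei's [60%, 80%] window, the sampled sequence 45, 55, 65, 70, 75, 90 comes out as 60, 60, 65, 70, 75, 80 — exactly the mixed behaviour he describes: in-window values pass through like ondemand, out-of-window values are pinned like userspace.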
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On Fri, Apr 24, 2015 at 03:42:40PM +, Wang, Wei W wrote: On 24/04/2015 23:04, Jan Beulich wrote On 24.04.15 at 16:56, wei.w.w...@intel.com wrote: On 24/04/2015 20:57, Jan Beulich wrote I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not internal), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct=max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running with 60%. So I must be misunderstanding something then: How can the driver do anything at all when told to run the system at 60%? The {min,max}_pct is a limit. The timer function figures out a proper value based on the sampled statistics, then this value is clamped into [min_pct, max_pct]. When we have [60%, 60%], whatever the value from the timer function is, it will be finally adjusted to 60%, and set to the perf_ctl register. Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention. They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). Adjusting the values individually to me very much looks like the userspace governor. Yeah, that example was like userspace. 
Please take a look at this example: [min_pct=60%, max_pct=80%], the timer generates 45%, 55%, 65%, 70%, 75%, 90%, then the final target values will not be constant. The ones (65%, 70%, 75%) falling into the limit interval behaves like ondemand, others are not. I think we can follow the kernel implementation regarding this point, what do you think? Not sure - I'm not always convinced that what Linux does is the one and only and best way. Understand it. But I think that usage is good, in terms of supporting future intel processors (e.g. the hardware controlled P-states on Skylake+). The {min,max}_pct needs to be exposed to users to set the limits. How will this affect AMD processors which can use the cpufreq? Would the ondemand feature go away? Best, Wei ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3 01/10] vmx: add new boot parameter to control PML enabling
On 24.04.15 at 10:19, kai.hu...@linux.intel.com wrote:

--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -64,6 +64,36 @@ integer_param("ple_gap", ple_gap);
 static unsigned int __read_mostly ple_window = 4096;
 integer_param("ple_window", ple_window);
 
+static bool_t __read_mostly opt_pml_enabled = 0;
+
+/*
+ * The 'ept' parameter controls functionalities that depend on, or impact the
+ * EPT mechanism. Optional comma separated value may contain:
+ *
+ *  pml   Enable PML
+ */
+static void __init parse_ept_param(char *s)
+{
+    char *ss;
+
+    do {
+        bool_t val = !!strncmp(s, "no-", 3);
+        if ( !val )

In case another round is needed, a blank line is missing above.

+            s += 3;
+
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        if ( !strcmp(s, "pml") )
+            opt_pml_enabled = val;
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+custom_param("ept", parse_ept_param);

And a superfluous blank line would want to be dropped here. Jan
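The "no-" prefix / comma-separated option style being reviewed can be exercised standalone. This sketch mirrors the quoted parser but is self-contained (it copies the option string so literals can be passed, and guards the final pointer advance); only the "pml" token exists here:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Parse an "ept="-style option list, e.g. "pml", "no-pml", "foo,pml".
 * A "no-" prefix clears the named flag.  Returns the resulting pml
 * setting; unknown tokens are ignored, matching the reviewed code. */
static bool parse_ept(const char *opt)
{
    char buf[64];
    bool pml = false;
    char *s = buf, *ss;

    strncpy(buf, opt, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    do {
        /* Token negated iff it starts with "no-". */
        bool val = strncmp(s, "no-", 3) != 0;

        if ( !val )
            s += 3;

        /* Split off the current comma-separated token. */
        ss = strchr(s, ',');
        if ( ss )
            *ss = '\0';

        if ( strcmp(s, "pml") == 0 )
            pml = val;

        if ( ss )
            s = ss + 1;
    } while ( ss );

    return pml;
}
```

Later tokens win, so "no-pml,pml" enables PML — the same last-one-wins behaviour the do/while structure in the patch gives.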
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24/04/2015 20:57, Jan Beulich wrote On 24.04.15 at 13:53, wei.w.w...@intel.com wrote: On 24/04/2015 18:20, Jan Beulich wrote On 24.04.15 at 12:07, wei.w.w...@intel.com wrote: On 24/04/2015 17:56, Jan Beulich wrote: On 24.04.15 at 11:46, wei.w.w...@intel.com wrote: On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name internal for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal But xenpm's set-scaling-governor command should still do something useful for the currently available governor options. And with that, showing internal in its output may also end up being confusing (at least I am already being confused). The case is similar to that in the kernel. Xen has two pstate driver, but only one can be loaded. When the intel_pstate driver is used (scaling_driver : intel_pstate ), xenpm's set-scaling-governor will not take effect, since the intel_pstate only implements its internal governor. But that's precisely what I'm asking to be changed: Even if internally is uses its own governor, the user interface of the tool should still be usable to achieve similar effects. Yes, we should change that if we remain two governors in the intel_pstate driver. But as we discussed before, the internal Performance governor seems redundant and can be removed. In that case, there is no other option of governors that be selected by the user via set-scaling-governor. I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). 
xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not internal), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct=max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running with 60%. Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention. They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). I think we can follow the kernel implementation regarding this point, what do you think? Best, Wei ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 06/14] x86: expose CBM length and COS number information
On 23.04.15 at 11:55, chao.p.p...@linux.intel.com wrote: --- a/xen/arch/x86/sysctl.c +++ b/xen/arch/x86/sysctl.c @@ -171,6 +171,24 @@ long arch_do_sysctl( break; +case XEN_SYSCTL_psr_cat_op: +switch ( sysctl->u.psr_cat_op.cmd ) +{ +case XEN_SYSCTL_PSR_CAT_get_l3_info: +ret = psr_get_cat_l3_info(sysctl->u.psr_cat_op.target, + &sysctl->u.psr_cat_op.u.l3_info.cbm_len, + &sysctl->u.psr_cat_op.u.l3_info.cos_max); + +if ( !ret && __copy_to_guest(u_sysctl, sysctl, 1) ) Can't this be __copy_field_to_guest() for just the u.psr_cat_op part? Jan
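Jan's suggestion is to copy back only the union member the hypercall actually filled in, rather than the whole sysctl struct. A minimal stand-alone C sketch of the idea; the struct layout and the `copy_field()` helper here are invented stand-ins, not Xen's real `__copy_field_to_guest()` machinery:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for Xen's sysctl structures. */
struct l3_info { uint32_t cbm_len; uint32_t cos_max; };
struct sysctl { uint32_t cmd; uint32_t target; struct l3_info l3_info; };

/* Copy only one member back to the caller's buffer. Compared with
 * copying the whole struct, less data crosses the boundary and
 * unrelated fields in the destination are left untouched. */
static void copy_field(struct sysctl *dst, const struct sysctl *src)
{
    memcpy(&dst->l3_info, &src->l3_info, sizeof(src->l3_info));
}
```

The point of the review comment is exactly this: fields the hypervisor did not modify (and whatever the guest already had there) are not overwritten wholesale.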
[Xen-devel] [PATCH] OSSTEST: introduce a raisin build test
Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- ap-common |5 +++ mfi-common | 19 ++ sg-run-job |5 +++ ts-raisin-build | 104 +++ 4 files changed, 133 insertions(+) create mode 100755 ts-raisin-build diff --git a/ap-common b/ap-common index 64749e3..985eeec 100644 --- a/ap-common +++ b/ap-common @@ -47,13 +47,18 @@ # rumpsrc-related runvars needed only for old rumpuser-xen # (ie ones which need $bodges=1 in ts-rumpuserxen-build) +: ${TREE_RAISIN:=git://xenbits.xen.org/people/sstabellini/raisin.git} +: ${DEFAULT_REVISION_RAISIN:=master} + : ${TREE_SEABIOS_UPSTREAM:=git://git.seabios.org/seabios.git} : ${PUSH_TREE_SEABIOS:=$XENBITS:/home/xen/git/osstest/seabios.git} : ${BASE_TREE_SEABIOS:=git://xenbits.xen.org/osstest/seabios.git} +: ${TREE_SEABIOS:=$TREE_SEABIOS_UPSTREAM} : ${TREE_OVMF_UPSTREAM:=https://github.com/tianocore/edk2.git} : ${PUSH_TREE_OVMF:=$XENBITS:/home/xen/git/osstest/ovmf.git} : ${BASE_TREE_OVMF:=git://xenbits.xen.org/osstest/ovmf.git} +: ${TREE_OVMF:=$TREE_OVMF_UPSTREAM} : ${TREE_LINUXFIRMWARE:=git://xenbits.xen.org/osstest/linux-firmware.git} : ${PUSH_TREE_LINUXFIRMWARE:=$XENBITS:/home/osstest/ext/linux-firmware.git} diff --git a/mfi-common b/mfi-common index 16fc8c5..051c9fc 100644 --- a/mfi-common +++ b/mfi-common @@ -215,6 +215,25 @@ create_build_jobs () { fi +if [ x$REVISION_RAISIN != xdisable ]; then + +./cs-job-create $flight build-$arch-raisin build-raisin \ +arch=$arch \ +tree_xen=$TREE_XEN \ +$RUNVARS $BUILD_RUNVARS $BUILD_RAISIN_RUNVARS $arch_runvars \ +$suite_runvars \ +host_hostflags=$build_hostflags \ +buildjob=${bfi}build-$arch \ +tree_raisin=$TREE_RAISIN \ +tree_qemuu=$TREE_QEMU_UPSTREAM \ +tree_qemu=$TREE_QEMU \ +tree_seabios=$TREE_SEABIOS \ +tree_libvirt=$TREE_LIBVIRT \ +tree_ovmf=$TREE_OVMF \ +revision_raisin=${REVISION_RAISIN:-${DEFAULT_REVISION_RAISIN}}\ + +fi + if branch_wants_rumpkernel_tests; then case $arch in diff --git a/sg-run-job b/sg-run-job index eae159d..449118d 100755 --- a/sg-run-job +++ 
b/sg-run-job @@ -346,6 +346,7 @@ proc need-hosts/build {} { return BUILD } proc need-hosts/build-kern {} { return BUILD } proc need-hosts/build-libvirt {} { return BUILD } proc need-hosts/build-rumpuserxen {} { return BUILD } +proc need-hosts/build-raisin {} { return BUILD } proc run-job/build {} { run-ts . = ts-xen-build @@ -364,6 +365,10 @@ proc run-job/build-rumpuserxen {} { run-ts . = ts-xen-build + host tools } +proc run-job/build-raisin {} { +run-ts . = ts-raisin-build +} + proc prepare-build-host {} { global jobinfo run-ts broken = ts-hosts-allocate + host diff --git a/ts-raisin-build b/ts-raisin-build new file mode 100755 index 000..f9b8ed6 --- /dev/null +++ b/ts-raisin-build @@ -0,0 +1,104 @@ +#!/usr/bin/perl -w +# This is part of osstest, an automated testing framework for Xen. +# Copyright (C) 2009-2013 Citrix Inc. +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. 
+ +use strict qw(vars); +use DBI; +use Osstest; +use File::Path; +use POSIX; +use Osstest::TestSupport; +use Osstest::BuildSupport; + +tsreadconfig(); +selectbuildhost(\@ARGV); +# remaining arguments are passed as targets to make +builddirsprops(); + +sub checkout () { +prepbuilddirs(); +build_clone($ho, 'raisin', $builddir, 'raisin'); + +target_cmd_build($ho, 100, $builddir, END); +cd $builddir/raisin + config + +echo config ENABLED_COMPONENTS=\\seabios ovmf xen qemu qemu_traditional libvirt\\ +echo config MAKE=\\make $makeflags\\ +echo config PREFIX=\\/usr\\ +echo config DESTDIR=dist + +echo config
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 25/04/2015 00:15, Konrad Rzeszutek Wilk wrote On Fri, Apr 24, 2015 at 03:42:40PM +, Wang, Wei W wrote: On 24/04/2015 23:04, Jan Beulich wrote On 24.04.15 at 16:56, wei.w.w...@intel.com wrote: On 24/04/2015 20:57, Jan Beulich wrote I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not internal), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct=max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running with 60%. So I must be misunderstanding something then: How can the driver do anything at all when told to run the system at 60%? The {min,max}_pct is a limit. The timer function figures out a proper value based on the sampled statistics, then this value is clamped into [min_pct, max_pct]. When we have [60%, 60%], whatever the value from the timer function is, it will be finally adjusted to 60%, and set to the perf_ctl register. Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention. They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). Adjusting the values individually to me very much looks like the userspace governor. 
Yeah, that example was like userspace. Please take a look at this example: [min_pct=60%, max_pct=80%], the timer generates 45%, 55%, 65%, 70%, 75%, 90%, then the final target values will not be constant. The ones (65%, 70%, 75%) falling into the limit interval behave like ondemand; the others do not. I think we can follow the kernel implementation regarding this point, what do you think? Not sure - I'm not always convinced that what Linux does is the one and only and best way. Understood. But I think that usage is good, in terms of supporting future Intel processors (e.g. the hardware controlled P-states on Skylake+). The {min,max}_pct needs to be exposed to users to set the limits. How will this affect AMD processors which can use the cpufreq? Would the ondemand feature go away? No, this won't affect them. When intel_pstate=disable is added to the boot parameter list, the old cpufreq driver will be used, and everything, including xenpm, will work in the old style. The new driver, intel_pstate, actually works in a mode similar to ondemand (probably can be called enhanced ondemand - the user can set a limit range for the ondemand). Best, Wei
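The clamping behaviour under discussion can be sketched in a few lines of C (a simplified model, not the driver's actual code): with min_pct == max_pct every sampled target collapses to one fixed value, which behaves like the userspace governor, while a wider window lets samples inside the interval through unchanged, which behaves like ondemand.

```c
/* Clamp a sampled target percentage into [min_pct, max_pct], as the
 * thread describes the intel_pstate timer function doing before the
 * value is written to the perf_ctl register. */
static int clamp_pct(int target, int min_pct, int max_pct)
{
    if (target < min_pct)
        return min_pct;
    if (target > max_pct)
        return max_pct;
    return target;
}
```

With [60%, 60%] every input maps to 60% (Wei's first example); with [60%, 80%] the samples 65/70/75 pass through while 45 and 90 are pinned to the bounds (the second example).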
[Xen-devel] [PATCH 2/3] x86/hvm: introduce functions for HVMOP_get/set_param allowance checks
Some parameters can only (validly) be set once. Some cannot be set by a guest for its own domain. Consolidate these checks, along with the XSM check, in a new hvm_allow_set_param() function for clarity. Also, introduce hvm_allow_get_param() for similar reasons. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com --- xen/arch/x86/hvm/hvm.c | 141 +++- 1 file changed, 79 insertions(+), 62 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 87c47b1..23c604d 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -5640,6 +5640,57 @@ static int hvmop_set_evtchn_upcall_vector( return 0; } +static int hvm_allow_set_param(struct domain *d, + struct xen_hvm_param *a) +{ +uint64_t value = d->arch.hvm_domain.params[a->index]; +int rc = 0; + +rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); +if ( rc ) +return rc; + +/* The following parameters cannot be set by the guest */ +switch ( a->index ) +{ +case HVM_PARAM_VIRIDIAN: +case HVM_PARAM_IDENT_PT: +case HVM_PARAM_DM_DOMAIN: +case HVM_PARAM_ACPI_S_STATE: +case HVM_PARAM_MEMORY_EVENT_CR0: +case HVM_PARAM_MEMORY_EVENT_CR3: +case HVM_PARAM_MEMORY_EVENT_CR4: +case HVM_PARAM_MEMORY_EVENT_INT3: +case HVM_PARAM_MEMORY_EVENT_SINGLE_STEP: +case HVM_PARAM_MEMORY_EVENT_MSR: +case HVM_PARAM_IOREQ_SERVER_PFN: +case HVM_PARAM_NR_IOREQ_SERVER_PAGES: +if ( d == current->domain ) +rc = -EPERM; +break; +default: +break; +} + +if ( rc ) +return rc; + +/* The following parameters can only be changed once */ +switch ( a->index ) +{ +case HVM_PARAM_VIRIDIAN: +case HVM_PARAM_IOREQ_SERVER_PFN: +case HVM_PARAM_NR_IOREQ_SERVER_PAGES: +if ( value != 0 && a->value != value ) +rc = -EEXIST; +break; +default: +break; +} + +return rc; +} + static int hvmop_set_param( XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg) { @@ -5666,7 +5717,7 @@ static int hvmop_set_param( (a.index != HVM_PARAM_CALLBACK_IRQ) ) goto out; -rc = 
xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); +rc = hvm_allow_set_param(d, a); if ( rc ) goto out; @@ -5681,31 +5732,11 @@ static int hvmop_set_param( rc = -EINVAL; break; case HVM_PARAM_VIRIDIAN: -/* This should only ever be set once by the tools and read by the guest. */ -rc = -EPERM; -if ( d == current-domain ) -break; - -if ( a.value != d-arch.hvm_domain.params[a.index] ) -{ -rc = -EEXIST; -if ( d-arch.hvm_domain.params[a.index] != 0 ) -break; - +if ( (a.value ~HVMPV_feature_mask) || + !(a.value HVMPV_base_freq) ) rc = -EINVAL; -if ( (a.value ~HVMPV_feature_mask) || - !(a.value HVMPV_base_freq) ) -break; -} - -rc = 0; break; case HVM_PARAM_IDENT_PT: -/* Not reflexive, as we must domain_pause(). */ -rc = -EPERM; -if ( d == current-domain ) -break; - rc = -EINVAL; if ( d-arch.hvm_domain.params[a.index] != 0 ) break; @@ -5733,22 +5764,12 @@ static int hvmop_set_param( domctl_lock_release(); break; case HVM_PARAM_DM_DOMAIN: -/* Not reflexive, as we may need to domain_pause(). */ -rc = -EPERM; -if ( d == current-domain ) -break; - if ( a.value == DOMID_SELF ) a.value = current-domain-domain_id; rc = hvm_set_dm_domain(d, a.value); break; case HVM_PARAM_ACPI_S_STATE: -/* Not reflexive, as we must domain_pause(). */ -rc = -EPERM; -if ( current-domain == d ) -break; - rc = 0; if ( a.value == 3 ) hvm_s3_suspend(d); @@ -5761,20 +5782,9 @@ static int hvmop_set_param( case HVM_PARAM_ACPI_IOPORTS_LOCATION: rc = pmtimer_change_ioport(d, a.value); break; -case HVM_PARAM_MEMORY_EVENT_CR0: -case HVM_PARAM_MEMORY_EVENT_CR3: -case HVM_PARAM_MEMORY_EVENT_CR4: -if ( d == current-domain ) -rc = -EPERM; -break; case HVM_PARAM_MEMORY_EVENT_INT3: case HVM_PARAM_MEMORY_EVENT_SINGLE_STEP: case HVM_PARAM_MEMORY_EVENT_MSR: -if ( d == current-domain ) -{ -rc = -EPERM; -break; -} if ( a.value HVMPME_onchangeonly ) rc = -EINVAL; break; @@ -5807,22 +5817,12 @@ static int hvmop_set_param( rc = -EINVAL; break; case HVM_PARAM_IOREQ_SERVER_PFN: -if ( d == current-domain ) -{ -
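The change-once rule that hvm_allow_set_param() consolidates can be modelled in isolation. This is a hypothetical sketch with a plain array standing in for the domain's parameter block (and no locking): a parameter that holds a non-zero value may only be re-written with that same value, otherwise -EEXIST is returned.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for d->arch.hvm_domain.params[]. */
static uint64_t params[4];

/* Enforce change-once semantics: once set non-zero, only an identical
 * re-write is allowed, mirroring the -EEXIST check in the patch. */
static int set_once(unsigned int idx, uint64_t value)
{
    uint64_t cur = params[idx];

    if (cur != 0 && value != cur)
        return -EEXIST;
    params[idx] = value;
    return 0;
}
```

Note that, as in the patch, re-writing the identical value is deliberately permitted; only a *change* after the first set is rejected.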
Re: [Xen-devel] [PATCH 1/3] x86/hvm: give HVMOP_set_param and HVMOP_get_param their own functions
On 24/04/15 17:21, Paul Durrant wrote: The level of switch nesting in those ops is getting unreadable. Giving them their own functions does introduce some code duplication in the the pre-op checks but the overall result is easier to follow. This patch is code movement. There is no functional change. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com Please latch current at the top of the function, and fix Xen style during motion. (newlines between break and case lines, drop superfluous braces, brace layout around get bufiorq_evtchn). With that fixed, Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- xen/arch/x86/hvm/hvm.c | 562 ++-- 1 file changed, 300 insertions(+), 262 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 3c62b80..87c47b1 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -5640,6 +5640,300 @@ static int hvmop_set_evtchn_upcall_vector( return 0; } +static int hvmop_set_param( +XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg) +{ +struct xen_hvm_param a; +struct domain *d; +struct vcpu *v; +int rc = 0; + +if ( copy_from_guest(a, arg, 1) ) +return -EFAULT; + +if ( a.index = HVM_NR_PARAMS ) +return -EINVAL; + +d = rcu_lock_domain_by_any_id(a.domid); +if ( d == NULL ) +return -ESRCH; + +rc = -EINVAL; +if ( !has_hvm_container_domain(d) ) +goto out; + +if ( is_pvh_domain(d) + (a.index != HVM_PARAM_CALLBACK_IRQ) ) +goto out; + +rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); +if ( rc ) +goto out; + +switch ( a.index ) +{ +case HVM_PARAM_CALLBACK_IRQ: +hvm_set_callback_via(d, a.value); +hvm_latch_shinfo_size(d); +break; +case HVM_PARAM_TIMER_MODE: +if ( a.value HVMPTM_one_missed_tick_pending ) +rc = -EINVAL; +break; +case HVM_PARAM_VIRIDIAN: +/* This should only ever be set once by the tools and read by the guest. 
*/ +rc = -EPERM; +if ( d == current-domain ) +break; + +if ( a.value != d-arch.hvm_domain.params[a.index] ) +{ +rc = -EEXIST; +if ( d-arch.hvm_domain.params[a.index] != 0 ) +break; + +rc = -EINVAL; +if ( (a.value ~HVMPV_feature_mask) || + !(a.value HVMPV_base_freq) ) +break; +} + +rc = 0; +break; +case HVM_PARAM_IDENT_PT: +/* Not reflexive, as we must domain_pause(). */ +rc = -EPERM; +if ( d == current-domain ) +break; + +rc = -EINVAL; +if ( d-arch.hvm_domain.params[a.index] != 0 ) +break; + +rc = 0; +if ( !paging_mode_hap(d) ) +break; + +/* + * Update GUEST_CR3 in each VMCS to point at identity map. + * All foreign updates to guest state must synchronise on + * the domctl_lock. + */ +rc = -ERESTART; +if ( !domctl_lock_acquire() ) +break; + +rc = 0; +domain_pause(d); +d-arch.hvm_domain.params[a.index] = a.value; +for_each_vcpu ( d, v ) +paging_update_cr3(v); +domain_unpause(d); + +domctl_lock_release(); +break; +case HVM_PARAM_DM_DOMAIN: +/* Not reflexive, as we may need to domain_pause(). */ +rc = -EPERM; +if ( d == current-domain ) +break; + +if ( a.value == DOMID_SELF ) +a.value = current-domain-domain_id; + +rc = hvm_set_dm_domain(d, a.value); +break; +case HVM_PARAM_ACPI_S_STATE: +/* Not reflexive, as we must domain_pause(). */ +rc = -EPERM; +if ( current-domain == d ) +break; + +rc = 0; +if ( a.value == 3 ) +hvm_s3_suspend(d); +else if ( a.value == 0 ) +hvm_s3_resume(d); +else +rc = -EINVAL; + +break; +case HVM_PARAM_ACPI_IOPORTS_LOCATION: +rc = pmtimer_change_ioport(d, a.value); +break; +case HVM_PARAM_MEMORY_EVENT_CR0: +case HVM_PARAM_MEMORY_EVENT_CR3: +case HVM_PARAM_MEMORY_EVENT_CR4: +if ( d == current-domain ) +rc = -EPERM; +break; +case HVM_PARAM_MEMORY_EVENT_INT3: +case HVM_PARAM_MEMORY_EVENT_SINGLE_STEP: +case HVM_PARAM_MEMORY_EVENT_MSR: +if ( d == current-domain ) +{ +rc = -EPERM; +break; +} +if ( a.value HVMPME_onchangeonly ) +rc = -EINVAL; +break; +case HVM_PARAM_NESTEDHVM: +
Re: [Xen-devel] [PATCH 0/2] x86/hvm: HVMOP_get/set_param improvements
The subject line should obviously be prefixed [PATCH 0/3], but the content is at least correct. Apologies for that, Paul -Original Message- From: Paul Durrant [mailto:paul.durr...@citrix.com] Sent: 24 April 2015 17:21 To: xen-de...@lists.xenproject.org Cc: Paul Durrant Subject: [PATCH 0/2] x86/hvm: HVMOP_get/set_param improvements The following 3 patches re-structure the code implementing HVMOP_set_param and HVMOP_get_param. Patch #1 gives each operation its own function Patch #2 splits out checks for getting/setting non-reflexive params and setting params with change-once semantics, as well as the XSM check into separate functions Patch #3 adds all remaining ioreq server params to non-reflexive lists ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/3] x86/hvm: give HVMOP_set_param and HVMOP_get_param their own functions
The level of switch nesting in those ops is getting unreadable. Giving them their own functions does introduce some code duplication in the the pre-op checks but the overall result is easier to follow. This patch is code movement. There is no functional change. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com --- xen/arch/x86/hvm/hvm.c | 562 ++-- 1 file changed, 300 insertions(+), 262 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 3c62b80..87c47b1 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -5640,6 +5640,300 @@ static int hvmop_set_evtchn_upcall_vector( return 0; } +static int hvmop_set_param( +XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg) +{ +struct xen_hvm_param a; +struct domain *d; +struct vcpu *v; +int rc = 0; + +if ( copy_from_guest(a, arg, 1) ) +return -EFAULT; + +if ( a.index = HVM_NR_PARAMS ) +return -EINVAL; + +d = rcu_lock_domain_by_any_id(a.domid); +if ( d == NULL ) +return -ESRCH; + +rc = -EINVAL; +if ( !has_hvm_container_domain(d) ) +goto out; + +if ( is_pvh_domain(d) + (a.index != HVM_PARAM_CALLBACK_IRQ) ) +goto out; + +rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); +if ( rc ) +goto out; + +switch ( a.index ) +{ +case HVM_PARAM_CALLBACK_IRQ: +hvm_set_callback_via(d, a.value); +hvm_latch_shinfo_size(d); +break; +case HVM_PARAM_TIMER_MODE: +if ( a.value HVMPTM_one_missed_tick_pending ) +rc = -EINVAL; +break; +case HVM_PARAM_VIRIDIAN: +/* This should only ever be set once by the tools and read by the guest. */ +rc = -EPERM; +if ( d == current-domain ) +break; + +if ( a.value != d-arch.hvm_domain.params[a.index] ) +{ +rc = -EEXIST; +if ( d-arch.hvm_domain.params[a.index] != 0 ) +break; + +rc = -EINVAL; +if ( (a.value ~HVMPV_feature_mask) || + !(a.value HVMPV_base_freq) ) +break; +} + +rc = 0; +break; +case HVM_PARAM_IDENT_PT: +/* Not reflexive, as we must domain_pause(). 
*/ +rc = -EPERM; +if ( d == current-domain ) +break; + +rc = -EINVAL; +if ( d-arch.hvm_domain.params[a.index] != 0 ) +break; + +rc = 0; +if ( !paging_mode_hap(d) ) +break; + +/* + * Update GUEST_CR3 in each VMCS to point at identity map. + * All foreign updates to guest state must synchronise on + * the domctl_lock. + */ +rc = -ERESTART; +if ( !domctl_lock_acquire() ) +break; + +rc = 0; +domain_pause(d); +d-arch.hvm_domain.params[a.index] = a.value; +for_each_vcpu ( d, v ) +paging_update_cr3(v); +domain_unpause(d); + +domctl_lock_release(); +break; +case HVM_PARAM_DM_DOMAIN: +/* Not reflexive, as we may need to domain_pause(). */ +rc = -EPERM; +if ( d == current-domain ) +break; + +if ( a.value == DOMID_SELF ) +a.value = current-domain-domain_id; + +rc = hvm_set_dm_domain(d, a.value); +break; +case HVM_PARAM_ACPI_S_STATE: +/* Not reflexive, as we must domain_pause(). */ +rc = -EPERM; +if ( current-domain == d ) +break; + +rc = 0; +if ( a.value == 3 ) +hvm_s3_suspend(d); +else if ( a.value == 0 ) +hvm_s3_resume(d); +else +rc = -EINVAL; + +break; +case HVM_PARAM_ACPI_IOPORTS_LOCATION: +rc = pmtimer_change_ioport(d, a.value); +break; +case HVM_PARAM_MEMORY_EVENT_CR0: +case HVM_PARAM_MEMORY_EVENT_CR3: +case HVM_PARAM_MEMORY_EVENT_CR4: +if ( d == current-domain ) +rc = -EPERM; +break; +case HVM_PARAM_MEMORY_EVENT_INT3: +case HVM_PARAM_MEMORY_EVENT_SINGLE_STEP: +case HVM_PARAM_MEMORY_EVENT_MSR: +if ( d == current-domain ) +{ +rc = -EPERM; +break; +} +if ( a.value HVMPME_onchangeonly ) +rc = -EINVAL; +break; +case HVM_PARAM_NESTEDHVM: +rc = xsm_hvm_param_nested(XSM_PRIV, d); +if ( rc ) +break; +if ( a.value 1 ) +rc = -EINVAL; +/* Remove the check below once we have + * shadow-on-shadow. + */ +if ( cpu_has_svm !paging_mode_hap(d) a.value ) +rc = -EINVAL; +/* Set up NHVM state for any vcpus that are already up */ +if ( a.value +
[Xen-devel] [PATCH 3/3] x86/hvm: disallow guest get and set of all ioreq server HVM params
A guest has no need to touch these parameters and reading HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or HVM_PARAM_BUFIOREQ_EVTCHN may cause Xen to create a default ioreq server where one did not already exist. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com --- xen/arch/x86/hvm/hvm.c |7 +++ 1 file changed, 7 insertions(+) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 23c604d..b51c1d5 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -5665,6 +5665,9 @@ static int hvm_allow_set_param(struct domain *d, case HVM_PARAM_MEMORY_EVENT_MSR: case HVM_PARAM_IOREQ_SERVER_PFN: case HVM_PARAM_NR_IOREQ_SERVER_PAGES: +case HVM_PARAM_IOREQ_PFN: +case HVM_PARAM_BUFIOREQ_PFN: +case HVM_PARAM_BUFIOREQ_EVTCHN: if ( d == current->domain ) rc = -EPERM; break; @@ -5880,6 +5883,10 @@ static int hvm_allow_get_param(struct domain *d, { case HVM_PARAM_IOREQ_SERVER_PFN: case HVM_PARAM_NR_IOREQ_SERVER_PAGES: +case HVM_PARAM_IOREQ_PFN: +case HVM_PARAM_BUFIOREQ_PFN: +case HVM_PARAM_BUFIOREQ_EVTCHN: +case HVM_PARAM_DM_DOMAIN: if ( d == current->domain ) rc = -EPERM; break; -- 1.7.10.4
Re: [Xen-devel] Archiving Xen on ARM and PVOPS subprojects
On 24 Apr 2015, at 16:22, Lars Kurth lars.kurth@gmail.com wrote: On 22 Apr 2015, at 14:04, Ian Jackson ian.jack...@eu.citrix.com wrote: Lars Kurth writes (Re: Archiving Xen on ARM and PVOPS subprojects): Any other votes by committers (in the TO list) before I tally the votes? +1 to both. We had several +1's and no -1's. So this vote carries Lars Just as a note, I removed the PVOPS and ARM pages from the Developer menu and http://www.xenproject.org/developers/teams.html and merged relevant content into http://www.xenproject.org/developers/teams/hypervisor.html I will do the PVOPS wiki stuff at some point when back from Shanghai Regards Lars ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] x86/hvm: introduce functions for HVMOP_get/set_param allowance checks
On 24/04/15 17:21, Paul Durrant wrote: Some parameters can only (validly) be set once. Some cannot be set by a guest for its own domain. Consolidate these checks, along with the XSM check, in a new hvm_allow_set_param() function for clarity. Also, introduce hvm_allow_get_param() for similar reasons. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com --- xen/arch/x86/hvm/hvm.c | 141 +++- 1 file changed, 79 insertions(+), 62 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 87c47b1..23c604d 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -5640,6 +5640,57 @@ static int hvmop_set_evtchn_upcall_vector( return 0; } +static int hvm_allow_set_param(struct domain *d, + struct xen_hvm_param *a) const +{ +uint64_t value = d-arch.hvm_domain.params[a-index]; +int rc = 0; No need for the 0 initialiser (and I know Coverity will complain about it). + +rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); +if ( rc ) +return rc; + +/* The following parameters cannot be set by the guest */ +switch (a-index) style +{ +case HVM_PARAM_VIRIDIAN: +case HVM_PARAM_IDENT_PT: +case HVM_PARAM_DM_DOMAIN: +case HVM_PARAM_ACPI_S_STATE: +case HVM_PARAM_MEMORY_EVENT_CR0: +case HVM_PARAM_MEMORY_EVENT_CR3: +case HVM_PARAM_MEMORY_EVENT_CR4: +case HVM_PARAM_MEMORY_EVENT_INT3: +case HVM_PARAM_MEMORY_EVENT_SINGLE_STEP: +case HVM_PARAM_MEMORY_EVENT_MSR: +case HVM_PARAM_IOREQ_SERVER_PFN: +case HVM_PARAM_NR_IOREQ_SERVER_PAGES: +if ( d == current-domain ) +rc = -EPERM; +break; +default: +break; Please can we turn the guest-settable list into a whitelist rather than a blacklist. We have had several XSAs in the past where a guest has been able to modify a value it shouldn't have because new params were default writable by everyone. 
+} + +if ( rc ) +return rc; + +/* The following parameters can only be changed once */ +switch (a-index) +{ +case HVM_PARAM_VIRIDIAN: +case HVM_PARAM_IOREQ_SERVER_PFN: +case HVM_PARAM_NR_IOREQ_SERVER_PAGES: +if (value != 0 a-value != value) +rc = -EEXIST; +break; +default: +break; +} + +return rc; +} + static int hvmop_set_param( XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg) { @@ -5666,7 +5717,7 @@ static int hvmop_set_param( (a.index != HVM_PARAM_CALLBACK_IRQ) ) goto out; -rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); +rc = hvm_allow_set_param(d, a); if ( rc ) goto out; @@ -5681,31 +5732,11 @@ static int hvmop_set_param( rc = -EINVAL; break; case HVM_PARAM_VIRIDIAN: -/* This should only ever be set once by the tools and read by the guest. */ -rc = -EPERM; -if ( d == current-domain ) -break; - -if ( a.value != d-arch.hvm_domain.params[a.index] ) -{ -rc = -EEXIST; -if ( d-arch.hvm_domain.params[a.index] != 0 ) -break; - +if ( (a.value ~HVMPV_feature_mask) || + !(a.value HVMPV_base_freq) ) rc = -EINVAL; -if ( (a.value ~HVMPV_feature_mask) || - !(a.value HVMPV_base_freq) ) -break; -} - -rc = 0; break; case HVM_PARAM_IDENT_PT: -/* Not reflexive, as we must domain_pause(). */ -rc = -EPERM; -if ( d == current-domain ) -break; - rc = -EINVAL; if ( d-arch.hvm_domain.params[a.index] != 0 ) break; @@ -5733,22 +5764,12 @@ static int hvmop_set_param( domctl_lock_release(); break; case HVM_PARAM_DM_DOMAIN: -/* Not reflexive, as we may need to domain_pause(). */ -rc = -EPERM; -if ( d == current-domain ) -break; - if ( a.value == DOMID_SELF ) a.value = current-domain-domain_id; rc = hvm_set_dm_domain(d, a.value); break; case HVM_PARAM_ACPI_S_STATE: -/* Not reflexive, as we must domain_pause(). 
*/ -rc = -EPERM; -if ( current-domain == d ) -break; - rc = 0; if ( a.value == 3 ) hvm_s3_suspend(d); @@ -5761,20 +5782,9 @@ static int hvmop_set_param( case HVM_PARAM_ACPI_IOPORTS_LOCATION: rc = pmtimer_change_ioport(d, a.value); break; -case HVM_PARAM_MEMORY_EVENT_CR0: -case HVM_PARAM_MEMORY_EVENT_CR3: -case HVM_PARAM_MEMORY_EVENT_CR4: -if ( d == current-domain ) -rc =
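The whitelist shape Andrew asks for might look like the sketch below (illustrative parameter names, not Xen's real values or its actual code): anything not explicitly listed is denied to the guest by default, so a newly added parameter starts out protected instead of guest-writable — which is the failure mode behind the XSAs he mentions.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative parameter numbers, not Xen's. */
enum {
    PARAM_CALLBACK_IRQ,
    PARAM_TIMER_MODE,
    PARAM_VIRIDIAN,
    PARAM_IOREQ_PFN,
    NR_PARAMS,
};

/* Whitelist check: the guest (self == true) may set only the params
 * explicitly listed; the default case denies everything else.  The
 * toolstack path (self == false) is not restricted here. */
static int allow_guest_set(bool self, unsigned int index)
{
    if (!self)
        return 0;
    switch (index)
    {
    case PARAM_CALLBACK_IRQ:
    case PARAM_TIMER_MODE:
        return 0;
    default:
        return -EPERM;
    }
}
```

Contrast with a blacklist, where forgetting to list a new parameter silently makes it guest-settable; here forgetting merely makes it too restrictive, which is the safe direction to fail in.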
[Xen-devel] [seabios test] 51929: regressions - FAIL
flight 51929 seabios real [real] http://logs.test-lab.xenproject.org/osstest/logs/51929/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-freebsd10-amd64 15 guest-localmigrate.2 fail REGR. vs. 36656 test-amd64-i386-freebsd10-i386 16 guest-localmigrate/x10 fail REGR. vs. 36656 test-amd64-i386-pair 21 guest-migrate/src_host/dst_host fail REGR. vs. 36656 test-amd64-amd64-xl-win7-amd64 9 windows-install fail REGR. vs. 36656 test-amd64-i386-xl-qemuu-win7-amd64 9 windows-installfail REGR. vs. 36656 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass test-amd64-amd64-libvirt-xsm 11 guest-start fail never pass test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-amd64-i386-xl-xsm 11 guest-start fail never pass test-amd64-amd64-xl-xsm 11 guest-start fail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 16 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3 16 guest-stopfail never pass test-amd64-i386-xl-qemuu-winxpsp3 16 guest-stopfail never pass test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass test-amd64-amd64-xl-qemut-winxpsp3 16 guest-stop fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 16 guest-stop fail never pass test-amd64-i386-libvirt-xsm 11 guest-start 
fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 16 guest-stop fail never pass test-amd64-i386-xl-winxpsp3 16 guest-stop fail never pass version targeted for testing: seabios 0b2165d191f51ed2a522142c3ed3db55bc852627 baseline version: seabios b4581224824871ad2909f84fc4a9e067cda663f2 People who touched revisions under test: Bruce Rogers brog...@suse.com Kevin O'Connor ke...@koconnor.net jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl pass test-amd64-i386-xl pass test-amd64-amd64-xl-qemut-debianhvm-amd64-xsmfail test-amd64-i386-xl-qemut-debianhvm-amd64-xsm fail test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmfail test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm fail test-amd64-amd64-libvirt-xsm fail test-amd64-i386-libvirt-xsm fail test-amd64-amd64-xl-xsm fail test-amd64-i386-xl-xsm fail test-amd64-amd64-xl-pvh-amd fail test-amd64-i386-rhel6hvm-amd pass test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-freebsd10-amd64 fail
Re: [Xen-devel] [PATCH v2] xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
On Fri, 24 Apr 2015, Konrad Rzeszutek Wilk wrote: On Fri, Apr 24, 2015 at 10:16:40AM +0100, Stefano Stabellini wrote: Make sure that xen_swiotlb_init allocates buffers that are DMA capable when at least one memblock is available below 4G. Otherwise we assume that all devices on the SoC can cope with 4G addresses. We do this on ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case. No functional changes on x86. From: Chen Baozi baoz...@gmail.com Signed-off-by: Chen Baozi baoz...@gmail.com Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Tested-by: Chen Baozi baoz...@gmail.com Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Thanks! We are still early in the release cycle, should I add it to xentip/stable/for-linus-4.1? diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h index 2f7e6ff..0b579b2 100644 --- a/arch/arm/include/asm/xen/page.h +++ b/arch/arm/include/asm/xen/page.h @@ -110,5 +110,6 @@ static inline bool set_phys_to_machine(unsigned long pfn, unsigned long mfn) bool xen_arch_need_swiotlb(struct device *dev, unsigned long pfn, unsigned long mfn); +unsigned long xen_get_swiotlb_free_pages(unsigned int order); #endif /* _ASM_ARM_XEN_PAGE_H */ diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c index 793551d..4983250 100644 --- a/arch/arm/xen/mm.c +++ b/arch/arm/xen/mm.c @@ -4,6 +4,7 @@ #include <linux/gfp.h> #include <linux/highmem.h> #include <linux/export.h> +#include <linux/memblock.h> #include <linux/of_address.h> #include <linux/slab.h> #include <linux/types.h> @@ -21,6 +22,20 @@ #include <asm/xen/hypercall.h> #include <asm/xen/interface.h> +unsigned long xen_get_swiotlb_free_pages(unsigned int order) +{ + struct memblock_region *reg; + gfp_t flags = __GFP_NOWARN; + + for_each_memblock(memory, reg) { + if (reg->base < (phys_addr_t)0xffffffff) { + flags |= __GFP_DMA; + break; + } + } + return __get_free_pages(flags, order); +} + enum dma_cache_op { DMA_UNMAP, DMA_MAP, diff --git a/arch/x86/include/asm/xen/page.h 
b/arch/x86/include/asm/xen/page.h index 358dcd3..c44a5d5 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -269,4 +269,9 @@ static inline bool xen_arch_need_swiotlb(struct device *dev, return false; } +static inline unsigned long xen_get_swiotlb_free_pages(unsigned int order) +{ + return __get_free_pages(__GFP_NOWARN, order); +} + #endif /* _ASM_X86_XEN_PAGE_H */ diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 810ad41..4c54932 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -235,7 +235,7 @@ retry: #define SLABS_PER_PAGE (1 (PAGE_SHIFT - IO_TLB_SHIFT)) #define IO_TLB_MIN_SLABS ((120) IO_TLB_SHIFT) while ((SLABS_PER_PAGE order) IO_TLB_MIN_SLABS) { - xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN, order); + xen_io_tlb_start = (void *)xen_get_swiotlb_free_pages(order); if (xen_io_tlb_start) break; order--; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 01/14] x86: add socket_to_cpumask
On 23.04.15 at 11:55, chao.p.p...@linux.intel.com wrote:
> @@ -301,6 +304,8 @@ static void set_cpu_sibling_map(int cpu)
>              }
>          }
>      }
> +
> +    cpumask_set_cpu(cpu, socket_to_cpumask[cpu_to_socket(cpu)]);
>  }

There is an early return path in this function, which you need to deal with.

> @@ -717,6 +722,14 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>      stack_base[0] = stack_start;
>
> +    nr_sockets = DIV_ROUND_UP(nr_cpu_ids, boot_cpu_data.x86_max_cores *
> +                              boot_cpu_data.x86_num_siblings);

I think this is going to be problematic when there are more CPUs in the system than Xen can (or is told to) handle.

> +    ASSERT(nr_sockets > 0);

Due to the DIV_ROUND_UP() I think this can never trigger.

> +    socket_to_cpumask = xzalloc_array(cpumask_t, nr_sockets);

This is going to be very wasteful when Xen was built for very many CPUs. I think the array should be of cpumask_var_t type, and the array elements be initialized individually.

> --- a/xen/include/asm-x86/smp.h
> +++ b/xen/include/asm-x86/smp.h
> @@ -58,6 +58,14 @@ int hard_smp_processor_id(void);
>
>  void __stop_this_cpu(void);
>
> +/*
> + * This value is considered to not change from the initial startup.
> + * Otherwise all the relevant places need to be retrofitted.
> + */
> +extern unsigned int nr_sockets;
> +
> +/* Representing HT and core siblings in each socket */
> +extern cpumask_t *socket_to_cpumask;
>  #endif /* !__ASSEMBLY__ */

Please retain the blank line before the #endif.

Jan
Re: [Xen-devel] Archiving Xen on ARM and PVOPS subprojects
On 22 Apr 2015, at 14:04, Ian Jackson ian.jack...@eu.citrix.com wrote: Lars Kurth writes (Re: Archiving Xen on ARM and PVOPS subprojects): Any other votes by committers (in the TO list) before I tally the votes? +1 to both. We had several +1's and no -1's, so this vote carries.

Lars
Re: [Xen-devel] [PATCH v6 07/14] x86: dynamically get/set CBM for a domain
On 23.04.15 at 11:55, chao.p.p...@linux.intel.com wrote:
> +int psr_set_l3_cbm(struct domain *d, unsigned int socket, uint64_t cbm)
> +{
> +    unsigned int old_cos, cos;
> +    struct psr_cat_cbm *map, *find;

I think the variable would better be named "found" and ...

> +    struct psr_cat_socket_info *info;
> +    int ret = get_cat_socket_info(socket, &info);
> +
> +    if ( ret )
> +        return ret;
> +
> +    if ( !psr_check_cbm(info->cbm_len, cbm) )
> +        return -EINVAL;
> +
> +    old_cos = d->arch.psr_cos_ids[socket];
> +    map = info->cos_to_cbm;
> +    find = NULL;

... NULL could be its initializer.

Jan
Re: [Xen-devel] [PATCH v6 07/14] x86: dynamically get/set CBM for a domain
On 23.04.15 at 11:55, chao.p.p...@linux.intel.com wrote:
> Changes in v5:
> * Add spin_lock to protect cbm_map.
> * Add spin_lock to read path as well.

I don't think the second bullet point is true - there's only a single spin_lock() invocation being added here afaics.

Jan
Re: [Xen-devel] [PATCH v3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
On 24/04/15 16:07, Jan Beulich wrote:
> On 24.04.15 at 16:57, andrew.coop...@citrix.com wrote:
>> On 24/04/15 15:31, Jan Beulich wrote:
>>> Xen L4 entries being uniformly installed into any L4 table and 64-bit
>>> PV kernels running in ring 3 means that user mode was able to see the
>>> read-only M2P presented by Xen to the guests. While apparently not
>>> really representing an exploitable information leak, this still very
>>> certainly was never meant to be that way.
>>>
>>> Building on the fact that these guests already have separate kernel
>>> and user mode page tables, we can allow guest kernels to tell Xen that
>>> they don't want user mode to see this table. We can't, however, do
>>> this by default: There is no ABI requirement that kernel and user mode
>>> page tables be separate. Therefore introduce a new VM-assist flag
>>> allowing the guest to control respective hypervisor behavior:
>>> - when not set, L4 tables get created with the respective slot blank,
>>>   and whenever the L4 table gets used as a kernel one the missing
>>>   mapping gets inserted,
>>> - when set, L4 tables get created with the respective slot initialized
>>>   as before, and whenever the L4 table gets used as a user one the
>>>   mapping gets zapped.
>> Is this complete?
> Yes.
>> For backwards compatibility, older kernels will not have m2p_strict
>> set, and the m2p should unconditionally appear in all L4s.
> No. L4s _only_ used for user mode have no need for this entry to be
> non-zero.

There is only ever a single L4 in a particular virtual address space. The M2P is part of the Xen ABI range. Previously, the M2P was present in all L4s, which is why it leaked into user context.

If we do not wish to break backwards compatibility with this change, then in relaxed mode the M2P must remain in all tables, or userspace which is actually making use of the mappings will suddenly start faulting because of a change in Xen, not a kernel change (and an unknowing kernel might not be prepared to handle this case).

By your description, in the relaxed case a newly created L4 which is first used as a user table will have the mapping clear. Or am I missing something?

~Andrew
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24.04.15 at 17:42, wei.w.w...@intel.com wrote: On 24/04/2015 23:04, Jan Beulich wrote On 24.04.15 at 16:56, wei.w.w...@intel.com wrote: On 24/04/2015 20:57, Jan Beulich wrote I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not internal), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct=max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running with 60%. So I must be misunderstanding something then: How can the driver do anything at all when told to run the system at 60%? The {min,max}_pct is a limit. That's clear. The question is whether these are hardware imposed limits or set by the user. In the latter case ... Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention. They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). Adjusting the values individually to me very much looks like the userspace governor. Yeah, that example was like userspace. Please take a look at this example: [min_pct=60%, max_pct=80%], the timer generates 45%, 55%, 65%, 70%, 75%, 90%, then the final target values will not be constant. 
(the 45%, 55%, and 90% here shouldn't happen in any case) The ones (65%, 70%, 75%) falling into the limit interval behave like ondemand; the others do not.

... restricting to these would be the job of the driver, and
- ondemand would be 0...100
- powersave would be 0
- performance would be 100
Everything else would be userspace.

Jan
[Xen-devel] [PATCH v16 04/14] qspinlock: Extract out code snippets for the next patch
This is a preparatory patch that extracts out the following 2 code snippets to prepare for the next performance optimization patch. 1) the logic for the exchange of new and previous tail code words into a new xchg_tail() function. 2) the logic for clearing the pending bit and setting the locked bit into a new clear_pending_set_locked() function. This patch also simplifies the trylock operation before queuing by calling queue_spin_trylock() directly. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org --- include/asm-generic/qspinlock_types.h |2 + kernel/locking/qspinlock.c| 79 - 2 files changed, 50 insertions(+), 31 deletions(-) diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h index 9c3f5c2..ef36613 100644 --- a/include/asm-generic/qspinlock_types.h +++ b/include/asm-generic/qspinlock_types.h @@ -58,6 +58,8 @@ typedef struct qspinlock { #define _Q_TAIL_CPU_BITS (32 - _Q_TAIL_CPU_OFFSET) #define _Q_TAIL_CPU_MASK _Q_SET_MASK(TAIL_CPU) +#define _Q_TAIL_MASK (_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK) + #define _Q_LOCKED_VAL (1U _Q_LOCKED_OFFSET) #define _Q_PENDING_VAL (1U _Q_PENDING_OFFSET) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 0351f78..11f6ad9 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -97,6 +97,42 @@ static inline struct mcs_spinlock *decode_tail(u32 tail) #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) /** + * clear_pending_set_locked - take ownership and clear the pending bit. 
+ * @lock: Pointer to queue spinlock structure + * + * *,1,0 - *,0,1 + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, lock-val); +} + +/** + * xchg_tail - Put in the new queue tail code word retrieve previous one + * @lock : Pointer to queue spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail) + * + * p,*,* - n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + u32 old, new, val = atomic_read(lock-val); + + for (;;) { + new = (val _Q_LOCKED_PENDING_MASK) | tail; + old = atomic_cmpxchg(lock-val, val, new); + if (old == val) + break; + + val = old; + } + return old; +} + +/** * queue_spin_lock_slowpath - acquire the queue spinlock * @lock: Pointer to queue spinlock structure * @val: Current value of the queue spinlock 32-bit word @@ -178,15 +214,7 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) * * *,1,0 - *,0,1 */ - for (;;) { - new = (val ~_Q_PENDING_MASK) | _Q_LOCKED_VAL; - - old = atomic_cmpxchg(lock-val, val, new); - if (old == val) - break; - - val = old; - } + clear_pending_set_locked(lock); return; /* @@ -203,37 +231,26 @@ queue: node-next = NULL; /* -* We have already touched the queueing cacheline; don't bother with -* pending stuff. -* -* trylock || xchg(lock, node) -* -* 0,0,0 - 0,0,1 ; no tail, not locked - no tail, locked. -* p,y,x - n,y,x ; tail was p - tail is n; preserving locked. +* We touched a (possibly) cold cacheline in the per-cpu queue node; +* attempt the trylock once more in the hope someone let go while we +* weren't watching. */ - for (;;) { - new = _Q_LOCKED_VAL; - if (val) - new = tail | (val _Q_LOCKED_PENDING_MASK); - - old = atomic_cmpxchg(lock-val, val, new); - if (old == val) - break; - - val = old; - } + if (queue_spin_trylock(lock)) + goto release; /* -* we won the trylock; forget about queueing. 
+* We have already touched the queueing cacheline; don't bother with +* pending stuff. +* +* p,*,* - n,*,* */ - if (new == _Q_LOCKED_VAL) - goto release; + old = xchg_tail(lock, tail); /* * if there was a previous node; link it and wait until reaching the * head of the waitqueue. */ - if (old ~_Q_LOCKED_PENDING_MASK) { + if (old _Q_TAIL_MASK) { prev = decode_tail(old); WRITE_ONCE(prev-next, node); -- 1.7.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v16 01/14] qspinlock: A simple generic 4-byte queue spinlock
This patch introduces a new generic queue spinlock implementation that can serve as an alternative to the default ticket spinlock. Compared with the ticket spinlock, this queue spinlock should be almost as fair. It has about the same speed in the single-thread case and can be much faster in high contention situations, especially when the spinlock is embedded within the data structure to be protected. Only in light to moderate contention, where the average queue depth is around 1-3, may this queue spinlock be a bit slower due to the higher slowpath overhead.

This queue spinlock is especially suited to NUMA machines with a large number of cores, as the chance of spinlock contention is much higher in those machines. The cost of contention is also higher because of slower inter-node memory traffic.

Due to the fact that spinlocks are acquired with preemption disabled, the process will not be migrated to another CPU while it is trying to get a spinlock. Ignoring interrupt handling, a CPU can only be contending on one spinlock at any one time. Counting soft IRQ, hard IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting activities. By allocating a set of per-cpu queue nodes and using them to form a waiting queue, we can encode the queue node address into a much smaller 24-bit size (including CPU number and queue node index), leaving one byte for the lock. Please note that the queue node is only needed when waiting for the lock. Once the lock is acquired, the queue node can be released to be used later.
Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org --- include/asm-generic/qspinlock.h | 132 + include/asm-generic/qspinlock_types.h | 58 + kernel/Kconfig.locks |7 + kernel/locking/Makefile |1 + kernel/locking/mcs_spinlock.h |1 + kernel/locking/qspinlock.c| 209 + 6 files changed, 408 insertions(+), 0 deletions(-) create mode 100644 include/asm-generic/qspinlock.h create mode 100644 include/asm-generic/qspinlock_types.h create mode 100644 kernel/locking/qspinlock.c diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h new file mode 100644 index 000..315d6dc --- /dev/null +++ b/include/asm-generic/qspinlock.h @@ -0,0 +1,132 @@ +/* + * Queue spinlock + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. + * + * Authors: Waiman Long waiman.l...@hp.com + */ +#ifndef __ASM_GENERIC_QSPINLOCK_H +#define __ASM_GENERIC_QSPINLOCK_H + +#include asm-generic/qspinlock_types.h + +/** + * queue_spin_is_locked - is the spinlock locked? + * @lock: Pointer to queue spinlock structure + * Return: 1 if it is locked, 0 otherwise + */ +static __always_inline int queue_spin_is_locked(struct qspinlock *lock) +{ + return atomic_read(lock-val); +} + +/** + * queue_spin_value_unlocked - is the spinlock structure unlocked? + * @lock: queue spinlock structure + * Return: 1 if it is unlocked, 0 otherwise + * + * N.B. 
Whenever there are tasks waiting for the lock, it is considered + * locked wrt the lockref code to avoid lock stealing by the lockref + * code and change things underneath the lock. This also allows some + * optimizations to be applied without conflict with lockref. + */ +static __always_inline int queue_spin_value_unlocked(struct qspinlock lock) +{ + return !atomic_read(lock.val); +} + +/** + * queue_spin_is_contended - check if the lock is contended + * @lock : Pointer to queue spinlock structure + * Return: 1 if lock contended, 0 otherwise + */ +static __always_inline int queue_spin_is_contended(struct qspinlock *lock) +{ + return atomic_read(lock-val) ~_Q_LOCKED_MASK; +} +/** + * queue_spin_trylock - try to acquire the queue spinlock + * @lock : Pointer to queue spinlock structure + * Return: 1 if lock acquired, 0 if failed + */ +static __always_inline int queue_spin_trylock(struct qspinlock *lock) +{ + if (!atomic_read(lock-val) + (atomic_cmpxchg(lock-val, 0, _Q_LOCKED_VAL) == 0)) + return 1; + return 0; +} + +extern void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val); + +/** + * queue_spin_lock - acquire a queue spinlock + * @lock: Pointer to queue spinlock structure + */ +static __always_inline void queue_spin_lock(struct qspinlock *lock) +{ + u32
[Xen-devel] [PATCH v16 00/14] qspinlock: a 4-byte queue spinlock with PV support
v15-v16: - Remove the lfsr patch and use linear probing as lfsr is not really necessary in most cases. - Move the paravirt PV_CALLEE_SAVE_REGS_THUNK code to an asm header. - Add a patch to collect PV qspinlock statistics which also supersedes the PV lock hash debug patch. - Add PV qspinlock performance numbers. v14-v15: - Incorporate PeterZ's v15 qspinlock patch and improve upon the PV qspinlock code by dynamically allocating the hash table as well as some other performance optimization. - Simplified the Xen PV qspinlock code as suggested by David Vrabel david.vra...@citrix.com. - Add benchmarking data for 3.19 kernel to compare the performance of a spinlock heavy test with and without the qspinlock patch under different cpufreq drivers and scaling governors. v13-v14: - Patches 1 2: Add queue_spin_unlock_wait() to accommodate commit 78bff1c86 from Oleg Nesterov. - Fix the system hang problem when using PV qspinlock in an over-committed guest due to a racing condition in the pv_set_head_in_tail() function. - Increase the MAYHALT_THRESHOLD from 10 to 1024. - Change kick_cpu into a regular function pointer instead of a callee-saved function. - Change lock statistics code to use separate bits for different statistics. v12-v13: - Change patch 9 to generate separate versions of the queue_spin_lock_slowpath functions for bare metal and PV guest. This reduces the performance impact of the PV code on bare metal systems. v11-v12: - Based on PeterZ's version of the qspinlock patch (https://lkml.org/lkml/2014/6/15/63). - Incorporated many of the review comments from Konrad Wilk and Paolo Bonzini. - The pvqspinlock code is largely from my previous version with PeterZ's way of going from queue tail to head and his idea of using callee saved calls to KVM and XEN codes. v10-v11: - Use a simple test-and-set unfair lock to simplify the code, but performance may suffer a bit for large guest with many CPUs. 
- Take out Raghavendra KT's test results as the unfair lock changes may render some of his results invalid. - Add PV support without increasing the size of the core queue node structure. - Other minor changes to address some of the feedback comments. v9-v10: - Make some minor changes to qspinlock.c to accommodate review feedback. - Change author to PeterZ for 2 of the patches. - Include Raghavendra KT's test results in patch 18. v8-v9: - Integrate PeterZ's version of the queue spinlock patch with some modification: http://lkml.kernel.org/r/20140310154236.038181...@infradead.org - Break the more complex patches into smaller ones to ease review effort. - Fix a racing condition in the PV qspinlock code. v7-v8: - Remove one unneeded atomic operation from the slowpath, thus improving performance. - Simplify some of the codes and add more comments. - Test for X86_FEATURE_HYPERVISOR CPU feature bit to enable/disable unfair lock. - Reduce unfair lock slowpath lock stealing frequency depending on its distance from the queue head. - Add performance data for IvyBridge-EX CPU. v6-v7: - Remove an atomic operation from the 2-task contending code - Shorten the names of some macros - Make the queue waiter to attempt to steal lock when unfair lock is enabled. - Remove lock holder kick from the PV code and fix a race condition - Run the unfair lock PV code on overcommitted KVM guests to collect performance data. v5-v6: - Change the optimized 2-task contending code to make it fairer at the expense of a bit of performance. - Add a patch to support unfair queue spinlock for Xen. - Modify the PV qspinlock code to follow what was done in the PV ticketlock. - Add performance data for the unfair lock as well as the PV support code. v4-v5: - Move the optimized 2-task contending code to the generic file to enable more architectures to use it without code duplication. - Address some of the style-related comments by PeterZ. 
- Allow the use of unfair queue spinlock in a real para-virtualized execution environment. - Add para-virtualization support to the qspinlock code by ensuring that the lock holder and queue head stay alive as much as possible. v3-v4: - Remove debugging code and fix a configuration error - Simplify the qspinlock structure and streamline the code to make it perform a bit better - Add an x86 version of asm/qspinlock.h for holding x86 specific optimization. - Add an optimized x86 code path for 2 contending tasks to improve low contention performance. v2-v3: - Simplify the code by using numerous mode only without an unfair option. - Use the latest smp_load_acquire()/smp_store_release() barriers. - Move the queue spinlock code to kernel/locking. - Make the use of queue spinlock the default for x86-64 without user configuration. - Additional performance tuning. v1-v2: - Add some more comments to document what the code does. -
[Xen-devel] [PATCH v16 09/14] pvqspinlock, x86: Implement the paravirt qspinlock call patching
From: Peter Zijlstra (Intel) pet...@infradead.org We use the regular paravirt call patching to switch between: native_queue_spin_lock_slowpath() __pv_queue_spin_lock_slowpath() native_queue_spin_unlock()__pv_queue_spin_unlock() We use a callee saved call for the unlock function which reduces the i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions again. We further optimize the unlock path by patching the direct call with a movb $0,%arg1 if we are indeed using the native unlock code. This makes the unlock code almost as fast as the !PARAVIRT case. This significantly lowers the overhead of having CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code. Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/Kconfig |2 +- arch/x86/include/asm/paravirt.h | 29 - arch/x86/include/asm/paravirt_types.h | 10 ++ arch/x86/include/asm/qspinlock.h | 25 - arch/x86/include/asm/qspinlock_paravirt.h |6 ++ arch/x86/kernel/paravirt-spinlocks.c | 24 +++- arch/x86/kernel/paravirt_patch_32.c | 22 ++ arch/x86/kernel/paravirt_patch_64.c | 22 ++ 8 files changed, 128 insertions(+), 12 deletions(-) create mode 100644 arch/x86/include/asm/qspinlock_paravirt.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 49fecb1..a0946e7 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -661,7 +661,7 @@ config PARAVIRT_DEBUG config PARAVIRT_SPINLOCKS bool Paravirtualization layer for spinlocks depends on PARAVIRT SMP - select UNINLINE_SPIN_UNLOCK + select UNINLINE_SPIN_UNLOCK if !QUEUE_SPINLOCK ---help--- Paravirtualized spinlocks allow a pvops backend to replace the spinlock implementation with something virtualization-friendly diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 965c47d..f76199a 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -712,6 +712,31 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, #if 
defined(CONFIG_SMP) defined(CONFIG_PARAVIRT_SPINLOCKS) +#ifdef CONFIG_QUEUE_SPINLOCK + +static __always_inline void pv_queue_spin_lock_slowpath(struct qspinlock *lock, + u32 val) +{ + PVOP_VCALL2(pv_lock_ops.queue_spin_lock_slowpath, lock, val); +} + +static __always_inline void pv_queue_spin_unlock(struct qspinlock *lock) +{ + PVOP_VCALLEE1(pv_lock_ops.queue_spin_unlock, lock); +} + +static __always_inline void pv_wait(u8 *ptr, u8 val) +{ + PVOP_VCALL2(pv_lock_ops.wait, ptr, val); +} + +static __always_inline void pv_kick(int cpu) +{ + PVOP_VCALL1(pv_lock_ops.kick, cpu); +} + +#else /* !CONFIG_QUEUE_SPINLOCK */ + static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, __ticket_t ticket) { @@ -724,7 +749,9 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket); } -#endif +#endif /* CONFIG_QUEUE_SPINLOCK */ + +#endif /* SMP PARAVIRT_SPINLOCKS */ #ifdef CONFIG_X86_32 #define PV_SAVE_REGS pushl %ecx; pushl %edx; diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 7549b8b..f6acaea 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -333,9 +333,19 @@ struct arch_spinlock; typedef u16 __ticket_t; #endif +struct qspinlock; + struct pv_lock_ops { +#ifdef CONFIG_QUEUE_SPINLOCK + void (*queue_spin_lock_slowpath)(struct qspinlock *lock, u32 val); + struct paravirt_callee_save queue_spin_unlock; + + void (*wait)(u8 *ptr, u8 val); + void (*kick)(int cpu); +#else /* !CONFIG_QUEUE_SPINLOCK */ struct paravirt_callee_save lock_spinning; void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket); +#endif /* !CONFIG_QUEUE_SPINLOCK */ }; /* This contains all the paravirt structures: we get a convenient diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h index 64c925e..c8290db 100644 --- a/arch/x86/include/asm/qspinlock.h +++ b/arch/x86/include/asm/qspinlock.h @@ 
-3,6 +3,7 @@ #include asm/cpufeature.h #include asm-generic/qspinlock_types.h +#include asm/paravirt.h #definequeue_spin_unlock queue_spin_unlock /** @@ -11,11 +12,33 @@ * * A smp_store_release() on the least-significant byte. */ -static inline void queue_spin_unlock(struct qspinlock *lock) +static inline void native_queue_spin_unlock(struct qspinlock *lock) { smp_store_release((u8 *)lock, 0); } +#ifdef CONFIG_PARAVIRT_SPINLOCKS +extern void
[Xen-devel] [PATCH v16 10/14] pvqspinlock, x86: Enable PV qspinlock for KVM
This patch adds the necessary KVM specific code to allow KVM to support the CPU halting and kicking operations needed by the queue spinlock PV code. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/kernel/kvm.c | 43 +++ kernel/Kconfig.locks |2 +- 2 files changed, 44 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index e354cc6..4bb42c0 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -584,6 +584,39 @@ static void kvm_kick_cpu(int cpu) kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid); } + +#ifdef CONFIG_QUEUE_SPINLOCK + +#include asm/qspinlock.h + +static void kvm_wait(u8 *ptr, u8 val) +{ + unsigned long flags; + + if (in_nmi()) + return; + + local_irq_save(flags); + + if (READ_ONCE(*ptr) != val) + goto out; + + /* +* halt until it's our turn and kicked. Note that we do safe halt +* for irq enabled case to avoid hang when lock info is overwritten +* in irq spinlock slowpath and no spurious interrupt occur to save us. +*/ + if (arch_irqs_disabled_flags(flags)) + halt(); + else + safe_halt(); + +out: + local_irq_restore(flags); +} + +#else /* !CONFIG_QUEUE_SPINLOCK */ + enum kvm_contention_stat { TAKEN_SLOW, TAKEN_SLOW_PICKUP, @@ -817,6 +850,8 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket) } } +#endif /* !CONFIG_QUEUE_SPINLOCK */ + /* * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present. 
*/ @@ -828,8 +863,16 @@ void __init kvm_spinlock_init(void) if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) return; +#ifdef CONFIG_QUEUE_SPINLOCK + __pv_init_lock_hash(); + pv_lock_ops.queue_spin_lock_slowpath = __pv_queue_spin_lock_slowpath; + pv_lock_ops.queue_spin_unlock = PV_CALLEE_SAVE(__pv_queue_spin_unlock); + pv_lock_ops.wait = kvm_wait; + pv_lock_ops.kick = kvm_kick_cpu; +#else /* !CONFIG_QUEUE_SPINLOCK */ pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning); pv_lock_ops.unlock_kick = kvm_unlock_kick; +#endif } static __init int kvm_spinlock_init_jump(void) diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index c6a8f7c..537b13e 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -240,7 +240,7 @@ config ARCH_USE_QUEUE_SPINLOCK config QUEUE_SPINLOCK def_bool y if ARCH_USE_QUEUE_SPINLOCK - depends on SMP !PARAVIRT_SPINLOCKS + depends on SMP (!PARAVIRT_SPINLOCKS || !XEN) config ARCH_USE_QUEUE_RWLOCK bool -- 1.7.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v16 06/14] qspinlock: Use a simple write to grab the lock
Currently, atomic_cmpxchg() is used to get the lock. However, this is not really necessary if there is more than one task in the queue and the queue head don't need to reset the tail code. For that case, a simple write to set the lock bit is enough as the queue head will be the only one eligible to get the lock as long as it checks that both the lock and pending bits are not set. The current pending bit waiting code will ensure that the bit will not be set as soon as the tail code in the lock is set. With that change, the are some slight improvement in the performance of the queue spinlock in the 5M loop micro-benchmark run on a 4-socket Westere-EX machine as shown in the tables below. [Standalone/Embedded - same node] # of tasksBefore patchAfter patch %Change ----- -- --- 3 2324/2321 2248/2265-3%/-2% 4 2890/2896 2819/2831-2%/-2% 5 3611/3595 3522/3512-2%/-2% 6 4281/4276 4173/4160-3%/-3% 7 5018/5001 4875/4861-3%/-3% 8 5759/5750 5563/5568-3%/-3% [Standalone/Embedded - different nodes] # of tasksBefore patchAfter patch %Change ----- -- --- 312242/12237 12087/12093 -1%/-1% 410688/10696 10507/10521 -2%/-2% It was also found that this change produced a much bigger performance improvement in the newer IvyBridge-EX chip and was essentially to close the performance gap between the ticket spinlock and queue spinlock. The disk workload of the AIM7 benchmark was run on a 4-socket Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users on a 3.14 based kernel. The results of the test runs were: AIM7 XFS Disk Test kernel JPMReal Time Sys TimeUsr Time - ---- ticketlock56782333.17 96.61 5.81 qspinlock 57507993.13 94.83 5.97 AIM7 EXT4 Disk Test kernel JPMReal Time Sys TimeUsr Time - ---- ticketlock1114551 16.15 509.72 7.11 qspinlock 21844668.24 232.99 6.01 The ext4 filesystem run had a much higher spinlock contention than the xfs filesystem run. 
The ebizzy -m test was also run with the following results: kernel records/s Real Time Sys TimeUsr Time -- - ticketlock 2075 10.00 216.35 3.49 qspinlock 3023 10.00 198.20 4.80 Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org --- kernel/locking/qspinlock.c | 66 +-- 1 files changed, 50 insertions(+), 16 deletions(-) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index bcc99e6..99503ef 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -105,24 +105,37 @@ static inline struct mcs_spinlock *decode_tail(u32 tail) * By using the whole 2nd least significant byte for the pending bit, we * can allow better optimization of the lock acquisition for the pending * bit holder. + * + * This internal structure is also used by the set_locked function which + * is not restricted to _Q_PENDING_BITS == 8. */ -#if _Q_PENDING_BITS == 8 - struct __qspinlock { union { atomic_t val; - struct { #ifdef __LITTLE_ENDIAN + struct { + u8 locked; + u8 pending; + }; + struct { u16 locked_pending; u16 tail; + }; #else + struct { u16 tail; u16 locked_pending; -#endif }; + struct { + u8 reserved[2]; + u8 pending; + u8 locked; + }; +#endif }; }; +#if _Q_PENDING_BITS == 8 /** * clear_pending_set_locked - take ownership and clear the pending bit. * @lock: Pointer to queue spinlock structure @@ -195,6 +208,19 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) #endif /* _Q_PENDING_BITS == 8 */ /** + * set_locked - Set the lock bit and own the lock + * @lock: Pointer to queue spinlock structure + * + * *,*,0 - *,0,1 + */ +static __always_inline void set_locked(struct qspinlock *lock) +{ + struct __qspinlock *l = (void *)lock; + + WRITE_ONCE(l-locked, _Q_LOCKED_VAL); +} + +/** *