Re: [Xen-devel] [OSSTEST Nested PATCH v8 6/7] Compose the main recipe of nested test job
On Thu, 2015-04-23 at 12:05 +0100, Ian Jackson wrote: Pang, LongtaoX writes (RE: [Xen-devel] [OSSTEST Nested PATCH v8 6/7] Compose the main recipe of nested test job): Ian Campbell [mailto:ian.campb...@citrix.com]: The arguments passed to ts-* in sg-run-job should _always_ be just a bare ident. The ident=foo thing is a special case for standalone mode.

Do you mean that the parameter passed to ts-* in sg-run-job should _always_ be just a bare ident?

Yes, that's what Ian means.

If yes, I think it needs to pass the L1 guest's '$gn', which is not a bare ident, to 'ts-debian-hvm-install' for installing the L1 guest VM, since '$gn' defaults to 'debianhvm' ('$gn ||= "debianhvm"') if no other '$gn' is passed to the script, but for the nested job we expect a '$gn' different from 'debianhvm', such as 'nestedl1'. So it needs to pass 'host' and 'nestedl1' to the 'ts-debian-hvm-install' script:

+run-ts . = ts-debian-hvm-install + host + nestedl1

Indeed.

The 'ts-nested-setup' script needs to call 'ts_get_host_guest', which takes the two parameters 'host' and 'nestedl1' to get the $l0 and $l1 scopes, so 'nestedl1' also needs to be passed to that script.

Yes.

'ts-guest-destroy' also needs the two parameters 'host' and 'nestedl1' (the L1 guest's '$gn').

If you're intending to destroy the L1, then yes. If you want to destroy the L2 then you need to pass `nestedl1' and whatever the L2 guest name/ident is.

Here comes a common issue we need to settle: to avoid breaking the principle that we only pass idents in a recipe, we shall either 1. make the ident/name/hostname identical for L1 and L2, and make this a principle as well — this avoids much complex design at the cost of one additional rule to bear in mind; or 2. set up the $ident -> $name mapping somewhere at the beginning (either host allocation or flight construction, you name it, but somewhere).

I prefer Ian C's suggestion, option 1: why should the ident and the name differ for L1 and L2?

Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v2 06/11] vmx: add help functions to support PML
On 04/17/2015 03:37 PM, Jan Beulich wrote: On 17.04.15 at 09:23, kai.hu...@linux.intel.com wrote: On 04/17/2015 02:58 PM, Jan Beulich wrote: On 17.04.15 at 08:51, kai.hu...@linux.intel.com wrote: On 04/17/2015 02:23 PM, Jan Beulich wrote: On 17.04.15 at 05:10, kai.hu...@linux.intel.com wrote: On 04/16/2015 11:42 PM, Jan Beulich wrote: On 15.04.15 at 09:03, kai.hu...@linux.intel.com wrote:

+void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
+{
+    uint64_t *pml_buf;
+    unsigned long pml_idx;
+
+    ASSERT(vmx_vcpu_pml_enabled(v));
+
+    vmx_vmcs_enter(v);
+
+    __vmread(GUEST_PML_INDEX, &pml_idx);

Don't you require the vCPU to be non-running or current when you get here? If so, perhaps add a respective ASSERT()?

Yes, an ASSERT would be better. v->pause_count will be increased if the vcpu is kicked out by domain_pause explicitly, but it looks like the same thing won't be done if the vcpu is kicked out by a PML-buffer-full VMEXIT. So should the ASSERT be done like below?

ASSERT(atomic_read(&v->pause_count) || (v == current));

For one I'd reverse the two parts. And then I think pause count being non-zero is not a sufficient condition - if a non-synchronous pause was issued against the vCPU it may still be running. I'd suggest !vcpu_runnable(v) && !v->is_running, possibly with the pause count check instead of the runnable one if the only permitted case where v != current requires the vCPU to be paused.

vmx_vcpu_flush_pml_buffer is only supposed to be called in the below cases:

- When a PML-buffer-full VMEXIT happens
- In paging_log_dirty_op and hap_track_dirty_vram, before reporting dirty pages to userspace.
- In vmx_vcpu_disable_pml, called from vmx_vcpu_destroy, or when log-dirty mode is disabled.

In the latter two cases, domain_pause is guaranteed to be called before vmx_vcpu_flush_pml_buffer is called, therefore it looks like there's no possibility of a non-synchronous pause of the vcpu. Or are you suggesting we should suppose this function can be called from any caller, and meanwhile be able to act reasonably?

No. All I'm saying is that in order to protect against eventual undue future callers, it should assert that its preconditions are met. I.e. if the vCPU is expected to be paused, check that the pause count is non-zero _and_ that the pause actually took effect.

I see. I will do as you suggested:

ASSERT((v == current) || (!vcpu_runnable(v) && !v->is_running));

And v != current should be the only case that requires the vcpu to be paused.

But if you require (or at least expect) the vCPU to be paused, this isn't what I suggested. Afaict

ASSERT((v == current) || (!v->is_running && atomic_read(&v->pause_count)));

would then be the right check (and, while not being a full guarantee that things wouldn't change behind your back, it would at least increase the chances that the vCPU's runnable state won't change, as the vCPU could have been non-runnable for reasons other than having been paused).

Hi Jan, I tried the ASSERT with atomic_read(&v->pause_count), and it turns out the ASSERT would fail and panic Xen. The reason is that domain_pause only increases d->pause_count; it doesn't internally increase v->pause_count for all vcpus. vmx_vcpu_flush_pml_buffer is only supposed to be called from the PML-buffer-full VMEXIT, and from vmx_domain_flush_pml_buffer, before which domain_pause should be called. Sorry that obviously I had some misunderstanding regarding "require (or at least expect) the vCPU to be paused", and it looks like !vcpu_runnable(v) is the right choice. What's your opinion?

Thanks, -Kai

Jan
Re: [Xen-devel] [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
On 23.04.15 at 18:29, david.vra...@citrix.com wrote: On 23/04/15 17:11, Jan Beulich wrote: On 22.04.15 at 18:00, david.vra...@citrix.com wrote:

-    return handle;
+    v->maptrack_limit = new_mt_limit;
+
+    return __get_maptrack_handle(lgt, v);

With the lock dropped, nothing guarantees this to succeed, which it ought to unless the table size reached its allowed maximum.

We've just added a new page of handles for this VCPU. This will succeed.

Ah, right, this is per-vCPU now.

Jan
Re: [Xen-devel] [PATCH v3 3/5] libxl: use LIBXL_TOOLSTACK_DOMID
On Fri, Mar 20, Wei Liu wrote:

+++ b/tools/libxl/libxl_dm.c
@@ -1438,6 +1438,8 @@ retry_transaction:
     spawn->what = GCSPRINTF("domain %d device model", domid);
     spawn->xspath = GCSPRINTF("/local/domain/0/device-model/%d/state", domid);
+    spawn->xspath = libxl__device_model_xs_path(gc, LIBXL_TOOLSTACK_DOMID,
+                                                domid, "/state");

This assigns xspath twice.

Olaf
[Xen-devel] [PATCH] libxlu: don't crash on empty lists
Prior to 1a09c5113a ("libxlu: rework internal representation of setting") empty lists in config files did get accepted. Restore that behavior.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/tools/libxl/libxlu_cfg.c
+++ b/tools/libxl/libxlu_cfg.c
@@ -397,7 +397,7 @@ XLU_ConfigValue *xlu__cfg_list_mk(CfgPar
     value = malloc(sizeof(*value));
     if (!value) goto xe;
     value->type = XLU_LIST;
-    value->u.list.nvalues = 1;
+    value->u.list.nvalues = !!val;
     value->u.list.avalues = 1;
     value->u.list.values = values;
     memcpy(&value->loc, loc, sizeof(*loc));
Re: [Xen-devel] [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
On 24/04/15 10:50, Jan Beulich wrote: On 24.04.15 at 11:09, malcolm.cross...@citrix.com wrote: On 23/04/15 17:11, Jan Beulich wrote: On 22.04.15 at 18:00, david.vra...@citrix.com wrote:

--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -60,6 +60,8 @@ struct grant_mapping {
     u32 ref;           /* grant ref */
     u16 flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
     domid_t domid;     /* granting domain */
+    u32 vcpu;          /* vcpu which created the grant mapping */
+    u16 pad[2];
 };

What is this pad[] good for?

The pad is to keep the struct power-of-2 sized, because this allows the compiler to optimise these macros to right and left shifts:

#define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping))
#define maptrack_entry(t, e) \
    ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])

Okay, then why u16[2] instead of u32? And please add a brief comment explaining the reason.

Most likely because vcpu was a u16 in the first draft, and got bumped during internal review. Apart from that I wonder whether fitting vcpu in the 10 unused flags bits (not 11, as the comment on the field suggests) would be an option. That would require limiting vCPU count to 4k, which I don't think would really be a problem for anyone.

8k vCPU PV guests do function, and are very good at finding scalability limits. It would be nice not to break this.

~Andrew
[Xen-devel] io.c:164:d25v0 Weird HVM ioemulation status 1. domain_crash called from io.c:165
Hi,

On Xen-unstable I encountered this non-stopping log spam while trying to shut down an HVM domain with pci-passthrough:

(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
(XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165
(XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1.
etc. etc. etc.

The domain had shut down perfectly well on earlier occasions, but this time it kept running. It did drop from its assigned 3 vcpus to 1 vcpu, but it stalled there. The log spam only stopped by destroying the guest. Since I haven't seen it before, it's probably hard to replicate.
-- Sander
[Xen-devel] [v3 04/10] vmx: add new data structure member to support PML
A new 4K page pointer is added to arch_vmx_struct as the PML buffer for the vcpu. And a new 'status' field is added to vmx_domain to indicate whether PML is enabled for the domain or not.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 xen/include/asm-x86/hvm/vmx/vmcs.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index f831a78..441e974 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -70,8 +70,12 @@ struct ept_data {
     cpumask_var_t synced_mask;
 };

+#define _VMX_DOMAIN_PML_ENABLED    0
+#define VMX_DOMAIN_PML_ENABLED     (1ul << _VMX_DOMAIN_PML_ENABLED)
 struct vmx_domain {
     unsigned long apic_access_mfn;
+    /* VMX_DOMAIN_* */
+    unsigned int status;
 };

 struct pi_desc {
@@ -85,6 +89,8 @@ struct pi_desc {
 #define ept_get_eptp(ept) ((ept)->eptp)
 #define ept_get_synced_mask(ept) ((ept)->synced_mask)

+#define NR_PML_ENTRIES   512
+
 struct arch_vmx_struct {
     /* Virtual address of VMCS. */
     struct vmcs_struct *vmcs;
@@ -142,6 +148,8 @@ struct arch_vmx_struct {
     /* Bitmap to control vmexit policy for Non-root VMREAD/VMWRITE */
     struct page_info *vmread_bitmap;
     struct page_info *vmwrite_bitmap;
+
+    struct page_info *pml_pg;
 };

 int vmx_create_vmcs(struct vcpu *v);
--
2.1.0
[Xen-devel] [v3 02/10] log-dirty: add new paging_mark_gfn_dirty
PML logs GPAs in the PML buffer. The original paging_mark_dirty takes an MFN as parameter, but internally it gets the guest pfn and uses the guest pfn as the index for looking up the radix log-dirty tree. When flushing the PML buffer, calling paging_mark_dirty directly introduces redundant p2m lookups (gfn->mfn->gfn); therefore we introduce paging_mark_gfn_dirty, which is the bulk of paging_mark_dirty but takes a guest pfn as parameter, and when flushing the PML buffer we call paging_mark_gfn_dirty directly. The original paging_mark_dirty then simply becomes a wrapper of paging_mark_gfn_dirty.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 xen/arch/x86/mm/paging.c | 31 +--
 xen/include/asm-x86/paging.h | 2 ++
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index b54d76a..77c929b 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -266,24 +266,17 @@ static int paging_log_dirty_disable(struct domain *d, bool_t resuming)
     return ret;
 }

-/* Mark a page as dirty */
-void paging_mark_dirty(struct domain *d, unsigned long guest_mfn)
+/* Mark a page as dirty, with taking guest pfn as parameter */
+void paging_mark_gfn_dirty(struct domain *d, unsigned long pfn)
 {
-    unsigned long pfn;
-    mfn_t gmfn;
     int changed;
     mfn_t mfn, *l4, *l3, *l2;
     unsigned long *l1;
     int i1, i2, i3, i4;

-    gmfn = _mfn(guest_mfn);
-
-    if ( !paging_mode_log_dirty(d) || !mfn_valid(gmfn) ||
-         page_get_owner(mfn_to_page(gmfn)) != d )
+    if ( !paging_mode_log_dirty(d) )
         return;

-    /* We /really/ mean PFN here, even for non-translated guests. */
-    pfn = get_gpfn_from_mfn(mfn_x(gmfn));

     /* Shared MFNs should NEVER be marked dirty */
     BUG_ON(SHARED_M2P(pfn));
@@ -351,6 +344,24 @@ out:
     return;
 }

+/* Mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn)
+{
+    unsigned long pfn;
+    mfn_t gmfn;
+
+    gmfn = _mfn(guest_mfn);
+
+    if ( !paging_mode_log_dirty(d) || !mfn_valid(gmfn) ||
+         page_get_owner(mfn_to_page(gmfn)) != d )
+        return;
+
+    /* We /really/ mean PFN here, even for non-translated guests. */
+    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
+
+    paging_mark_gfn_dirty(d, pfn);
+}
+
 /* Is this guest page dirty? */
 int paging_mfn_is_dirty(struct domain *d, mfn_t gmfn)

diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
index 53de715..c99324c 100644
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -156,6 +156,8 @@ void paging_log_dirty_init(struct domain *d,
 /* mark a page as dirty */
 void paging_mark_dirty(struct domain *d, unsigned long guest_mfn);
+/* mark a page as dirty with taking guest pfn as parameter */
+void paging_mark_gfn_dirty(struct domain *d, unsigned long pfn);
 /* is this guest page dirty?
  * This is called from inside paging code, with the paging lock held. */
--
2.1.0
[Xen-devel] [v3 09/10] log-dirty: refine common code to support PML
Using PML, it's possible there are dirty GPAs logged in vcpus' PML buffers when userspace peeks at/clears dirty pages, therefore we need to flush them before reporting dirty pages to userspace. This applies to both video ram tracking and paging_log_dirty_op. This patch adds new p2m layer functions to enable/disable PML and flush PML buffers. The new functions are named generically to cover potential further PML-like features for other platforms.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 xen/arch/x86/mm/hap/hap.c | 29 +
 xen/arch/x86/mm/p2m.c | 36
 xen/arch/x86/mm/paging.c | 10 ++
 xen/include/asm-x86/p2m.h | 11 +++
 4 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 4ecb2e2..1099670 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -121,7 +121,10 @@ int hap_track_dirty_vram(struct domain *d,
             p2m_change_type_range(d, ostart, oend,
                                   p2m_ram_logdirty, p2m_ram_rw);

-            /* set l1e entries of range within P2M table to be read-only. */
+            /*
+             * switch vram to log dirty mode, either by setting l1e entries of
+             * P2M table to be read-only, or via hardware-assisted log-dirty.
+             */
             p2m_change_type_range(d, begin_pfn, begin_pfn + nr,
                                   p2m_ram_rw, p2m_ram_logdirty);

@@ -135,6 +138,9 @@ int hap_track_dirty_vram(struct domain *d,
             domain_pause(d);

+            /* flush dirty GFNs potentially cached by hardware */
+            p2m_flush_hardware_cached_dirty(d);
+
             /* get the bitmap */
             paging_log_dirty_range(d, begin_pfn, nr, dirty_bitmap);

@@ -190,9 +196,15 @@ static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
     d->arch.paging.mode |= PG_log_dirty;
     paging_unlock(d);

+    /* enable hardware-assisted log-dirty if it is supported */
+    p2m_enable_hardware_log_dirty(d);
+
     if ( log_global )
     {
-        /* set l1e entries of P2M table to be read-only. */
+        /*
+         * switch to log dirty mode, either by setting l1e entries of P2M table
+         * to be read-only, or via hardware-assisted log-dirty.
+         */
         p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
         flush_tlb_mask(d->domain_dirty_cpumask);
     }
@@ -205,14 +217,23 @@ static int hap_disable_log_dirty(struct domain *d)
     d->arch.paging.mode &= ~PG_log_dirty;
     paging_unlock(d);

-    /* set l1e entries of P2M table with normal mode */
+    /* disable hardware-assisted log-dirty if it is supported */
+    p2m_disable_hardware_log_dirty(d);
+
+    /*
+     * switch to normal mode, either by setting l1e entries of P2M table to
+     * normal mode, or via hardware-assisted log-dirty.
+     */
     p2m_change_entry_type_global(d, p2m_ram_logdirty, p2m_ram_rw);
     return 0;
 }

 static void hap_clean_dirty_bitmap(struct domain *d)
 {
-    /* set l1e entries of P2M table to be read-only. */
+    /*
+     * switch to log-dirty mode, either by setting l1e entries of P2M table to
+     * be read-only, or via hardware-assisted log-dirty.
+     */
     p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
     flush_tlb_mask(d->domain_dirty_cpumask);
 }

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index df4a485..67edf89 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -239,6 +239,42 @@ void p2m_memory_type_changed(struct domain *d)
     }
 }

+void p2m_enable_hardware_log_dirty(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    if ( p2m->enable_hardware_log_dirty )
+    {
+        p2m_lock(p2m);
+        p2m->enable_hardware_log_dirty(p2m);
+        p2m_unlock(p2m);
+    }
+}
+
+void p2m_disable_hardware_log_dirty(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    if ( p2m->disable_hardware_log_dirty )
+    {
+        p2m_lock(p2m);
+        p2m->disable_hardware_log_dirty(p2m);
+        p2m_unlock(p2m);
+    }
+}
+
+void p2m_flush_hardware_cached_dirty(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    if ( p2m->flush_hardware_cached_dirty )
+    {
+        p2m_lock(p2m);
+        p2m->flush_hardware_cached_dirty(p2m);
+        p2m_unlock(p2m);
+    }
+}
+
 mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn,
                             p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
                             unsigned int *page_order, bool_t locked)

diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index 77c929b..59d4720 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -422,7 +422,17 @@ static int paging_log_dirty_op(struct domain *d,
     int i4, i3, i2;

     if ( !resuming )
+    {
         domain_pause(d);
+
+        /*
+         * Flush
[Xen-devel] [v3 01/10] vmx: add new boot parameter to control PML enabling
A top level EPT parameter "ept=<options>" and a sub boolean "opt_pml_enabled" are added to control PML. Other booleans can be further added for any other EPT related features. The document description for the new parameter is also added.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 docs/misc/xen-command-line.markdown | 15 +++
 xen/arch/x86/hvm/vmx/vmcs.c | 30 ++
 2 files changed, 45 insertions(+)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 1dda1f0..4889e27 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -685,6 +685,21 @@ requirement can be relaxed. This option is particularly useful for nested
 virtualization, to allow the L1 hypervisor to use EPT even if the L0 hypervisor
 does not provide VM\_ENTRY\_LOAD\_GUEST\_PAT.

+### ept (Intel)
+> `= List of ( pml<boolean> )`
+
+> Default: `false`
+
+Controls EPT related features. Currently only Page Modification Logging (PML)
+is the controllable feature, as a boolean.
+
+PML is a new hardware feature in Intel's Broadwell Server and further platforms,
+which reduces hypervisor overhead of the log-dirty mechanism by automatically
+recording GPAs (guest physical addresses) when guest memory gets dirty, and
+therefore significantly reducing the number of EPT violations caused by write
+protection of guest memory, which was a necessity to implement the log-dirty
+mechanism before PML.
+
 ### gdb
 > `= <baud>[/<clock_hz>][,DPS[,<io-base>[,<irq>[,<port-bdf>[,<bridge-bdf>]]]] | pci | amt ]`

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 63007a9..79efa42 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -64,6 +64,36 @@ integer_param("ple_gap", ple_gap);
 static unsigned int __read_mostly ple_window = 4096;
 integer_param("ple_window", ple_window);

+static bool_t __read_mostly opt_pml_enabled = 0;
+
+/*
+ * The 'ept' parameter controls functionalities that depend on, or impact the
+ * EPT mechanism. Optional comma separated value may contain:
+ *
+ *  pml                 Enable PML
+ */
+static void __init parse_ept_param(char *s)
+{
+    char *ss;
+
+    do {
+        bool_t val = !!strncmp(s, "no-", 3);
+        if ( !val )
+            s += 3;
+
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        if ( !strcmp(s, "pml") )
+            opt_pml_enabled = val;
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+custom_param("ept", parse_ept_param);
+
 /* Dynamic (run-time adjusted) execution control flags. */
 u32 vmx_pin_based_exec_control __read_mostly;
 u32 vmx_cpu_based_exec_control __read_mostly;
--
2.1.0
[Xen-devel] [v3 03/10] vmx: add PML definition and feature detection
The patch adds the PML definitions and feature detection. Note PML won't be detected if it is disabled by the boot parameter. PML is also disabled in construct_vmcs, as it will only be enabled when the domain is switched to log-dirty mode.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 xen/arch/x86/hvm/vmx/vmcs.c | 18 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h | 6 ++
 xen/include/asm-x86/hvm/vmx/vmx.h | 1 +
 3 files changed, 25 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 79efa42..04fdca3 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -140,6 +140,7 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt Delivery");
     P(cpu_has_vmx_posted_intr_processing, "Posted Interrupt Processing");
     P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
+    P(cpu_has_vmx_pml, "Page Modification Logging");
 #undef P

     if ( !printed )
@@ -237,6 +238,8 @@ static int vmx_init_vmcs_config(void)
             opt |= SECONDARY_EXEC_ENABLE_VPID;
         if ( opt_unrestricted_guest_enabled )
             opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
+        if ( opt_pml_enabled )
+            opt |= SECONDARY_EXEC_ENABLE_PML;

         /*
          * APIC Register Virtualization and Virtual Interrupt Delivery
@@ -283,6 +286,10 @@ static int vmx_init_vmcs_config(void)
          */
         if ( !(_vmx_ept_vpid_cap & VMX_VPID_INVVPID_ALL_CONTEXT) )
             _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
+
+        /* EPT A/D bits is required for PML */
+        if ( !(_vmx_ept_vpid_cap & VMX_EPT_AD_BIT) )
+            _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
     }

     if ( _vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT )
@@ -303,6 +310,14 @@ static int vmx_init_vmcs_config(void)
                  SECONDARY_EXEC_UNRESTRICTED_GUEST);
     }

+    /* PML cannot be supported if EPT is not used */
+    if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT) )
+        _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+
+    /* Turn off opt_pml_enabled if PML feature is not present */
+    if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML) )
+        opt_pml_enabled = 0;
+
     if ( (_vmx_secondary_exec_control & SECONDARY_EXEC_PAUSE_LOOP_EXITING) &&
          ple_gap == 0 )
     {
@@ -1038,6 +1053,9 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
     }

+    /* Disable PML anyway here as it will only be enabled in log dirty mode */
+    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+
     /* Host data selectors. */
     __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS);
     __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS);

diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 6fce6aa..f831a78 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -215,6 +215,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000
 #define SECONDARY_EXEC_ENABLE_VMFUNC            0x00002000
 #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
+#define SECONDARY_EXEC_ENABLE_PML               0x00020000
 extern u32 vmx_secondary_exec_control;

 #define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
@@ -226,6 +227,7 @@ extern u32 vmx_secondary_exec_control;
 #define VMX_EPT_INVEPT_INSTRUCTION              0x00100000
 #define VMX_EPT_INVEPT_SINGLE_CONTEXT           0x02000000
 #define VMX_EPT_INVEPT_ALL_CONTEXT              0x04000000
+#define VMX_EPT_AD_BIT                          0x00200000

 #define VMX_MISC_VMWRITE_ALL                    0x20000000

@@ -274,6 +276,8 @@ extern u32 vmx_secondary_exec_control;
     (vmx_pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT)
 #define cpu_has_vmx_vmcs_shadowing \
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING)
+#define cpu_has_vmx_pml \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)

 #define VMCS_RID_TYPE_MASK                      0x80000000

@@ -318,6 +322,7 @@ enum vmcs_field {
     GUEST_LDTR_SELECTOR = 0x080c,
     GUEST_TR_SELECTOR = 0x080e,
     GUEST_INTR_STATUS = 0x0810,
+    GUEST_PML_INDEX = 0x0812,
     HOST_ES_SELECTOR = 0x0c00,
     HOST_CS_SELECTOR = 0x0c02,
     HOST_SS_SELECTOR = 0x0c04,
@@ -331,6 +336,7 @@ enum vmcs_field {
     VM_EXIT_MSR_STORE_ADDR = 0x2006,
     VM_EXIT_MSR_LOAD_ADDR = 0x2008,
     VM_ENTRY_MSR_LOAD_ADDR = 0x200a,
+    PML_ADDRESS = 0x200e,
     TSC_OFFSET = 0x2010,
     VIRTUAL_APIC_PAGE_ADDR = 0x2012,
     APIC_ACCESS_ADDR = 0x2014,
diff --git
[Xen-devel] [v3 08/10] vmx: disable PML in vmx_vcpu_destroy
It's possible the domain still remains in log-dirty mode when it is about to be destroyed, in which case we should manually disable PML for it.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 xen/arch/x86/hvm/vmx/vmx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e5471b8..e189424 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -153,6 +153,14 @@ static int vmx_vcpu_initialise(struct vcpu *v)

 static void vmx_vcpu_destroy(struct vcpu *v)
 {
+    /*
+     * There are cases that domain still remains in log-dirty mode when it is
+     * about to be destroyed (ex, user types 'xl destroy <dom>'), in which case
+     * we should disable PML manually here. Note that vmx_vcpu_destroy is called
+     * prior to vmx_domain_destroy so we need to disable PML for each vcpu
+     * separately here.
+     */
+    vmx_vcpu_disable_pml(v);
     vmx_destroy_vmcs(v);
     vpmu_destroy(v);
     passive_domain_destroy(v);
--
2.1.0
[Xen-devel] [v3 00/10] PML (Page Modification Logging) support
v2->v3:

- Merged v2 patch 02 (document change) into patch 01 as a single patch, and changed the new parameter description as suggested by Andrew.
- Changed vmx_vcpu_flush_pml_buffer to call mark_dirty for all logged GFNs, and call p2m_change_type_one regardless of return value.
- Added an ASSERT for the vcpu (being current, or being non-running and unrunnable) to vmx_vcpu_flush_pml_buffer.
- Other refinements in coding style, comments, description, etc.

Live migration has been sanity-tested both with and without PML.

Kai Huang (10):
  vmx: add new boot parameter to control PML enabling
  log-dirty: add new paging_mark_gfn_dirty
  vmx: add PML definition and feature detection
  vmx: add new data structure member to support PML
  vmx: add help functions to support PML
  vmx: handle PML buffer full VMEXIT
  vmx: handle PML enabling in vmx_vcpu_initialise
  vmx: disable PML in vmx_vcpu_destroy
  log-dirty: refine common code to support PML
  p2m/ept: enable PML in p2m-ept for log-dirty

 docs/misc/xen-command-line.markdown | 15 +++
 xen/arch/x86/hvm/vmx/vmcs.c | 227
 xen/arch/x86/hvm/vmx/vmx.c | 35 ++
 xen/arch/x86/mm/hap/hap.c | 29 -
 xen/arch/x86/mm/p2m-ept.c | 79 +++--
 xen/arch/x86/mm/p2m.c | 36 ++
 xen/arch/x86/mm/paging.c | 41 +--
 xen/include/asm-x86/hvm/vmx/vmcs.h | 26 -
 xen/include/asm-x86/hvm/vmx/vmx.h | 4 +-
 xen/include/asm-x86/p2m.h | 11 ++
 xen/include/asm-x86/paging.h | 2 +
 11 files changed, 482 insertions(+), 23 deletions(-)

--
2.1.0
[Xen-devel] [v3 07/10] vmx: handle PML enabling in vmx_vcpu_initialise
It's possible the domain has already been in log-dirty mode when a vcpu is being created, in which case we should enable PML for this vcpu if PML has been enabled for the domain.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com

---
 xen/arch/x86/hvm/vmx/vmx.c | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index f11ac46..e5471b8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -117,6 +117,29 @@ static int vmx_vcpu_initialise(struct vcpu *v)
         return rc;
     }

+    /*
+     * It's rare but still possible that the domain has already been in
+     * log-dirty mode when the vcpu is being created (commented by Tim), in
+     * which case we should enable PML for this vcpu if PML has been enabled
+     * for the domain, and failure to enable results in failure of creating
+     * this vcpu.
+     *
+     * Note even if there's no vcpu created for the domain,
+     * vmx_domain_enable_pml will return success, in which case
+     * vmx_domain_pml_enabled will also return true. And even if this is the
+     * first vcpu to be created with vmx_domain_pml_enabled being true,
+     * failure of enabling PML still results in failure of creating the vcpu,
+     * to avoid complicated logic to revert a PML-style EPT table to a
+     * non-PML-style EPT table.
+     */
+    if ( vmx_domain_pml_enabled(v->domain) )
+    {
+        if ( (rc = vmx_vcpu_enable_pml(v)) != 0 )
+        {
+            dprintk(XENLOG_ERR, "%pv: Failed to enable PML.\n", v);
+            vmx_destroy_vmcs(v);
+            return rc;
+        }
+    }
+
     vpmu_initialise(v);

     vmx_install_vlapic_mapping(v);
--
2.1.0
[Xen-devel] [v3 05/10] vmx: add help functions to support PML
This patch adds help functions to enable/disable PML, and flush the PML buffer for a single vcpu and a particular domain, for further use.

Signed-off-by: Kai Huang <kai.hu...@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 179 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   9 ++
 2 files changed, 188 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 04fdca3..f797fde 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1323,6 +1323,185 @@ void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector)
                          &v->arch.hvm_vmx.eoi_exitmap_changed);
 }
 
+bool_t vmx_vcpu_pml_enabled(const struct vcpu *v)
+{
+    return !!(v->arch.hvm_vmx.secondary_exec_control &
+              SECONDARY_EXEC_ENABLE_PML);
+}
+
+int vmx_vcpu_enable_pml(struct vcpu *v)
+{
+    if ( vmx_vcpu_pml_enabled(v) )
+        return 0;
+
+    v->arch.hvm_vmx.pml_pg = v->domain->arch.paging.alloc_page(v->domain);
+    if ( !v->arch.hvm_vmx.pml_pg )
+        return -ENOMEM;
+
+    vmx_vmcs_enter(v);
+
+    __vmwrite(PML_ADDRESS, page_to_mfn(v->arch.hvm_vmx.pml_pg) << PAGE_SHIFT);
+    __vmwrite(GUEST_PML_INDEX, NR_PML_ENTRIES - 1);
+
+    v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_ENABLE_PML;
+
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
+              v->arch.hvm_vmx.secondary_exec_control);
+
+    vmx_vmcs_exit(v);
+
+    return 0;
+}
+
+void vmx_vcpu_disable_pml(struct vcpu *v)
+{
+    if ( !vmx_vcpu_pml_enabled(v) )
+        return;
+
+    /* Make sure we don't lose any logged GPAs */
+    vmx_vcpu_flush_pml_buffer(v);
+
+    vmx_vmcs_enter(v);
+
+    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
+              v->arch.hvm_vmx.secondary_exec_control);
+
+    vmx_vmcs_exit(v);
+
+    v->domain->arch.paging.free_page(v->domain, v->arch.hvm_vmx.pml_pg);
+    v->arch.hvm_vmx.pml_pg = NULL;
+}
+
+void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
+{
+    uint64_t *pml_buf;
+    unsigned long pml_idx;
+
+    ASSERT((v == current) || (!vcpu_runnable(v) && !v->is_running));
+    ASSERT(vmx_vcpu_pml_enabled(v));
+
+    vmx_vmcs_enter(v);
+
+    __vmread(GUEST_PML_INDEX, &pml_idx);
+
+    /* Do nothing if PML buffer is empty */
+    if ( pml_idx == (NR_PML_ENTRIES - 1) )
+        goto out;
+
+    pml_buf = __map_domain_page(v->arch.hvm_vmx.pml_pg);
+
+    /*
+     * PML index can be either 2^16-1 (buffer is full), or 0 ~
+     * NR_PML_ENTRIES-1 (buffer is not full), and in latter case PML index
+     * always points to next available entity.
+     */
+    if ( pml_idx >= NR_PML_ENTRIES )
+        pml_idx = 0;
+    else
+        pml_idx++;
+
+    for ( ; pml_idx < NR_PML_ENTRIES; pml_idx++ )
+    {
+        unsigned long gfn = pml_buf[pml_idx] >> PAGE_SHIFT;
+
+        /*
+         * Need to change type from log-dirty to normal memory for logged
+         * GFN. hap_track_dirty_vram depends on it to work. And we mark all
+         * logged GFNs to be dirty, as we cannot be sure whether it's safe
+         * to ignore GFNs on which p2m_change_type_one returns failure. The
+         * failure cases are very rare, and additional cost is negligible,
+         * but a missing mark is extremely difficult to debug.
+         */
+        p2m_change_type_one(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
+        paging_mark_gfn_dirty(v->domain, gfn);
+    }
+
+    unmap_domain_page(pml_buf);
+
+    /* Reset PML index */
+    __vmwrite(GUEST_PML_INDEX, NR_PML_ENTRIES - 1);
+
+ out:
+    vmx_vmcs_exit(v);
+}
+
+bool_t vmx_domain_pml_enabled(const struct domain *d)
+{
+    return !!(d->arch.hvm_domain.vmx.status & VMX_DOMAIN_PML_ENABLED);
+}
+
+/*
+ * This function enables PML for particular domain. It should be called when
+ * domain is paused.
+ *
+ * PML needs to be enabled globally for all vcpus of the domain, as PML
+ * buffer and PML index are per-vcpu, but EPT table is shared by vcpus,
+ * therefore enabling PML on partial vcpus won't work.
+ */
+int vmx_domain_enable_pml(struct domain *d)
+{
+    struct vcpu *v;
+    int rc;
+
+    ASSERT(atomic_read(&d->pause_count));
+
+    if ( vmx_domain_pml_enabled(d) )
+        return 0;
+
+    for_each_vcpu( d, v )
+        if ( (rc = vmx_vcpu_enable_pml(v)) != 0 )
+            goto error;
+
+    d->arch.hvm_domain.vmx.status |= VMX_DOMAIN_PML_ENABLED;
+
+    return 0;
+
+ error:
+    for_each_vcpu( d, v )
+        if ( vmx_vcpu_pml_enabled(v) )
+            vmx_vcpu_disable_pml(v);
+    return rc;
+}
+
+/*
+ * Disable PML for particular domain. Called when domain is paused.
+ *
+ * The same as enabling PML for domain, disabling PML should be done for all
+ * vcpus at once.
+ */
+void vmx_domain_disable_pml(struct domain *d)
+{
+    struct vcpu *v;
+
+    ASSERT(atomic_read(&d->pause_count));
+
+    if ( !vmx_domain_pml_enabled(d) )
+        return;
+
+    for_each_vcpu( d, v )
+        vmx_vcpu_disable_pml(v);
+
+    d->arch.hvm_domain.vmx.status &= ~VMX_DOMAIN_PML_ENABLED;
+}
[Xen-devel] [PATCH 6/7] tools: replace private LIBDIR with automake libdir
The result of this command: git grep -wnl LIBDIR | xargs sed -i 's@LIBDIR@libdir@g' Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com --- Config.mk| 4 ++-- config/Paths.mk.in | 1 - tools/Makefile | 10 +- tools/blktap2/control/Makefile | 8 tools/blktap2/vhd/lib/Makefile | 2 +- tools/hotplug/FreeBSD/rc.d/xencommons.in | 2 +- tools/hotplug/Linux/xen-hotplug-common.sh.in | 2 +- tools/hotplug/NetBSD/rc.d/xen-watchdog | 2 +- tools/hotplug/NetBSD/rc.d/xencommons.in | 2 +- tools/hotplug/NetBSD/rc.d/xendomains | 2 +- tools/libfsimage/Rules.mk| 2 +- tools/libfsimage/common/Makefile | 8 tools/libvchan/Makefile | 10 +- tools/libxc/Makefile | 18 +- tools/libxl/Makefile | 18 +- tools/xenstat/libxenstat/Makefile| 8 tools/xenstore/Makefile | 10 +- 17 files changed, 54 insertions(+), 55 deletions(-) diff --git a/Config.mk b/Config.mk index ec16961..46928ca 100644 --- a/Config.mk +++ b/Config.mk @@ -157,7 +157,7 @@ define move-if-changed if ! 
cmp -s $(1) $(2); then mv -f $(1) $(2); else rm -f $(1); fi endef -BUILD_MAKE_VARS := sbindir bindir LIBEXEC LIBEXEC_BIN LIBDIR SHAREDIR \ +BUILD_MAKE_VARS := sbindir bindir LIBEXEC LIBEXEC_BIN libdir SHAREDIR \ XENFIRMWAREDIR XEN_CONFIG_DIR XEN_SCRIPT_DIR XEN_LOCK_DIR \ XEN_RUN_DIR XEN_PAGING_DIR @@ -204,7 +204,7 @@ CFLAGS += $(foreach i, $(EXTRA_INCLUDES), -I$(i)) LDFLAGS += $(foreach i, $(PREPEND_LIB), -L$(i)) CFLAGS += $(foreach i, $(PREPEND_INCLUDES), -I$(i)) ifeq ($(XEN_TOOLS_RPATH),y) -LDFLAGS += -Wl,-rpath,$(LIBDIR) +LDFLAGS += -Wl,-rpath,$(libdir) endif APPEND_LDFLAGS += $(foreach i, $(APPEND_LIB), -L$(i)) APPEND_CFLAGS += $(foreach i, $(APPEND_INCLUDES), -I$(i)) diff --git a/config/Paths.mk.in b/config/Paths.mk.in index 8ca02bb..6db108a 100644 --- a/config/Paths.mk.in +++ b/config/Paths.mk.in @@ -37,7 +37,6 @@ LIBEXEC_INC := $(LIBEXEC)/include SHAREDIR := @SHAREDIR@ MAN1DIR := $(mandir)/man1 MAN8DIR := $(mandir)/man8 -LIBDIR := $(libdir) DOCDIR := $(docdir) XEN_RUN_DIR := @XEN_RUN_DIR@ diff --git a/tools/Makefile b/tools/Makefile index 1e0c0a7..383d4ca 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -89,9 +89,9 @@ uninstall: rm -rf $(D)$(includedir)/xen rm -rf $(D)$(includedir)/_libxl* $(D)$(includedir)/libxl* rm -rf $(D)$(includedir)/xenstat.h $(D)$(includedir)/xentoollog.h - rm -rf $(D)$(LIBDIR)/libxenctrl* $(D)$(LIBDIR)/libxenguest* - rm -rf $(D)$(LIBDIR)/libxenstore* $(D)$(LIBDIR)/libxlutil* - rm -rf $(D)$(LIBDIR)/python/xen $(D)$(LIBDIR)/python/grub + rm -rf $(D)$(libdir)/libxenctrl* $(D)$(libdir)/libxenguest* + rm -rf $(D)$(libdir)/libxenstore* $(D)$(libdir)/libxlutil* + rm -rf $(D)$(libdir)/python/xen $(D)$(libdir)/python/grub rm -rf $(D)$(LIBEXEC) rm -rf $(D)$(sbindir)/setmask rm -rf $(D)$(sbindir)/xen* $(D)$(sbindir)/netfix $(D)$(sbindir)/xm @@ -117,8 +117,8 @@ IOEMU_CONFIGURE_CROSS ?= --cross-prefix=$(CROSS_COMPILE) \ endif ifeq ($(XEN_TOOLS_RPATH),y) -QEMU_UPSTREAM_RPATH := -Wl,-rpath,$(LIBEXEC_LIB):$(LIBDIR) -IOEMU_EXTRA_LDFLAGS := 
--extra-ldflags=-Wl,-rpath,$(LIBDIR) +QEMU_UPSTREAM_RPATH := -Wl,-rpath,$(LIBEXEC_LIB):$(libdir) +IOEMU_EXTRA_LDFLAGS := --extra-ldflags=-Wl,-rpath,$(libdir) else QEMU_UPSTREAM_RPATH := -Wl,-rpath,$(LIBEXEC_LIB) IOEMU_EXTRA_LDFLAGS := diff --git a/tools/blktap2/control/Makefile b/tools/blktap2/control/Makefile index 9ae2e51..767f52a 100644 --- a/tools/blktap2/control/Makefile +++ b/tools/blktap2/control/Makefile @@ -63,10 +63,10 @@ $(LIB_SHARED): $(CTL_PICS) install: $(IBIN) $(LIB_STATIC) $(LIB_SHARED) $(INSTALL_DIR) -p $(DESTDIR)$(sbindir) $(INSTALL_PROG) $(IBIN) $(DESTDIR)$(sbindir) - $(INSTALL_DATA) $(LIB_STATIC) $(DESTDIR)$(LIBDIR) - $(INSTALL_PROG) $(LIB_SHARED) $(DESTDIR)$(LIBDIR) - ln -sf $(LIBSONAME) $(DESTDIR)$(LIBDIR)/$(LIBNAME).so - ln -sf $(LIB_SHARED) $(DESTDIR)$(LIBDIR)/$(LIBSONAME) + $(INSTALL_DATA) $(LIB_STATIC) $(DESTDIR)$(libdir) + $(INSTALL_PROG) $(LIB_SHARED) $(DESTDIR)$(libdir) + ln -sf $(LIBSONAME) $(DESTDIR)$(libdir)/$(LIBNAME).so + ln -sf $(LIB_SHARED) $(DESTDIR)$(libdir)/$(LIBSONAME) clean: rm -f $(OBJS) $(PICS) $(DEPS) $(IBIN) $(LIB_STATIC) $(LIB_SHARED) diff --git a/tools/blktap2/vhd/lib/Makefile
[Xen-devel] [PATCH 1/7] tools: replace private SBINDIR with automake sbindir
The result of this command: git grep -wnl SBINDIR | xargs sed -i 's@SBINDIR@sbindir@g' Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com Cc: Mukesh Rathor mukesh.rat...@oracle.com Cc: Tim Deegan t...@xen.org Cc: Daniel De Graaf dgde...@tycho.nsa.gov Cc: David Scott dave.sc...@eu.citrix.com --- Config.mk| 2 +- config/Paths.mk.in | 1 - tools/Makefile | 6 +++--- tools/blktap2/control/Makefile | 4 ++-- tools/blktap2/drivers/Makefile | 2 +- tools/blktap2/vhd/Makefile | 2 +- tools/console/Makefile | 4 ++-- tools/debugger/gdbsx/Makefile| 4 ++-- tools/debugger/kdd/Makefile | 4 ++-- tools/flask/utils/Makefile | 4 ++-- tools/hotplug/FreeBSD/rc.d/xencommons.in | 12 ++-- tools/hotplug/FreeBSD/vif-bridge | 2 +- tools/hotplug/Linux/init.d/xen-watchdog.in | 2 +- tools/hotplug/Linux/init.d/xencommons.in | 2 +- tools/hotplug/Linux/xen-hotplug-common.sh.in | 2 +- tools/hotplug/Linux/xendomains.in| 2 +- tools/hotplug/NetBSD/block | 2 +- tools/hotplug/NetBSD/rc.d/xen-watchdog | 4 ++-- tools/hotplug/NetBSD/rc.d/xencommons.in | 12 ++-- tools/hotplug/NetBSD/rc.d/xendomains | 2 +- tools/hotplug/NetBSD/vif-bridge | 2 +- tools/hotplug/NetBSD/vif-ip | 2 +- tools/libxl/Makefile | 4 ++-- tools/misc/Makefile | 4 ++-- tools/ocaml/xenstored/Makefile | 4 ++-- tools/tests/mce-test/tools/Makefile | 2 +- tools/xenbackendd/Makefile | 4 ++-- tools/xenmon/Makefile| 8 tools/xenpmd/Makefile| 4 ++-- tools/xenstat/xentop/Makefile| 4 ++-- tools/xenstore/Makefile | 4 ++-- 31 files changed, 58 insertions(+), 59 deletions(-) diff --git a/Config.mk b/Config.mk index 28b77d6..abb1e44 100644 --- a/Config.mk +++ b/Config.mk @@ -157,7 +157,7 @@ define move-if-changed if ! 
cmp -s $(1) $(2); then mv -f $(1) $(2); else rm -f $(1); fi endef -BUILD_MAKE_VARS := SBINDIR BINDIR LIBEXEC LIBEXEC_BIN LIBDIR SHAREDIR \ +BUILD_MAKE_VARS := sbindir BINDIR LIBEXEC LIBEXEC_BIN LIBDIR SHAREDIR \ XENFIRMWAREDIR XEN_CONFIG_DIR XEN_SCRIPT_DIR XEN_LOCK_DIR \ XEN_RUN_DIR XEN_PAGING_DIR diff --git a/config/Paths.mk.in b/config/Paths.mk.in index 150bae7..26674b1 100644 --- a/config/Paths.mk.in +++ b/config/Paths.mk.in @@ -31,7 +31,6 @@ sysconfdir := @sysconfdir@ PREFIX := $(prefix) -SBINDIR := $(sbindir) BINDIR := $(bindir) LIBEXEC := $(libexecdir)/$(PACKAGE_TARNAME) LIBEXEC_BIN := @LIBEXEC_BIN@ diff --git a/tools/Makefile b/tools/Makefile index 966354a..77625cb 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -71,7 +71,7 @@ uninstall: rm -f $(D)$(CONFIG_DIR)/udev/rules.d/xen-backend.rules rm -f $(D)$(CONFIG_DIR)/udev/rules.d/xend.rules rm -f $(D)$(SYSCONFIG_DIR)/xendomains - rm -f $(D)$(SBINDIR)/xendomains + rm -f $(D)$(sbindir)/xendomains rm -f $(D)$(SYSCONFIG_DIR)/xencommons rm -rf $(D)/var/lib/xen* rm -rf $(D)$(BINDIR)/cpuperf-perfcntr $(D)$(BINDIR)/cpuperf-xen @@ -93,8 +93,8 @@ uninstall: rm -rf $(D)$(LIBDIR)/libxenstore* $(D)$(LIBDIR)/libxlutil* rm -rf $(D)$(LIBDIR)/python/xen $(D)$(LIBDIR)/python/grub rm -rf $(D)$(LIBEXEC) - rm -rf $(D)$(SBINDIR)/setmask - rm -rf $(D)$(SBINDIR)/xen* $(D)$(SBINDIR)/netfix $(D)$(SBINDIR)/xm + rm -rf $(D)$(sbindir)/setmask + rm -rf $(D)$(sbindir)/xen* $(D)$(sbindir)/netfix $(D)$(sbindir)/xm rm -rf $(D)$(SHAREDIR)/doc/xen rm -rf $(D)$(SHAREDIR)/xen rm -rf $(D)$(SHAREDIR)/qemu-xen diff --git a/tools/blktap2/control/Makefile b/tools/blktap2/control/Makefile index 1ae6902..9ae2e51 100644 --- a/tools/blktap2/control/Makefile +++ b/tools/blktap2/control/Makefile @@ -61,8 +61,8 @@ $(LIB_SHARED): $(CTL_PICS) $(CC) $(LDFLAGS) -fPIC -Wl,$(SONAME_LDFLAG) -Wl,$(LIBSONAME) $(SHLIB_LDFLAGS) -rdynamic $^ -o $@ $(APPEND_LDFLAGS) install: $(IBIN) $(LIB_STATIC) $(LIB_SHARED) - $(INSTALL_DIR) -p $(DESTDIR)$(SBINDIR) - $(INSTALL_PROG) 
$(IBIN) $(DESTDIR)$(SBINDIR) + $(INSTALL_DIR) -p $(DESTDIR)$(sbindir) + $(INSTALL_PROG) $(IBIN) $(DESTDIR)$(sbindir) $(INSTALL_DATA) $(LIB_STATIC) $(DESTDIR)$(LIBDIR) $(INSTALL_PROG) $(LIB_SHARED) $(DESTDIR)$(LIBDIR) ln -sf $(LIBSONAME)
[Xen-devel] [PATCH 0/7] tools: remove private Makefile variables
Replace all private variables in Makefiles with automake variables. This series is based on 92ff75384bce7a11e27fbfaf0c531e88dd1ab4c7. Olaf Olaf Hering (7): tools: replace private SBINDIR with automake sbindir tools: replace private BINDIR with automake bindir tools: replace private PREFIX with automake prefix tools: replace private INCLUDEDIR with automake includedir tools: replace private MANDIR with automake mandir tools: replace private LIBDIR with automake libdir tools: replace private DOCDIR with automake docdir Config.mk| 10 +++--- config/Paths.mk.in | 12 ++-- docs/Makefile| 10 +++--- stubdom/Makefile | 4 +-- tools/Makefile | 46 ++-- tools/blktap2/control/Makefile | 12 tools/blktap2/drivers/Makefile | 2 +- tools/blktap2/include/Makefile | 2 +- tools/blktap2/vhd/Makefile | 2 +- tools/blktap2/vhd/lib/Makefile | 2 +- tools/console/Makefile | 4 +-- tools/debugger/gdbsx/Makefile| 4 +-- tools/debugger/kdd/Makefile | 4 +-- tools/flask/utils/Makefile | 4 +-- tools/hotplug/FreeBSD/rc.d/xencommons.in | 16 +- tools/hotplug/FreeBSD/vif-bridge | 2 +- tools/hotplug/Linux/init.d/xen-watchdog.in | 2 +- tools/hotplug/Linux/init.d/xencommons.in | 8 ++--- tools/hotplug/Linux/xen-hotplug-common.sh.in | 4 +-- tools/hotplug/Linux/xendomains.in| 2 +- tools/hotplug/NetBSD/block | 2 +- tools/hotplug/NetBSD/rc.d/xen-watchdog | 6 ++-- tools/hotplug/NetBSD/rc.d/xencommons.in | 16 +- tools/hotplug/NetBSD/rc.d/xendomains | 4 +-- tools/hotplug/NetBSD/vif-bridge | 2 +- tools/hotplug/NetBSD/vif-ip | 2 +- tools/include/Makefile | 40 tools/libfsimage/Rules.mk| 2 +- tools/libfsimage/common/Makefile | 16 +- tools/libvchan/Makefile | 14 - tools/libxc/Makefile | 24 +++ tools/libxl/Makefile | 26 tools/misc/Makefile | 8 ++--- tools/ocaml/xenstored/Makefile | 4 +-- tools/pygrub/Makefile| 6 ++-- tools/tests/mce-test/tools/Makefile | 2 +- tools/xenbackendd/Makefile | 4 +-- tools/xenmon/Makefile| 12 tools/xenpmd/Makefile| 4 +-- tools/xenstat/libxenstat/Makefile| 10 +++--- tools/xenstat/xentop/Makefile| 
4 +-- tools/xenstore/Makefile | 44 +- tools/xentrace/Makefile | 6 ++-- 43 files changed, 201 insertions(+), 209 deletions(-) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 4/7] tools: replace private INCLUDEDIR with automake includedir
The result of this command: git grep -wnl INCLUDEDIR | xargs sed -i 's@INCLUDEDIR@includedir@g' Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com --- config/Paths.mk.in| 1 - tools/Makefile| 14 +++--- tools/blktap2/include/Makefile| 2 +- tools/include/Makefile| 40 +++ tools/libfsimage/common/Makefile | 8 tools/libvchan/Makefile | 4 ++-- tools/libxc/Makefile | 6 +++--- tools/libxl/Makefile | 4 ++-- tools/xenstat/libxenstat/Makefile | 2 +- tools/xenstore/Makefile | 16 10 files changed, 48 insertions(+), 49 deletions(-) diff --git a/config/Paths.mk.in b/config/Paths.mk.in index bf9bd51..32f7c78 100644 --- a/config/Paths.mk.in +++ b/config/Paths.mk.in @@ -34,7 +34,6 @@ LIBEXEC_BIN := @LIBEXEC_BIN@ LIBEXEC_LIB := $(LIBEXEC)/lib LIBEXEC_INC := $(LIBEXEC)/include -INCLUDEDIR := $(includedir) SHAREDIR := @SHAREDIR@ MANDIR := $(mandir) MAN1DIR := $(MANDIR)/man1 diff --git a/tools/Makefile b/tools/Makefile index b05d12a..1e0c0a7 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -82,13 +82,13 @@ uninstall: rm -rf $(D)$(bindir)/xenstore* $(D)$(bindir)/xentrace* rm -rf $(D)$(bindir)/xen-detect $(D)$(bindir)/xencons rm -rf $(D)$(bindir)/xenpvnetboot $(D)$(bindir)/qemu-*-xen - rm -rf $(D)$(INCLUDEDIR)/xenctrl* $(D)$(INCLUDEDIR)/xenguest.h - rm -rf $(D)$(INCLUDEDIR)/xs_lib.h $(D)$(INCLUDEDIR)/xs.h - rm -rf $(D)$(INCLUDEDIR)/xenstore-compat/xs_lib.h $(D)$(INCLUDEDIR)/xenstore-compat/xs.h - rm -rf $(D)$(INCLUDEDIR)/xenstore_lib.h $(D)$(INCLUDEDIR)/xenstore.h - rm -rf $(D)$(INCLUDEDIR)/xen - rm -rf $(D)$(INCLUDEDIR)/_libxl* $(D)$(INCLUDEDIR)/libxl* - rm -rf $(D)$(INCLUDEDIR)/xenstat.h $(D)$(INCLUDEDIR)/xentoollog.h + rm -rf $(D)$(includedir)/xenctrl* $(D)$(includedir)/xenguest.h + rm -rf $(D)$(includedir)/xs_lib.h $(D)$(includedir)/xs.h + rm -rf $(D)$(includedir)/xenstore-compat/xs_lib.h 
$(D)$(includedir)/xenstore-compat/xs.h + rm -rf $(D)$(includedir)/xenstore_lib.h $(D)$(includedir)/xenstore.h + rm -rf $(D)$(includedir)/xen + rm -rf $(D)$(includedir)/_libxl* $(D)$(includedir)/libxl* + rm -rf $(D)$(includedir)/xenstat.h $(D)$(includedir)/xentoollog.h rm -rf $(D)$(LIBDIR)/libxenctrl* $(D)$(LIBDIR)/libxenguest* rm -rf $(D)$(LIBDIR)/libxenstore* $(D)$(LIBDIR)/libxlutil* rm -rf $(D)$(LIBDIR)/python/xen $(D)$(LIBDIR)/python/grub diff --git a/tools/blktap2/include/Makefile b/tools/blktap2/include/Makefile index c238486..66e8a1e 100644 --- a/tools/blktap2/include/Makefile +++ b/tools/blktap2/include/Makefile @@ -6,7 +6,7 @@ all: .PHONY: install install: - $(INSTALL_DIR) -p $(DESTDIR)$(INCLUDEDIR) + $(INSTALL_DIR) -p $(DESTDIR)$(includedir) .PHONY: clean diff --git a/tools/include/Makefile b/tools/include/Makefile index 601f79c..3cd157a 100644 --- a/tools/include/Makefile +++ b/tools/include/Makefile @@ -21,29 +21,29 @@ xen/.dir: .PHONY: install install: all - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/arch-x86 - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/arch-x86/hvm - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/arch-arm - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/arch-arm/hvm - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/foreign - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/hvm - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/io - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/sys - $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)/xen/xsm - $(INSTALL_DATA) xen/COPYING $(DESTDIR)$(INCLUDEDIR)/xen - $(INSTALL_DATA) xen/*.h $(DESTDIR)$(INCLUDEDIR)/xen - $(INSTALL_DATA) xen/arch-x86/*.h $(DESTDIR)$(INCLUDEDIR)/xen/arch-x86 - $(INSTALL_DATA) xen/arch-x86/hvm/*.h $(DESTDIR)$(INCLUDEDIR)/xen/arch-x86/hvm + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/arch-x86 + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/arch-x86/hvm + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/arch-arm + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/arch-arm/hvm + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/foreign + 
$(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/hvm + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/io + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/sys + $(INSTALL_DIR) $(DESTDIR)$(includedir)/xen/xsm + $(INSTALL_DATA) xen/COPYING $(DESTDIR)$(includedir)/xen + $(INSTALL_DATA) xen/*.h $(DESTDIR)$(includedir)/xen + $(INSTALL_DATA) xen/arch-x86/*.h $(DESTDIR)$(includedir)/xen/arch-x86 + $(INSTALL_DATA) xen/arch-x86/hvm/*.h
[Xen-devel] [PATCH 2/7] tools: replace private BINDIR with automake bindir
The result of this command: git grep -wnl BINDIR | xargs sed -i 's@BINDIR@bindir@g' Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com Cc: George Dunlap george.dun...@eu.citrix.com --- Config.mk| 2 +- config/Paths.mk.in | 1 - tools/Makefile | 16 tools/hotplug/FreeBSD/rc.d/xencommons.in | 2 +- tools/hotplug/FreeBSD/vif-bridge | 2 +- tools/hotplug/Linux/init.d/xencommons.in | 6 +++--- tools/hotplug/Linux/xen-hotplug-common.sh.in | 2 +- tools/hotplug/NetBSD/block | 2 +- tools/hotplug/NetBSD/rc.d/xencommons.in | 2 +- tools/hotplug/NetBSD/vif-bridge | 2 +- tools/hotplug/NetBSD/vif-ip | 2 +- tools/misc/Makefile | 4 ++-- tools/pygrub/Makefile| 6 +++--- tools/xenstore/Makefile | 14 +++--- tools/xentrace/Makefile | 6 +++--- 15 files changed, 34 insertions(+), 35 deletions(-) diff --git a/Config.mk b/Config.mk index abb1e44..1312a80 100644 --- a/Config.mk +++ b/Config.mk @@ -157,7 +157,7 @@ define move-if-changed if ! 
cmp -s $(1) $(2); then mv -f $(1) $(2); else rm -f $(1); fi endef -BUILD_MAKE_VARS := sbindir BINDIR LIBEXEC LIBEXEC_BIN LIBDIR SHAREDIR \ +BUILD_MAKE_VARS := sbindir bindir LIBEXEC LIBEXEC_BIN LIBDIR SHAREDIR \ XENFIRMWAREDIR XEN_CONFIG_DIR XEN_SCRIPT_DIR XEN_LOCK_DIR \ XEN_RUN_DIR XEN_PAGING_DIR diff --git a/config/Paths.mk.in b/config/Paths.mk.in index 26674b1..1653f26 100644 --- a/config/Paths.mk.in +++ b/config/Paths.mk.in @@ -31,7 +31,6 @@ sysconfdir := @sysconfdir@ PREFIX := $(prefix) -BINDIR := $(bindir) LIBEXEC := $(libexecdir)/$(PACKAGE_TARNAME) LIBEXEC_BIN := @LIBEXEC_BIN@ LIBEXEC_LIB := $(LIBEXEC)/lib diff --git a/tools/Makefile b/tools/Makefile index 77625cb..b05d12a 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -74,14 +74,14 @@ uninstall: rm -f $(D)$(sbindir)/xendomains rm -f $(D)$(SYSCONFIG_DIR)/xencommons rm -rf $(D)/var/lib/xen* - rm -rf $(D)$(BINDIR)/cpuperf-perfcntr $(D)$(BINDIR)/cpuperf-xen - rm -rf $(D)$(BINDIR)/xc_shadow - rm -rf $(D)$(BINDIR)/pygrub - rm -rf $(D)$(BINDIR)/setsize $(D)$(BINDIR)/tbctl - rm -rf $(D)$(BINDIR)/xsls - rm -rf $(D)$(BINDIR)/xenstore* $(D)$(BINDIR)/xentrace* - rm -rf $(D)$(BINDIR)/xen-detect $(D)$(BINDIR)/xencons - rm -rf $(D)$(BINDIR)/xenpvnetboot $(D)$(BINDIR)/qemu-*-xen + rm -rf $(D)$(bindir)/cpuperf-perfcntr $(D)$(bindir)/cpuperf-xen + rm -rf $(D)$(bindir)/xc_shadow + rm -rf $(D)$(bindir)/pygrub + rm -rf $(D)$(bindir)/setsize $(D)$(bindir)/tbctl + rm -rf $(D)$(bindir)/xsls + rm -rf $(D)$(bindir)/xenstore* $(D)$(bindir)/xentrace* + rm -rf $(D)$(bindir)/xen-detect $(D)$(bindir)/xencons + rm -rf $(D)$(bindir)/xenpvnetboot $(D)$(bindir)/qemu-*-xen rm -rf $(D)$(INCLUDEDIR)/xenctrl* $(D)$(INCLUDEDIR)/xenguest.h rm -rf $(D)$(INCLUDEDIR)/xs_lib.h $(D)$(INCLUDEDIR)/xs.h rm -rf $(D)$(INCLUDEDIR)/xenstore-compat/xs_lib.h $(D)$(INCLUDEDIR)/xenstore-compat/xs.h diff --git a/tools/hotplug/FreeBSD/rc.d/xencommons.in b/tools/hotplug/FreeBSD/rc.d/xencommons.in index 3e204d6..0ef9af9 100644 --- 
a/tools/hotplug/FreeBSD/rc.d/xencommons.in +++ b/tools/hotplug/FreeBSD/rc.d/xencommons.in @@ -46,7 +46,7 @@ xen_startcmd() XENSTORED_ARGS=${XENSTORED_ARGS} -T /var/log/xen/xenstored-trace.log fi ${sbindir}/xenstored ${XENSTORED_ARGS} - while [ $time -lt $timeout ] ! `${BINDIR}/xenstore-read -s / /dev/null 21` ; do + while [ $time -lt $timeout ] ! `${bindir}/xenstore-read -s / /dev/null 21` ; do printf . time=$(($time+1)) sleep 1 diff --git a/tools/hotplug/FreeBSD/vif-bridge b/tools/hotplug/FreeBSD/vif-bridge index 3319710..428c653 100644 --- a/tools/hotplug/FreeBSD/vif-bridge +++ b/tools/hotplug/FreeBSD/vif-bridge @@ -13,7 +13,7 @@ DIR=$(dirname $0) . ${DIR}/hotplugpath.sh -PATH=${BINDIR}:${sbindir}:${LIBEXEC_BIN}:/bin:/usr/bin:/sbin:/usr/sbin +PATH=${bindir}:${sbindir}:${LIBEXEC_BIN}:/bin:/usr/bin:/sbin:/usr/sbin export PATH path=$1 diff --git a/tools/hotplug/Linux/init.d/xencommons.in b/tools/hotplug/Linux/init.d/xencommons.in index 82b3fcb..00d0f55 100644 --- a/tools/hotplug/Linux/init.d/xencommons.in +++ b/tools/hotplug/Linux/init.d/xencommons.in @@ -62,7 +62,7 @@ do_start () { mkdir -p ${XEN_RUN_DIR} mkdir -p
[Xen-devel] [PATCH 5/7] tools: replace private MANDIR with automake mandir
The result of this command: git grep -wnl MANDIR | xargs sed -i 's@MANDIR@mandir@g'

Signed-off-by: Olaf Hering <o...@aepfle.de>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabell...@eu.citrix.com>
Cc: Ian Campbell <ian.campb...@citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 config/Paths.mk.in | 5 ++---
 docs/Makefile      | 6 +++---
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/config/Paths.mk.in b/config/Paths.mk.in
index 32f7c78..8ca02bb 100644
--- a/config/Paths.mk.in
+++ b/config/Paths.mk.in
@@ -35,9 +35,8 @@ LIBEXEC_LIB := $(LIBEXEC)/lib
 LIBEXEC_INC := $(LIBEXEC)/include
 
 SHAREDIR := @SHAREDIR@
-MANDIR := $(mandir)
-MAN1DIR := $(MANDIR)/man1
-MAN8DIR := $(MANDIR)/man8
+MAN1DIR := $(mandir)/man1
+MAN8DIR := $(mandir)/man8
 LIBDIR := $(libdir)
 DOCDIR := $(docdir)
 
diff --git a/docs/Makefile b/docs/Makefile
index 9183252..94965c7 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -79,9 +79,9 @@ distclean: clean
 # Top level install targets
 .PHONY: install-man-pages
 install-man-pages: man-pages
-	$(INSTALL_DIR) $(DESTDIR)$(MANDIR)
-	cp -R man1 $(DESTDIR)$(MANDIR)
-	cp -R man5 $(DESTDIR)$(MANDIR)
+	$(INSTALL_DIR) $(DESTDIR)$(mandir)
+	cp -R man1 $(DESTDIR)$(mandir)
+	cp -R man5 $(DESTDIR)$(mandir)
 
 .PHONY: install-html
 install-html: html txt figs

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/2] x86/hvm: actually release ioreq server pages
On 23.04.15 at 17:46, paul.durr...@citrix.com wrote:

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -496,7 +496,7 @@ static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn)
 {
     unsigned int i = gmfn - d->arch.hvm_domain.ioreq_gmfn.base;
 
-    clear_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask);
+    set_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask);
 }

Double checking this I came to wonder whether HVM_PARAM_NR_IOREQ_SERVER_PAGES handling isn't broken when invoked a second time with a.value smaller than what was used the first time, or when any server pages are already in use.

Jan
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [v3 06/10] vmx: handle PML buffer full VMEXIT
We need to flush PML buffer when it's full.

Signed-off-by: Kai Huang <kai.hu...@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 6c4f78c..f11ac46 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3178,6 +3178,10 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         vmx_handle_apic_write();
         break;
 
+    case EXIT_REASON_PML_FULL:
+        vmx_vcpu_flush_pml_buffer(v);
+        break;
+
     case EXIT_REASON_ACCESS_GDTR_OR_IDTR:
     case EXIT_REASON_ACCESS_LDTR_OR_TR:
     case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
-- 
2.1.0

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [v3 10/10] p2m/ept: enable PML in p2m-ept for log-dirty
This patch firstly enables EPT A/D bits if PML is used, as PML depends on EPT A/D bits to work. The A bit is set for all present p2m types in middle and leaf EPT entries, and the D bit is set for all writable types in leaf EPT entries, except for the log-dirty type with PML. With PML, for 4K pages, instead of setting the EPT entry to read-only, we just need to clear the D bit in order to log that GFN. For superpages, we still need to set them to read-only, as we need to split a superpage into 4K pages on EPT violation.

Signed-off-by: Kai Huang <kai.hu...@linux.intel.com>
---
 xen/arch/x86/mm/p2m-ept.c          | 79 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  3 +-
 xen/include/asm-x86/hvm/vmx/vmx.h  |  3 +-
 3 files changed, 76 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 5e95a83..a1b9eaf 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -102,9 +102,20 @@ static int atomic_write_ept_entry(ept_entry_t *entryptr, ept_entry_t new,
     return rc;
 }
 
-static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_access_t access)
+static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
+                                  p2m_type_t type, p2m_access_t access)
 {
-    /* First apply type permissions */
+    /*
+     * First apply type permissions.
+     *
+     * A/D bits are also manually set to avoid overhead of MMU having to set
+     * them later. Both A/D bits are safe to be updated directly as they are
+     * ignored by processor if EPT A/D bits is not turned on.
+     *
+     * A bit is set for all present p2m types in middle and leaf EPT entries.
+     * D bit is set for all writable types in EPT leaf entry, except for
+     * log-dirty type with PML.
+     */
     switch(type)
     {
     case p2m_invalid:
@@ -118,27 +129,51 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
         break;
     case p2m_ram_rw:
         entry->r = entry->w = entry->x = 1;
+        entry->a = entry->d = 1;
         break;
     case p2m_mmio_direct:
         entry->r = entry->x = 1;
         entry->w = !rangeset_contains_singleton(mmio_ro_ranges, entry->mfn);
+        entry->a = 1;
+        entry->d = entry->w;
         break;
     case p2m_ram_logdirty:
+        entry->r = entry->x = 1;
+        /*
+         * In case of PML, we don't have to write protect 4K page, but
+         * only need to clear D-bit for it, but we still need to write
+         * protect super page in order to split it to 4K pages in EPT
+         * violation.
+         */
+        if ( vmx_domain_pml_enabled(p2m->domain) &&
+             !is_epte_superpage(entry) )
+            entry->w = 1;
+        else
+            entry->w = 0;
+        entry->a = 1;
+        /* For both PML or non-PML cases we clear D bit anyway */
+        entry->d = 0;
+        break;
     case p2m_ram_ro:
     case p2m_ram_shared:
         entry->r = entry->x = 1;
         entry->w = 0;
+        entry->a = 1;
+        entry->d = 0;
         break;
     case p2m_grant_map_rw:
     case p2m_map_foreign:
         entry->r = entry->w = 1;
         entry->x = 0;
+        entry->a = entry->d = 1;
         break;
     case p2m_grant_map_ro:
     case p2m_mmio_write_dm:
         entry->r = 1;
         entry->w = entry->x = 0;
+        entry->a = 1;
+        entry->d = 0;
         break;
     }
 
@@ -194,6 +229,8 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
     ept_entry->access = p2m->default_access;
 
     ept_entry->r = ept_entry->w = ept_entry->x = 1;
+    /* Manually set A bit to avoid overhead of MMU having to write it later. */
+    ept_entry->a = 1;
 
     return 1;
 }
@@ -244,10 +281,9 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
         epte->sp = (level > 1);
         epte->mfn += i * trunk;
         epte->snp = (iommu_enabled && iommu_snoop);
-        ASSERT(!epte->rsvd1);
         ASSERT(!epte->avail3);
 
-        ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
+        ept_p2m_type_to_flags(p2m, epte, epte->sa_p2mt, epte->access);
 
         if ( (level - 1) == target )
             continue;
@@ -489,7 +525,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                 {
                     e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
                                 ? p2m_ram_logdirty : p2m_ram_rw;
-                    ept_p2m_type_to_flags(&e, e.sa_p2mt, e.access);
+                    ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
                 }
                 e.recalc = 0;
                 wrc = atomic_write_ept_entry(&epte[i], e, level);
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: On 23/04/2015 15:27, Jan Beulich wrote: On 24.04.15 at 07:12, wei.w.w...@intel.com wrote: On 23/04/2015 22:09, Jan Beulich wrote: On 23.04.16 at 15:31, wei.w.w...@intel.com wrote:

The intel_pstate.c file under xen/arch/x86/acpi/cpufreq/ contains all the logic for selecting the current P-state. It follows its implementation in the kernel. Instead of using the traditional cpufreq governors, intel_pstate implements its internal governor in the setpolicy().

And this internal governor behaves how? Like ondemand, powersave, performance, or yet something else? And how would its behavior be changed?

In the kernel intel_pstate implementation, they have two internal governors: Powersave and Performance. Powersave is similar to the old (cpufreq) ondemand governor. A timer function is periodically invoked to sample the CPU busy info (e.g. it will get increased due to the running of a CPU-intensive workload). However, the final calculated target value is clamped into the [min_pct, max_pct] limit interval. The Performance governor is actually a special case of Powersave, when min_pct = max_pct = 100%. This is the same as the old performance governor.

So a true powersave one would then be accomplished by setting min_pct = max_pct = some value smaller than 100%. Is there a limit on the valid percentages to be specified here?

In the old driver, a powersave governor just sets the CPU to run with the lowest possible performance state. This one does not exist in the intel_pstate driver. The intel_pstate driver changes the terminology by using powersave to refer to the previous ondemand case. This does make people feel confused. But we may think of it this way: it only has two modes, the max performance mode and the ondemand mode. ondemand is the one that saves power (actually in a more reasonable way compared to the previous powersave, which simply sets the CPU to run with the lowest performance state). Anyway, we can surely change the name if it sounds uncomfortable.

I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible.

The valid pct value range is 0 to 100.

So what does 0% mean then? I.e. (wrt powersave) what does min_pct = max_pct = 0 result in? Also, you calling powersave what supposedly is ondemand makes me nervous about it not immediately raising the CPU freq when load increases, yet imo that's a fundamental requirement for server kind loads where you don't want to run in performance mode. Can you clarify the behavior here?

The timer fires every 10ms to update the CPU P-state according to the sampled workload info.

But that doesn't tell what the action is that the timer initiates. I.e. under what conditions it would effect a frequency change.

Jan
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] Xen on CentOS 7
Hello,

I'm a student in computer science; during my internship I have to explore multipathing solutions for several hypervisors. I'm looking into MPTCP (Multipath TCP), an extension of the TCP protocol that allows aggregating the bandwidth of several connections. I'd like to use MPTCP to get good bandwidth between the hypervisor and the NAS. That would be a nice software solution to avoid buying expensive 10 Gig or 100 Gig NICs.

In order to make some tests I'd like to install MPTCP on XenServer. There is an .rpm for CentOS 7, but XenServer is on CentOS 5 and some dependencies are missing. I'm not confident enough to build my own supplemental pack for XenServer 6.5.

I'd like to know if there is an easy way to add MPTCP to XenServer? Is there a CentOS 7 version that I can get?

Thanks for your help.

Ben

PS: If you feel like helping me a bit more, here is a link to my support topic where I explain the problem I have when I try to boot my recompiled kernel: http://discussions.citrix.com/topic/364033-change-dom0-kernel/#entry1870428

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] libxl: remove redundant assignment of spawn-xspath
Reported-by: Olaf Hering o...@aepfle.de Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- tools/libxl/libxl_dm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 30c1578..a1011e3 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1489,7 +1489,6 @@ retry_transaction: LIBXL__LOG(CTX, XTL_DEBUG, "%s", *arg); spawn->what = GCSPRINTF("domain %d device model", domid); -spawn->xspath = GCSPRINTF("/local/domain/0/device-model/%d/state", domid); spawn->xspath = libxl__device_model_xs_path(gc, LIBXL_TOOLSTACK_DOMID, domid, "/state"); spawn->timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000; -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] libxl, flexarray_append_pair usage
On Fri, Apr 24, Olaf Hering wrote: flexarray_append(back, "state"); -flexarray_append(back, GCSPRINTF("%d", 1)); +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); Should all such code be converted to flexarray_append_pair? To reduce line length a short macro should be added, like FLXAP or FLXPAIR or something else. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxl: convert strings and ints to xenbus_state
On Fri, Apr 24, 2015 at 09:07:14AM +, Olaf Hering wrote: Convert all plain ints and strings which are used for xenbus state files to xenbus_state. This makes it easier to find code which deals with backend/frontend state changes. Convert usage of libxl__sprintf to GCSPRINTF. No change in behaviour is expected by this change, beside a small increase of runtime memory usage in places that used a string constant. Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2] xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
Make sure that xen_swiotlb_init allocates buffers that are DMA capable when at least one memblock is available below 4G. Otherwise we assume that all devices on the SoC can cope with >4G addresses. We do this on ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case. No functional changes on x86. From: Chen Baozi baoz...@gmail.com Signed-off-by: Chen Baozi baoz...@gmail.com Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Tested-by: Chen Baozi baoz...@gmail.com diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h index 2f7e6ff..0b579b2 100644 --- a/arch/arm/include/asm/xen/page.h +++ b/arch/arm/include/asm/xen/page.h @@ -110,5 +110,6 @@ static inline bool set_phys_to_machine(unsigned long pfn, unsigned long mfn) bool xen_arch_need_swiotlb(struct device *dev, unsigned long pfn, unsigned long mfn); +unsigned long xen_get_swiotlb_free_pages(unsigned int order); #endif /* _ASM_ARM_XEN_PAGE_H */ diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c index 793551d..4983250 100644 --- a/arch/arm/xen/mm.c +++ b/arch/arm/xen/mm.c @@ -4,6 +4,7 @@ #include <linux/gfp.h> #include <linux/highmem.h> #include <linux/export.h> +#include <linux/memblock.h> #include <linux/of_address.h> #include <linux/slab.h> #include <linux/types.h> @@ -21,6 +22,20 @@ #include <asm/xen/hypercall.h> #include <asm/xen/interface.h> +unsigned long xen_get_swiotlb_free_pages(unsigned int order) +{ + struct memblock_region *reg; + gfp_t flags = __GFP_NOWARN; + + for_each_memblock(memory, reg) { + if (reg->base < (phys_addr_t)0xffffffff) { + flags |= __GFP_DMA; + break; + } + } + return __get_free_pages(flags, order); +} + enum dma_cache_op { DMA_UNMAP, DMA_MAP, diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 358dcd3..c44a5d5 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -269,4 +269,9 @@ static inline bool xen_arch_need_swiotlb(struct device *dev, return false; } +static inline unsigned long xen_get_swiotlb_free_pages(unsigned int order) +{ + return __get_free_pages(__GFP_NOWARN, order); +} + #endif /* _ASM_X86_XEN_PAGE_H */ diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 810ad41..4c54932 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -235,7 +235,7 @@ retry: #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT)) #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT) while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) { - xen_io_tlb_start = (void *)xen_get_swiotlb_free_pages(order); + xen_io_tlb_start = (void *)xen_get_swiotlb_free_pages(order); if (xen_io_tlb_start) break; order--; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3 01/10] vmx: add new boot parameter to control PML enabling
On 24/04/15 09:19, Kai Huang wrote: A top level EPT parameter ept=options and a sub boolean opt_pml_enabled are added to control PML. Other booleans can be further added for any other EPT related features. The document description for the new parameter is also added. Signed-off-by: Kai Huang kai.hu...@linux.intel.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- docs/misc/xen-command-line.markdown | 15 +++ xen/arch/x86/hvm/vmx/vmcs.c | 30 ++ 2 files changed, 45 insertions(+) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 1dda1f0..4889e27 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -685,6 +685,21 @@ requirement can be relaxed. This option is particularly useful for nested virtualization, to allow the L1 hypervisor to use EPT even if the L0 hypervisor does not provide VM\_ENTRY\_LOAD\_GUEST\_PAT. +### ept (Intel) + `= List of ( pml<boolean> )` + + Default: `false` +Controls EPT related features. Currently only Page Modification Logging (PML) is +the controllable feature as boolean type. + +PML is a new hardware feature in Intel's Broadwell Server and further platforms +which reduces hypervisor overhead of log-dirty mechanism by automatically +recording GPAs (guest physical addresses) when guest memory gets dirty, and +therefore significantly reducing number of EPT violation caused by write +protection of guest memory, which is a necessity to implement log-dirty +mechanism before PML. 
+ ### gdb `= <baud>[/<clock_hz>][,DPS[,<io-base>[,<irq>[,<port-bdf>[,<bridge-bdf>]]]] | pci | amt ]` diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 63007a9..79efa42 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -64,6 +64,36 @@ integer_param("ple_gap", ple_gap); static unsigned int __read_mostly ple_window = 4096; integer_param("ple_window", ple_window); +static bool_t __read_mostly opt_pml_enabled = 0; + +/* + * The 'ept' parameter controls functionalities that depend on, or impact the + * EPT mechanism. Optional comma separated value may contain: + * + * pml Enable PML + */ +static void __init parse_ept_param(char *s) +{ +char *ss; + +do { +bool_t val = !!strncmp(s, "no-", 3); +if ( !val ) +s += 3; + +ss = strchr(s, ','); +if ( ss ) +*ss = '\0'; + +if ( !strcmp(s, "pml") ) +opt_pml_enabled = val; + +s = ss + 1; +} while ( ss ); +} + +custom_param("ept", parse_ept_param); + /* Dynamic (run-time adjusted) execution control flags. */ u32 vmx_pin_based_exec_control __read_mostly; u32 vmx_cpu_based_exec_control __read_mostly; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 5/7] tools: replace private MANDIR with automake mandir
On 24/04/15 11:25, Olaf Hering wrote: The result of this command: git grep -wnl MANDIR | xargs sed -i 's@MANDIR@mandir@g' Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com In some copious free time during a docs day, we should see about moving the xenstat/xentrace manpages into docs/ (and writing them in something slightly nicer than raw troff), which would allow all $(mandir) information to disappear from Paths.mk ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V2] xen: arm: X-Gene Storm check GIC DIST address for EOI quirk
In old X-Gene Storm firmware and DT, secure mode addresses have been mentioned in GICv2 node. In this case maintenance interrupt is used instead of EOI HW method. This patch checks the GIC Distributor Base Address to enable EOI quirk for old firmware. Ref: http://lists.xen.org/archives/html/xen-devel/2014-07/msg01263.html ChangeLog: V2: - Fine tune interrupt controller node search as per comments on V1 patch - Incorporating other misc comments on V1. V1: - Initial patch. Tested-by: Christoffer Dall christoffer.d...@linaro.org Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org --- xen/arch/arm/platforms/xgene-storm.c | 42 +- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/platforms/xgene-storm.c b/xen/arch/arm/platforms/xgene-storm.c index 1812e5b..c9a6dfc 100644 --- a/xen/arch/arm/platforms/xgene-storm.c +++ b/xen/arch/arm/platforms/xgene-storm.c @@ -22,6 +22,7 @@ #include <asm/platform.h> #include <xen/stdbool.h> #include <xen/vmap.h> +#include <xen/device_tree.h> #include <asm/io.h> #include <asm/gic.h> @@ -35,9 +36,46 @@ static u64 reset_addr, reset_size; static u32 reset_mask; static bool reset_vals_valid = false; +#define XGENE_SEC_GICV2_DIST_ADDR 0x78010000 +static u32 __read_mostly xgene_quirks = PLATFORM_QUIRK_GIC_64K_STRIDE; + +static void __init xgene_check_pirq_eoi(void) +{ +struct dt_device_node *node; +int res; +paddr_t dbase; +static const struct dt_device_match xgene_dt_int_ctrl_match[] = +{ +DT_MATCH_COMPATIBLE("arm,cortex-a15-gic"), +{ /*sentinel*/ }, +}; + +node = dt_find_interrupt_controller(xgene_dt_int_ctrl_match); +if ( !node ) +panic("%s: Can not find interrupt controller node\n", __func__); + +res = dt_device_get_address(node, 0, &dbase, NULL); +if ( !dbase ) +panic("%s: Cannot find a valid address for the " +"distributor", __func__); + +/* + * In old X-Gene Storm firmware and DT, secure mode addresses have + * been mentioned in GICv2 node. We have to use maintenance interrupt + * instead of EOI HW in this case. We check the GIC Distributor Base + * Address to maintain compatibility with older firmware. + */ +if ( dbase == XGENE_SEC_GICV2_DIST_ADDR ) +{ +xgene_quirks |= PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI; +printk("Xen: warning: Using OLD X-Gene Firmware, " +"disabling PIRQ EOI mode ...\n"); +} +} + static uint32_t xgene_storm_quirks(void) { -return PLATFORM_QUIRK_GIC_64K_STRIDE|PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI; +return xgene_quirks; } static int map_one_mmio(struct domain *d, const char *what, @@ -216,6 +254,8 @@ static int xgene_storm_init(void) reset_mask = XGENE_RESET_MASK; reset_vals_valid = true; +xgene_check_pirq_eoi(); + return 0; } -- 1.7.9.5 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 23/04/2015 15:27, Jan Beulich wrote: On 24.04.15 at 07:12, wei.w.w...@intel.com wrote: On 23/04/2015 22:09, Jan Beulich wrote: On 23.04.15 at 15:31, wei.w.w...@intel.com wrote: The intel_pstate.c file under xen/arch/x86/acpi/cpufreq/ contains all the logic for selecting the current P-state. It follows its implementation in the kernel. Instead of using the traditional cpufreq governors, intel_pstate implements its internal governor in setpolicy(). And this internal governor behaves how? Like ondemand, powersave, performance, or yet something else? And how would its behavior be changed? In the kernel intel_pstate implementation, they have two internal governors: Powersave and Performance. Powersave is similar to the old (cpufreq) ondemand governor. A timer function is periodically invoked to sample the CPU busy info (e.g. it will get increased due to the running of a CPU-intensive workload). However, the final calculated target value is clamped into the [min_pct, max_pct] limit interval. The Performance governor is actually a special case of Powersave, when min_pct = max_pct = 100%. This is the same as the old performance governor. So a true powersave one would then be accomplished by setting min_pct = max_pct = some value smaller than 100%. Is there a limit on the valid percentages to be specified here? In the old driver, a powersave governor just sets the CPU to run with the lowest possible performance state. This one does not exist in the intel_pstate driver. The intel_pstate driver changes the terminology by using powersave to refer to the previous ondemand case. This does make people feel confused. But we may think of it this way: it only has two modes, the max performance mode and the ondemand mode. ondemand is the one that saves power (actually in a more reasonable way compared to the previous powersave, which simply sets the CPU to run with the lowest performance state). 
The valid pct value range is 0 to 100. Also, your calling powersave what supposedly is ondemand makes me nervous about it not immediately raising the CPU freq when load increases, yet imo that's a fundamental requirement for server-kind loads where you don't want to run in performance mode. Can you clarify the behavior here? The timer fires every 10ms to update the CPU P-state according to the sampled workload info. Best, Wei Here in the ported version, the limit interval can be set via the newly added interfaces in xenpm. I think we can make use of only the Powersave governor, and the Performance governor can actually be achieved simply by setting min_pct = max_pct = 100%. If you all agree, I will remove the Performance governor related code in the next version of the patchset. Yes, this certainly seems to make sense with the above. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
On 23/04/15 17:11, Jan Beulich wrote: On 22.04.15 at 18:00, david.vra...@citrix.com wrote: static inline int get_maptrack_handle( struct grant_table *lgt) { +struct vcpu *v = current; int i; grant_handle_t handle; struct grant_mapping *new_mt; unsigned int new_mt_limit, nr_frames; -spin_lock(&lgt->maptrack_lock); - -while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) ) -{ -nr_frames = nr_maptrack_frames(lgt); -if ( nr_frames >= max_maptrack_frames ) -break; - -new_mt = alloc_xenheap_page(); -if ( !new_mt ) -break; +handle = __get_maptrack_handle(lgt, v); +if ( likely(handle != -1) ) +return handle; -clear_page(new_mt); +nr_frames = nr_vcpu_maptrack_frames(v); +if ( nr_frames >= max_maptrack_frames ) +return -1; -new_mt_limit = lgt->maptrack_limit + MAPTRACK_PER_PAGE; +new_mt = alloc_xenheap_page(); +if ( !new_mt ) +return -1; -for ( i = 1; i < MAPTRACK_PER_PAGE; i++ ) -new_mt[i - 1].ref = lgt->maptrack_limit + i; -new_mt[i - 1].ref = lgt->maptrack_head; -lgt->maptrack_head = lgt->maptrack_limit; +clear_page(new_mt); -lgt->maptrack[nr_frames] = new_mt; -smp_wmb(); -lgt->maptrack_limit = new_mt_limit; +new_mt_limit = v->maptrack_limit + MAPTRACK_PER_PAGE; -gdprintk(XENLOG_INFO, "Increased maptrack size to %u frames\n", - nr_frames + 1); +for ( i = 1; i < MAPTRACK_PER_PAGE; i++ ){ +new_mt[i - 1].ref = (lgt->maptrack_pages * MAPTRACK_PER_PAGE) + i; +new_mt[i - 1].vcpu = v->vcpu_id; +} +/* Set last entry vcpu and ref */ +new_mt[i - 1].ref = v->maptrack_head; +new_mt[i - 1].vcpu = v->vcpu_id; +v->maptrack_head = lgt->maptrack_pages * MAPTRACK_PER_PAGE; +if (v->maptrack_tail == MAPTRACK_TAIL) +{ +v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE) ++ MAPTRACK_PER_PAGE - 1; +new_mt[i - 1].ref = MAPTRACK_TAIL; } +spin_lock(&lgt->maptrack_lock); +lgt->maptrack[lgt->maptrack_pages++] = new_mt; spin_unlock(&lgt->maptrack_lock); The uses of ->maptrack_pages ahead of taking the lock can race with updates inside the lock. 
And with locking elsewhere dropped by the previous patch it looks like you can't update ->maptrack[] like you do (you'd need a barrier between the pointer store and the increment, and with that I think the lock would become superfluous if it was needed only for this update). Also note the coding style issues in the changes above (and also in code further down). This was a last minute optimisation; this isn't on the hot path, so we'll expand the spin_lock to cover all users of maptrack_pages. Sorry about the coding style problems. -return handle; +v->maptrack_limit = new_mt_limit; + +return __get_maptrack_handle(lgt, v); With the lock dropped, nothing guarantees this to succeed, which it ought to unless the table size reached its allowed maximum. @@ -1422,6 +1448,17 @@ gnttab_setup_table( } gt->maptrack_pages = 0; +/* Tracking of mapped foreign frames table */ +if ( (gt->maptrack = xzalloc_array(struct grant_mapping *, + max_maptrack_frames * d->max_vcpus)) == NULL ) +goto out2; See the comments on the similar misplaced hunk in the previous patch. --- a/xen/include/xen/grant_table.h +++ b/xen/include/xen/grant_table.h @@ -60,6 +60,8 @@ struct grant_mapping { u32 ref; /* grant ref */ u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */ domid_t domid; /* granting domain */ +u32 vcpu; /* vcpu which created the grant mapping */ +u16 pad[2]; }; What is this pad[] good for? The pad is to keep the struct power-of-2 sized, because this allows the compiler to optimise these macros to right and left shifts: #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) #define maptrack_entry(t, e) \ ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) Malcolm Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxlu: don't crash on empty lists
On Fri, Apr 24, 2015 at 10:04:31AM +0100, Jan Beulich wrote: Prior to 1a09c5113a (libxlu: rework internal representation of setting) empty lists in config files did get accepted. Restore that behavior. Signed-off-by: Jan Beulich jbeul...@suse.com Thanks for the fix. Acked-by: Wei Liu wei.l...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] libxl, flexarray_append_pair usage
On Fri, Apr 24, 2015 at 11:16:00AM +0200, Olaf Hering wrote: On Fri, Apr 24, Olaf Hering wrote: flexarray_append(back, "state"); -flexarray_append(back, GCSPRINTF("%d", 1)); +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); Should all such code be converted to flexarray_append_pair? To reduce line length a short macro should be added, like FLXAP or FLXPAIR or something else. I don't have a very strong opinion on this. Wei. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: On 23/04/2015 15:27, Jan Beulich wrote: On 24.04.15 at 07:12, wei.w.w...@intel.com wrote: On 23/04/2015 22:09, Jan Beulich wrote: On 23.04.15 at 15:31, wei.w.w...@intel.com wrote: The intel_pstate.c file under xen/arch/x86/acpi/cpufreq/ contains all the logic for selecting the current P-state. It follows its implementation in the kernel. Instead of using the traditional cpufreq governors, intel_pstate implements its internal governor in setpolicy(). And this internal governor behaves how? Like ondemand, powersave, performance, or yet something else? And how would its behavior be changed? In the kernel intel_pstate implementation, they have two internal governors: Powersave and Performance. Powersave is similar to the old (cpufreq) ondemand governor. A timer function is periodically invoked to sample the CPU busy info (e.g. it will get increased due to the running of a CPU-intensive workload). However, the final calculated target value is clamped into the [min_pct, max_pct] limit interval. The Performance governor is actually a special case of Powersave, when min_pct = max_pct = 100%. This is the same as the old performance governor. So a true powersave one would then be accomplished by setting min_pct = max_pct = some value smaller than 100%. Is there a limit on the valid percentages to be specified here? In the old driver, a powersave governor just sets the CPU to run with the lowest possible performance state. This one does not exist in the intel_pstate driver. The intel_pstate driver changes the terminology by using powersave to refer to the previous ondemand case. This does make people feel confused. But we may think of it this way: it only has two modes, the max performance mode and the ondemand mode. ondemand is the one that saves power (actually in a more reasonable way compared to the previous powersave, which simply sets the CPU to run with the lowest performance state). 
Anyway, we can surely change the name if it sounds uncomfortable. I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name internal for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal The valid pct value range is 0 to 100. So what does 0% mean then? I.e. (wrt powersave) what does min_pct = max_pct = 0 result in? Probably I misunderstood that question. The CPU actually has its own valid frequency range. On my machine, for example, the minimum is 1.2GHz (got from an MSR), which corresponds to 32%. Then the valid range is 32 to 100. Any input pct value less than 32 will be set to 32. Also, your calling powersave what supposedly is ondemand makes me nervous about it not immediately raising the CPU freq when load increases, yet imo that's a fundamental requirement for server-kind loads where you don't want to run in performance mode. Can you clarify the behavior here? The timer fires every 10ms to update the CPU P-state according to the sampled workload info. But that doesn't tell what the action is that the timer initiates. I.e. under what conditions it would effect a frequency change. Each time the timer function is invoked, it gets the CPU utilization statistics using the APERF and MPERF MSR registers. The current CPU utilization is compared to the previous one to generate a scale factor, which scales the frequency up or down. Best, Wei Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24.04.15 at 12:07, wei.w.w...@intel.com wrote: On 24/04/2015 17:56, Jan Beulich wrote: On 24.04.15 at 11:46, wei.w.w...@intel.com wrote: On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: In the old driver, a powersave governor just sets the CPU to run with the lowest possible performance state. This one does not exist in the intel_pstate driver. The intel_pstate driver changes the terminology by using powersave to refer to the previous ondemand case. This does make people feel confused. But we may think it this way: it only has two modes, the max performance mode and the ondemand mode. ondemand is the one who saves power (actually in a more reasonable way compared to the previous powersave which simply sets the CPU to run with the lowest performance state). Anyway, we can surely change the name if it sounds uncomfortable. I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name internal for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal But xenpm's set-scaling-governor command should still do something useful for the currently available governor options. And with that, showing internal in its output may also end up being confusing (at least I am already being confused). The case is similar to that in the kernel. Xen has two pstate drivers, but only one can be loaded. When the intel_pstate driver is used (scaling_driver : intel_pstate), xenpm's set-scaling-governor will not take effect, since intel_pstate only implements its internal governor. But that's precisely what I'm asking to be changed: even if internally it uses its own governor, the user interface of the tool should still be usable to achieve similar effects. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [OSSTEST Nested PATCH v8 5/7] Add new script to customize nested test configuration
-----Original Message----- From: Ian Campbell [mailto:ian.campb...@citrix.com] Sent: Thursday, April 23, 2015 7:30 PM To: Hu, Robert Cc: Pang, LongtaoX; xen-devel@lists.xen.org; ian.jack...@eu.citrix.com; wei.l...@citrix.com Subject: Re: [OSSTEST Nested PATCH v8 5/7] Add new script to customize nested test configuration On Thu, 2015-04-23 at 17:38 +0800, Robert Hu wrote: As mentioned in [ Patch], 'nested_l1' is the ident for the L1 guest VM, 'nestedl1' is the hostname for the L1 guest VM. 'store_runvar('nested_l1', $l1->{Guest});' is used to store L1's ident and L1's hostname to the database, which will be used by the selecthost() function in a later script. Having a runvar with the bare ident as a name doesn't seem right. Perhaps you want to be writing to $r{${ident}_hostname} or some such? What do you actually need the hostname for anyway? Look at selecthost() sub selecthost ($) { my ($ident) = @_; # must be run outside transaction my $name; if ($ident =~ m/=/) { $ident= $`; $name= $'; #' $r{$ident}= $name; } else { $name= $r{$ident}; die "no specified $ident" unless defined $name; } ... When in the L1 iteration invocation of ts-debian-hvm-install(), the ident passed in is the L1 ident (nested_l1). Here the 'else' branch will need $r{$ident}, whose value shall be L1's name. Therefore we prepare this runvar in advance. Ah I see, today usually the ident is host or dst_host or src_host so I got confused by it just being nested_l1 here. In the normal case today the ident runvars are set by ts-hosts-allocate. That can't apply to the nested case since it is allocating a guest and not a host, so your ts-nested-setup seems like a reasonable enough place to do it. However, I don't think there is any reason for the ident, hostname and guest name to differ, and I think perhaps it would be clearer to write this as: our $l1_ident = $gho->{Guest}; store_runvar($l1_ident, $gho->{Guest}); So that it is clear to the reader that this runvar is an ident. I suppose a code comment would work as well (or in addition perhaps). 
Most places you seem to use nestedl1, not nested_l1. I think you ended up with this due to not using guest_var and the various hardcoding you've done elsewhere. I suspect with all that fixed and make-flight updated accordingly you won't need this var any more. I think I should define the L1 ident with my $l1_ident = 'nested_l1' in 'ts-nested-setup' Hardcoding anything like that in ts-nested-setup seems wrong to me. The ident should come from the script's parameters (i.e. be part of the sg-run-job invocation of the script). Imagine a hypothetical nested-virt migration test, in that scenario you would want to run ts-nested-setup on two hosts, both nestedl1src and nestedl2dst in order to configure things. That's why we don't hardcode such things. So to summarize, ts-nested-setup will be invoked with parameters of $l0_ident, $l1_ident and $l1_name? Or only parameters of $l0_ident and $l1_ident, if we have set up the $l1_ident -> $l1_name mapping in runvar when in l0's iteration of ts-debian-hvm-install. Which would you prefer? $l1_ident and $l1_name should be the same, I think. Semantically the script should be taking $l0_ident and $l1_ident, where $l0 here is a host and $l1 here is a guest, so in fact isn't $l1_name just $l1->{Guest}? (Things get confusing because $l1 can be a Guest or a Host depending on which test script we are in) +store_runvar("$l1->{Guest}_ip", $l1->{Ip}); + +my $l2_disk_mb = 2; +my $l2_lv_name = 'nestedl2-disk'; +my $vgname = $l0->{Name}; +$vgname .= ".$c{TestHostDomain}" if ($l0->{Suite} =~ m/lenny/); +target_cmd_root($l0, <<END); +lvremove -f /dev/${vgname}/${l2_lv_name} ||: +lvcreate -L ${l2_disk_mb}M -n $l2_lv_name $vgname +dd if=/dev/zero of=/dev/${vgname}/${l2_lv_name} count=10 I think you can do all of this by passing a suitable l2 $gho to prepareguest_part_lvmdisk, can't you? I think you can get that by my $l2 = selectguest($ARGV[], $l1). I think I could try it, that means I will add more parameters for the 'ts-nested-setup' script, not just 'host' and 'nestedl1'. 
I think we can pass in 3 parameters: $l0_ident, $l1_ident, and $l2_ident, as long as we set up their $ident -> $name mappings in runvar in advance. Here we have 2 options: 1. set up such mapping in the first iteration (l0 iteration) of ts-debian-hvm-install 2. set up this in make flight I prefer 2, since such a mapping can be fixed from the beginning and shall not be changed at run time. The issue we are coming across here is that we are trying to talk about the l2 guest before it really exists, since that happens later when prepareguest is called in the l1 context and here we are
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24.04.15 at 11:46, wei.w.w...@intel.com wrote: On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: In the old driver, a powersave governor just sets the CPU to run with the lowest possible performance state. This one does not exist in the intel_pstate driver. The intel_pstate driver changes the terminology by using powersave to refer to the previous ondemand case. This does make people feel confused. But we may think it this way: it only has two modes, the max performance mode and the ondemand mode. ondemand is the one who saves power (actually in a more reasonable way compared to the previous powersave which simply sets the CPU to run with the lowest performance state). Anyway, we can surely change the name if it sounds uncomfortable. I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name internal for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal But xenpm's set-scaling-governor command should still do something useful for the currently available governor options. And with that, showing internal in its output may also end up being confusing (at least I am already being confused). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH OSSTEST] standalone: noreinstall -> reinstall in help string
Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- standalone | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standalone b/standalone index d8113e4..17fa40c 100755 --- a/standalone +++ b/standalone @@ -41,7 +41,7 @@ Options: -f FLIGHT, --flight=FLIGHT Operate on FLIGHT -h HOST, --host=HOST Test host -r, --reuse Do not wipe test host (default) --R, --noreuse, --noreinstall Wipe the test host (if job or test does so) +-R, --noreuse, --reinstall Wipe the test host (if job or test does so) -s, --simulate, --dry-run Dry run, printing what would be run --lvextendmax=GB Limit the size of the build logical volume --baseline Create a baseline flight instead of a tip flight -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24/04/2015 17:56, Jan Beulich wrote: On 24.04.15 at 11:46, wei.w.w...@intel.com wrote: On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: In the old driver, a "powersave" governor just sets the CPU to run with the lowest possible performance state. This one does not exist in the intel_pstate driver. The intel_pstate driver changes the terminology by using "powersave" to refer to the previous "ondemand" case. This can be confusing. But we can think of it this way: it only has two modes, the max performance mode and the ondemand mode. "ondemand" is the one that saves power (actually in a more reasonable way than the previous "powersave", which simply set the CPU to run with the lowest performance state). Anyway, we can surely change the name if it causes confusion. I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name "internal" for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal But xenpm's set-scaling-governor command should still do something useful for the currently available governor options. And with that, showing "internal" in its output may also end up being confusing (at least I am already confused). The case is similar to that in the kernel. Xen has two pstate drivers, but only one can be loaded. When the intel_pstate driver is used (scaling_driver : intel_pstate), xenpm's set-scaling-governor will not take effect, since intel_pstate only implements its internal governor. Jan
Re: [Xen-devel] [PATCH v3 3/5] libxl: use LIBXL_TOOLSTACK_DOMID
On Fri, Apr 24, 2015 at 10:49:36AM +0200, Olaf Hering wrote: On Fri, Mar 20, Wei Liu wrote: +++ b/tools/libxl/libxl_dm.c @@ -1438,6 +1438,8 @@ retry_transaction: spawn->what = GCSPRINTF("domain %d device model", domid); spawn->xspath = GCSPRINTF("/local/domain/0/device-model/%d/state", domid); +spawn->xspath = libxl__device_model_xs_path(gc, LIBXL_TOOLSTACK_DOMID, +domid, "/state"); This assigns xspath twice. Good spot. I will send a patch to fix this. Luckily this doesn't cause a practical problem except for wasting some memory. :-) Wei. Olaf
Re: [Xen-devel] [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
On 24.04.15 at 11:09, malcolm.cross...@citrix.com wrote: On 23/04/15 17:11, Jan Beulich wrote: On 22.04.15 at 18:00, david.vra...@citrix.com wrote: --- a/xen/include/xen/grant_table.h +++ b/xen/include/xen/grant_table.h @@ -60,6 +60,8 @@ struct grant_mapping { u32 ref; /* grant ref */ u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */ domid_t domid; /* granting domain */ +u32 vcpu; /* vcpu which created the grant mapping */ +u16 pad[2]; }; What is this pad[] good for? The pad is to keep the struct power-of-2 sized, because this allows the compiler to optimise these macros to right and left shifts: #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) #define maptrack_entry(t, e) \ ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) Okay, then why u16[2] instead of u32? And please add a brief comment explaining the reason. Apart from that I wonder whether fitting vcpu in the 10 unused flags bits (not 11, as the comment on the field suggests) would be an option. That would require limiting vCPU count to 4k, which I don't think would really be a problem for anyone. Jan
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24/04/2015 18:20, Jan Beulich wrote: On 24.04.15 at 12:07, wei.w.w...@intel.com wrote: On 24/04/2015 17:56, Jan Beulich wrote: On 24.04.15 at 11:46, wei.w.w...@intel.com wrote: On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name "internal" for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal But xenpm's set-scaling-governor command should still do something useful for the currently available governor options. And with that, showing "internal" in its output may also end up being confusing (at least I am already confused). The case is similar to that in the kernel. Xen has two pstate drivers, but only one can be loaded. When the intel_pstate driver is used (scaling_driver : intel_pstate), xenpm's set-scaling-governor will not take effect, since intel_pstate only implements its internal governor. But that's precisely what I'm asking to be changed: Even if internally it uses its own governor, the user interface of the tool should still be usable to achieve similar effects. Yes, we should change that if we retain two governors in the intel_pstate driver. But as we discussed before, the internal Performance governor seems redundant and can be removed. In that case, there is no other governor option that can be selected by the user via set-scaling-governor. Best, Wei
[Xen-devel] [PATCH] libxl: convert strings and ints to xenbus_state
Convert all plain ints and strings which are used for xenbus state files to xenbus_state. This makes it easier to find code which deals with backend/frontend state changes. Convert usage of libxl__sprintf to GCSPRINTF. No change in behaviour is expected by this change, aside from a small increase of runtime memory usage in places that used a string constant. Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com --- tools/libxl/libxl.c | 26 +- tools/libxl/libxl_device.c | 4 ++-- tools/libxl/libxl_pci.c | 18 +- 3 files changed, 24 insertions(+), 24 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index feb3aa9..516713e 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -2080,7 +2080,7 @@ void libxl__device_vtpm_add(libxl__egc *egc, uint32_t domid, flexarray_append(back, "online"); flexarray_append(back, "1"); flexarray_append(back, "state"); -flexarray_append(back, GCSPRINTF("%d", 1)); +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(back, "handle"); flexarray_append(back, GCSPRINTF("%d", vtpm->devid)); @@ -2092,7 +2092,7 @@ void libxl__device_vtpm_add(libxl__egc *egc, uint32_t domid, flexarray_append(front, "backend-id"); flexarray_append(front, GCSPRINTF("%d", vtpm->backend_domid)); flexarray_append(front, "state"); -flexarray_append(front, GCSPRINTF("%d", 1)); +flexarray_append(front, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(front, "handle"); flexarray_append(front, GCSPRINTF("%d", vtpm->devid)); @@ -2524,7 +2524,7 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid, flexarray_append(back, "bootable"); flexarray_append(back, libxl__sprintf(gc, "%d", 1)); flexarray_append(back, "state"); -flexarray_append(back, libxl__sprintf(gc, "%d", 1)); +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(back, "dev"); flexarray_append(back,
disk->vdev); flexarray_append(back, "type"); @@ -2544,7 +2544,7 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid, flexarray_append(front, "backend-id"); flexarray_append(front, libxl__sprintf(gc, "%d", disk->backend_domid)); flexarray_append(front, "state"); -flexarray_append(front, libxl__sprintf(gc, "%d", 1)); +flexarray_append(front, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(front, "virtual-device"); flexarray_append(front, libxl__sprintf(gc, "%d", device->devid)); flexarray_append(front, "device-type"); @@ -3148,7 +3148,7 @@ static void local_device_attach_cb(libxl__egc *egc, libxl__ao_device *aodev) if (rc < 0) goto out; be_path = libxl__device_backend_path(gc, device); -rc = libxl__wait_for_backend(gc, be_path, "4"); +rc = libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected)); if (rc < 0) goto out; @@ -3358,7 +3358,7 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t domid, flexarray_append(back, "online"); flexarray_append(back, "1"); flexarray_append(back, "state"); -flexarray_append(back, libxl__sprintf(gc, "%d", 1)); +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); if (nic->script) flexarray_append_pair(back, "script", libxl__abs_path(gc, nic->script, @@ -3399,7 +3399,7 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t domid, flexarray_append(front, "backend-id"); flexarray_append(front, libxl__sprintf(gc, "%d", nic->backend_domid)); flexarray_append(front, "state"); -flexarray_append(front, libxl__sprintf(gc, "%d", 1)); +flexarray_append(front, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(front, "handle"); flexarray_append(front, libxl__sprintf(gc, "%d", nic->devid)); flexarray_append(front, "mac"); @@ -3685,7 +3685,7 @@ int libxl__device_console_add(libxl__gc *gc, uint32_t domid, flexarray_append(back, "online"); flexarray_append(back, "1"); flexarray_append(back, "state"); -flexarray_append(back, libxl__sprintf(gc, "%d", 1)); +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(back, "protocol");
flexarray_append(back, LIBXL_XENCONSOLE_PROTOCOL); @@ -3724,7 +3724,7 @@ int libxl__device_console_add(libxl__gc *gc, uint32_t domid, flexarray_append(ro_front, libxl__sprintf(gc, "%lu", state->console_mfn)); } else { flexarray_append(front, "state"); -flexarray_append(front, libxl__sprintf(gc, "%d", 1)); +flexarray_append(front, GCSPRINTF("%d", XenbusStateInitialising)); flexarray_append(front, "protocol"); flexarray_append(front, LIBXL_XENCONSOLE_PROTOCOL); } @@ -4021,12
Re: [Xen-devel] [PATCH 1/2] x86/hvm: actually release ioreq server pages
-----Original Message----- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 24 April 2015 09:01 To: Paul Durrant Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: Re: [PATCH 1/2] x86/hvm: actually release ioreq server pages On 23.04.15 at 17:46, paul.durr...@citrix.com wrote: --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -496,7 +496,7 @@ static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn) { unsigned int i = gmfn - d->arch.hvm_domain.ioreq_gmfn.base; -clear_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask); +set_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask); } Double checking this I came to wonder whether HVM_PARAM_NR_IOREQ_SERVER_PAGES handling isn't broken when invoked a second time with a.value smaller than what was used the first time, or when any server pages are already in use. Yes, the param is only intended to be a set-once param, the same as HVM_PARAM_IOREQ_SERVER_PFN. If the latter were changed after ioreq servers had been created then it could also lead to strangeness. I'll put together a patch to add checks for both. Paul Jan
[Xen-devel] [rumpuserxen test] 52054: regressions - FAIL
flight 52054 rumpuserxen real [real] http://logs.test-lab.xenproject.org/osstest/logs/52054/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-rumpuserxen 5 rumpuserxen-build fail REGR. vs. 33866 build-i386-rumpuserxen 5 rumpuserxen-build fail REGR. vs. 33866 Tests which did not succeed, but are not blocking: test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a version targeted for testing: rumpuserxen 3b91e44996ea6ae1276bce1cc44f38701c53ee6f baseline version: rumpuserxen 30d72f3fc5e35cd53afd82c8179cc0e0b11146ad People who touched revisions under test: Antti Kantee po...@iki.fi Ian Jackson ian.jack...@eu.citrix.com Martin Lucina mar...@lucina.net Wei Liu l...@liuw.name jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-pvops pass build-i386-pvops pass build-amd64-rumpuserxen fail build-i386-rumpuserxen fail test-amd64-amd64-rumpuserxen-amd64 blocked test-amd64-i386-rumpuserxen-i386 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/osstest/pub/logs images: /home/osstest/pub/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 535 lines long.)
Re: [Xen-devel] [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
On 24/04/15 10:50, Jan Beulich wrote: On 24.04.15 at 11:09, malcolm.cross...@citrix.com wrote: On 23/04/15 17:11, Jan Beulich wrote: On 22.04.15 at 18:00, david.vra...@citrix.com wrote: --- a/xen/include/xen/grant_table.h +++ b/xen/include/xen/grant_table.h @@ -60,6 +60,8 @@ struct grant_mapping { u32 ref; /* grant ref */ u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */ domid_t domid; /* granting domain */ +u32 vcpu; /* vcpu which created the grant mapping */ +u16 pad[2]; }; What is this pad[] good for? The pad is to keep the struct power-of-2 sized, because this allows the compiler to optimise these macros to right and left shifts: #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) #define maptrack_entry(t, e) \ ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) Okay, then why u16[2] instead of u32? And please add a brief comment explaining the reason. Apart from that I wonder whether fitting vcpu in the 10 unused flags bits (not 11, as the comment on the field suggests) would be an option. That would require limiting vCPU count to 4k, which I don't think would really be a problem for anyone. I'd like to not have the overhead of bit operations on a hot path. We're going from 512 grant entries per page to 256 grant entries per page. This is taking the Xen memory overhead from 0.2% to 0.4% for grant mapped pages. Is the extra 0.2% memory overhead that concerning? Malcolm Jan
Re: [Xen-devel] [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
On 24.04.15 at 12:21, malcolm.cross...@citrix.com wrote: On 24/04/15 10:50, Jan Beulich wrote: On 24.04.15 at 11:09, malcolm.cross...@citrix.com wrote: On 23/04/15 17:11, Jan Beulich wrote: On 22.04.15 at 18:00, david.vra...@citrix.com wrote: --- a/xen/include/xen/grant_table.h +++ b/xen/include/xen/grant_table.h @@ -60,6 +60,8 @@ struct grant_mapping { u32 ref; /* grant ref */ u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */ domid_t domid; /* granting domain */ +u32 vcpu; /* vcpu which created the grant mapping */ +u16 pad[2]; }; What is this pad[] good for? The pad is to keep the struct power-of-2 sized, because this allows the compiler to optimise these macros to right and left shifts: #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) #define maptrack_entry(t, e) \ ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) Okay, then why u16[2] instead of u32? And please add a brief comment explaining the reason. Apart from that I wonder whether fitting vcpu in the 10 unused flags bits (not 11, as the comment on the field suggests) would be an option. That would require limiting vCPU count to 4k, which I don't think would really be a problem for anyone. I'd like to not have the overhead of bit operations on a hot path. We're going from 512 grant entries per page to 256 grant entries per page. This is taking the Xen memory overhead from 0.2% to 0.4% for grant mapped pages. Is the extra 0.2% memory overhead that concerning? Perhaps not so much the memory overhead but the doubled cache bandwidth needed. But anyway, this was only a question, and with both you and Andrew being opposed let's just stay with what you submitted. Jan
Re: [Xen-devel] [PATCH] AMD IOMMU: only translate remapped IO-APIC RTEs
On 24.04.15 at 16:05, li...@eikelenboom.it wrote: I see you committed "AMD IOMMU: only translate remapped IO-APIC RTEs" to staging; any reason why your patch in http://lists.xen.org/archives/html/xen-devel/2015-04/msg02253.html isn't committed yet? (still waiting for a formal ack from someone? as Suravee also seems to have tested it) This wasn't even submitted as a patch for inclusion yet (hence it necessarily lacks required acks). And that is because, irrespective of it being confirmed to work in some cases, I'm still concerned about Tim's warning regarding there being possible races left. Jan
Re: [Xen-devel] [PATCH] AMD IOMMU: only translate remapped IO-APIC RTEs
Friday, April 24, 2015, 4:12:42 PM, you wrote: On 24.04.15 at 16:05, li...@eikelenboom.it wrote: I see you committed "AMD IOMMU: only translate remapped IO-APIC RTEs" to staging; any reason why your patch in http://lists.xen.org/archives/html/xen-devel/2015-04/msg02253.html isn't committed yet? (still waiting for a formal ack from someone? as Suravee also seems to have tested it) This wasn't even submitted as a patch for inclusion yet (hence it necessarily lacks required acks). And that is because, irrespective of it being confirmed to work in some cases, I'm still concerned about Tim's warning regarding there being possible races left. Jan Ah OK, that's clear, I wasn't aware you had some concerns left. Thanks for the explanation! -- Sander
Re: [Xen-devel] [PATCH v2] xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
On Fri, Apr 24, 2015 at 10:16:40AM +0100, Stefano Stabellini wrote: Make sure that xen_swiotlb_init allocates buffers that are DMA capable when at least one memblock is available below 4G. Otherwise we assume that all devices on the SoC can cope with >4G addresses. We do this on ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case. No functional changes on x86. From: Chen Baozi baoz...@gmail.com Signed-off-by: Chen Baozi baoz...@gmail.com Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Tested-by: Chen Baozi baoz...@gmail.com Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h index 2f7e6ff..0b579b2 100644 --- a/arch/arm/include/asm/xen/page.h +++ b/arch/arm/include/asm/xen/page.h @@ -110,5 +110,6 @@ static inline bool set_phys_to_machine(unsigned long pfn, unsigned long mfn) bool xen_arch_need_swiotlb(struct device *dev, unsigned long pfn, unsigned long mfn); +unsigned long xen_get_swiotlb_free_pages(unsigned int order); #endif /* _ASM_ARM_XEN_PAGE_H */ diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c index 793551d..4983250 100644 --- a/arch/arm/xen/mm.c +++ b/arch/arm/xen/mm.c @@ -4,6 +4,7 @@ #include <linux/gfp.h> #include <linux/highmem.h> #include <linux/export.h> +#include <linux/memblock.h> #include <linux/of_address.h> #include <linux/slab.h> #include <linux/types.h> @@ -21,6 +22,20 @@ #include <asm/xen/hypercall.h> #include <asm/xen/interface.h> +unsigned long xen_get_swiotlb_free_pages(unsigned int order) +{ + struct memblock_region *reg; + gfp_t flags = __GFP_NOWARN; + + for_each_memblock(memory, reg) { + if (reg->base < (phys_addr_t)0x) { + flags |= __GFP_DMA; + break; + } + } + return __get_free_pages(flags, order); +} + enum dma_cache_op { DMA_UNMAP, DMA_MAP, diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 358dcd3..c44a5d5 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -269,4 +269,9 @@ static inline bool xen_arch_need_swiotlb(struct device *dev, return false; } +static inline unsigned long xen_get_swiotlb_free_pages(unsigned int order) +{ + return __get_free_pages(__GFP_NOWARN, order); +} + #endif /* _ASM_X86_XEN_PAGE_H */ diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 810ad41..4c54932 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -235,7 +235,7 @@ retry: #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT)) #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT) while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) { - xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN, order); + xen_io_tlb_start = (void *)xen_get_swiotlb_free_pages(order); if (xen_io_tlb_start) break; order--;
Re: [Xen-devel] io.c:164:d25v0 Weird HVM ioemulation status 1. domain_crash called from io.c:165
On Fri, Apr 24, 2015 at 02:02:36PM +0200, Sander Eikelenboom wrote: Hi, On Xen-unstable I encountered this non stopping logspam while trying to shutdown an HVM domain with pci-passthrough: (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. And I saw as well with the latest unstable. Hadn't tried to narrow it down. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 (XEN) [2015-04-24 11:55:47.802] io.c:164:d25v0 Weird HVM ioemulation status 1. etc. etc. etc. The domain had shutdown perfectly well on earlier occasions, but this time it kept running. It did drop from it's assigned 3 vpcu's to 1 vpcu, but it stalled there. 
The log spam stopped only after destroying the guest. Since I haven't seen it before, it's probably hard to replicate. -- Sander
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24.04.15 at 13:53, wei.w.w...@intel.com wrote: On 24/04/2015 18:20, Jan Beulich wrote On 24.04.15 at 12:07, wei.w.w...@intel.com wrote: On 24/04/2015 17:56, Jan Beulich wrote: On 24.04.15 at 11:46, wei.w.w...@intel.com wrote: On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name internal for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal But xenpm's set-scaling-governor command should still do something useful for the currently available governor options. And with that, showing internal in its output may also end up being confusing (at least I am already being confused). The case is similar to that in the kernel. Xen has two pstate driver, but only one can be loaded. When the intel_pstate driver is used (scaling_driver : intel_pstate ), xenpm's set-scaling-governor will not take effect, since the intel_pstate only implements its internal governor. But that's precisely what I'm asking to be changed: Even if internally is uses its own governor, the user interface of the tool should still be usable to achieve similar effects. Yes, we should change that if we remain two governors in the intel_pstate driver. But as we discussed before, the internal Performance governor seems redundant and can be removed. In that case, there is no other option of governors that be selected by the user via set-scaling-governor. I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). 
xenpm set-scaling-governor powersave
xenpm set-scaling-governor ondemand
xenpm set-scaling-governor performance
Each of these should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not "internal"), again no matter how this is being achieved internally to the driver. Jan
Re: [Xen-devel] [PATCH] AMD IOMMU: only translate remapped IO-APIC RTEs
Friday, April 24, 2015, 12:12:32 AM, you wrote: On 4/23/15, 12:59, Sander Eikelenboom li...@eikelenboom.it wrote: On 4/23/15, 08:47, Jan Beulich jbeul...@suse.com wrote: On 23.04.15 at 15:31, suravee.suthikulpa...@amd.com wrote: On 4/17/15, 10:27, Jan Beulich jbeul...@suse.com wrote: 1aeb1156fa ("x86: don't change affinity with interrupt unmasked") introducing RTE reads prior to the respective interrupt having got enabled for the first time uncovered a bug in 2ca9fbd739 ("AMD IOMMU: allocate IRTE entries instead of using a static mapping"): We obviously shouldn't be translating RTEs for which remapping didn't get set up yet. Reported-by: Sander Eikelenboom li...@eikelenboom.it Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/drivers/passthrough/amd/iommu_intr.c +++ b/xen/drivers/passthrough/amd/iommu_intr.c @@ -365,15 +365,17 @@ unsigned int amd_iommu_read_ioapic_from_ unsigned int apic, unsigned int reg) { unsigned int val = __io_apic_read(apic, reg); +unsigned int pin = (reg - 0x10) / 2; +unsigned int offset = ioapic_sbdf[IO_APIC_ID(apic)].pin_2_idx[pin]; -if ( !(reg & 1) ) +if ( !(reg & 1) && offset < INTREMAP_ENTRIES ) { -unsigned int offset = val & (INTREMAP_ENTRIES - 1); u16 bdf = ioapic_sbdf[IO_APIC_ID(apic)].bdf; u16 seg = ioapic_sbdf[IO_APIC_ID(apic)].seg; u16 req_id = get_intremap_requestor_id(seg, bdf); const u32 *entry = get_intremap_entry(seg, req_id, offset); +ASSERT(offset == (val & (INTREMAP_ENTRIES - 1))); Jan, could you please explain why the ASSERT is needed here? The previous value offset got assigned was calculated using the right side expression. I.e. the assert makes sure that what we used before and what we use now is the same. Jan Jan, I have tested this patch w/ staging branch booting Dom0, and this patch got rid of the following error from xl dmesg: (XEN) APIC error on CPU0: 00(40) (XEN) APIC error on CPU2: 00(40) However, when I tried starting a guest w/ PCI device passthrough, the system crashed and rebooted.
Although, I don't think this is caused by the patch. Acked-by: Suravee Suthikulpanit suravee.suthikulpa...@amd.com I'm looking into the issue. I also went back and tested it with 4.5 on the same setup and didn't see the issue. Thanks, Suravee If it crashes with: That problem is likely solved by another patch: http://lists.xen.org/archives/html/xen-devel/2015-04/msg02253.html Yes, this has fixed the crash. Thanks, Suravee Hi Jan, I see you committed "AMD IOMMU: only translate remapped IO-APIC RTEs" to staging; any reason why your patch in http://lists.xen.org/archives/html/xen-devel/2015-04/msg02253.html isn't committed yet? (still waiting for a formal ack from someone? as Suravee also seems to have tested it) -- Sander
Re: [Xen-devel] io.c:164:d25v0 Weird HVM ioemulation status 1. domain_crash called from io.c:165
On Fri, Apr 24, Sander Eikelenboom wrote: (XEN) [2015-04-24 11:55:47.802] domain_crash called from io.c:165 Since I haven't seen it before, it's probably hard to replicate. I see it all the time on my beloved ProBook. With an ordinary domU.cfg. name=fv-13.1-pvscsi uuid=4ff34139-8381-4f3b-8c3a-0078a794b152 memory=512 builder=hvm boot=cd disk=[ 'vdev=hda, discard, target=/work/xen/pvscsi/fv-13.1-pvscsi/vdisk-fv-13.1-pvscsi.raw', 'vdev=hdc, discard, cdrom, target=/dist/iso/13.2/openSUSE-13.2-DVD-x86_64.iso', ] vif=[ 'mac=00:16:3e:13:01:00,ip=1.1.1.2,type=vif,gatewaydev=xentap,script=vif-route' ] keymap=de vnc=0 sdl=1 vncunused=1 serial=pty Olaf
Re: [Xen-devel] [PATCH v7 2/5] sysctl: Add sysctl interface for querying PCI topology
On 04/24/2015 03:19 AM, Jan Beulich wrote: On 24.04.15 at 00:20, boris.ostrov...@oracle.com wrote: On 04/21/2015 03:01 AM, Jan Beulich wrote: + ((++dev_cnt & 0x3f) && hypercall_preempt_check()) ) +break; +} + +if ( (!ret || (ret == -ENODEV)) && + __copy_field_to_guest(u_sysctl, op, u.pcitopoinfo.first_dev) ) +ret = -EFAULT; +} +break; +#endif With the continuation-less model now used I don't think it makes sense to have first_dev and num_devs - for re-invocation all the caller needs to do is increment the buffer pointer suitably. I.e. you can get away with just a count of devices afaict. This would require walking xc_hypercall_buffer_t->hbuf. Would something like set_xen_guest_handle_raw(sysctl..., (void *)HYPERCALL_BUFFER_AS_ARG(foo) + offset) be acceptable? I don't think I see anything better. I thought of adding set_xen_guest_handle_offset() that would look similar to set_xen_guest_handle(), but then I felt that having this in the API may not be a good idea, since xc_hypercall_buffer_t->hbuf would end up pointing to memory that is not allocated for the full xc_hypercall_buffer_t->sz. There ought to be a way to create a guest handle from other than the start of an allocated hypercall buffer, but that's really more a question for the tool stack folks. Yes, this was a question for toolstack people. (And my second paragraph was not stated correctly, now that I re-read it. I meant to say that my understanding is that the API is expected to make all safety checks on buffers, and with the set_xen_guest_handle_offset() that I was picturing in my head we could pass in pretty much any pointer. I suppose we could check 'hbuf+offset < sz'.) -boris
[Xen-devel] [PATCH v3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
Xen L4 entries being uniformly installed into any L4 table and 64-bit PV kernels running in ring 3 means that user mode was able to see the read-only M2P presented by Xen to the guests. While apparently not really representing an exploitable information leak, this still very certainly was never meant to be that way. Building on the fact that these guests already have separate kernel and user mode page tables we can allow guest kernels to tell Xen that they don't want user mode to see this table. We can't, however, do this by default: There is no ABI requirement that kernel and user mode page tables be separate. Therefore introduce a new VM-assist flag allowing the guest to control respective hypervisor behavior: - when not set, L4 tables get created with the respective slot blank, and whenever the L4 table gets used as a kernel one the missing mapping gets inserted, - when set, L4 tables get created with the respective slot initialized as before, and whenever the L4 table gets used as a user one the mapping gets zapped. Signed-off-by: Jan Beulich jbeul...@suse.com --- v3: Shadow mode changes re-written (to address the migration regression in v2). 
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -339,7 +339,7 @@ static int setup_compat_l4(struct vcpu *
     l4tab = __map_domain_page(pg);
     clear_page(l4tab);
-    init_guest_l4_table(l4tab, v->domain);
+    init_guest_l4_table(l4tab, v->domain, 1);
 
     unmap_domain_page(l4tab);
     v->arch.guest_table = pagetable_from_page(pg);
@@ -971,7 +971,11 @@ int arch_set_info_guest(
         case -EINTR:
             rc = -ERESTART;
         case -ERESTART:
+            break;
         case 0:
+            if ( !compat && !VM_ASSIST(d, m2p_strict) &&
+                 !paging_mode_refcounts(d) )
+                fill_ro_mpt(cr3_gfn);
             break;
         default:
             if ( cr3_page == current->arch.old_guest_table )
@@ -1006,7 +1010,10 @@ int arch_set_info_guest(
         default:
             if ( cr3_page == current->arch.old_guest_table )
                 cr3_page = NULL;
+            break;
         case 0:
+            if ( VM_ASSIST(d, m2p_strict) )
+                zap_ro_mpt(cr3_gfn);
             break;
         }
     }
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -1205,7 +1205,7 @@ int __init construct_dom0(
             l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
         }
         clear_page(l4tab);
-        init_guest_l4_table(l4tab, d);
+        init_guest_l4_table(l4tab, d, 0);
         v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
         if ( is_pv_32on64_domain(d) )
             v->arch.guest_table_user = v->arch.guest_table;
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1380,7 +1380,8 @@ static int alloc_l3_table(struct page_in
     return rc > 0 ? 0 : rc;
 }
 
-void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d)
+void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
+                         bool_t zap_ro_mpt)
 {
     /* Xen private mappings. */
     memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT],
@@ -1395,6 +1396,25 @@ void init_guest_l4_table(l4_pgentry_t l4
         l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR);
     l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] =
         l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR);
+    if ( zap_ro_mpt || is_pv_32on64_domain(d) || paging_mode_refcounts(d) )
+        l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
+}
+
+void fill_ro_mpt(unsigned long mfn)
+{
+    l4_pgentry_t *l4tab = map_domain_page(mfn);
+
+    l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
+        idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
+    unmap_domain_page(l4tab);
+}
+
+void zap_ro_mpt(unsigned long mfn)
+{
+    l4_pgentry_t *l4tab = map_domain_page(mfn);
+
+    l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
+    unmap_domain_page(l4tab);
 }
 
 static int alloc_l4_table(struct page_info *page)
@@ -1444,7 +1464,7 @@ static int alloc_l4_table(struct page_in
             adjust_guest_l4e(pl4e[i], d);
     }
 
-    init_guest_l4_table(pl4e, d);
+    init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
     unmap_domain_page(pl4e);
 
     return rc > 0 ? 0 : rc;
@@ -2754,6 +2774,8 @@ int new_guest_cr3(unsigned long mfn)
     invalidate_shadow_ldt(curr, 0);
 
+    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+        fill_ro_mpt(mfn);
     curr->arch.guest_table = pagetable_from_pfn(mfn);
     update_cr3(curr);
@@ -3111,6 +3133,8 @@ long do_mmuext_op(
                 op.arg1.mfn);
             break;
         }
+        if ( VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+            zap_ro_mpt(op.arg1.mfn);
     }
 
     curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn);
--- a/xen/arch/x86/mm/shadow/multi.c
+++
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24.04.15 at 16:56, wei.w.w...@intel.com wrote: On 24/04/2015 20:57, Jan Beulich wrote I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not internal), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct=max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running with 60%. So I must be misunderstanding something then: How can the driver do anything at all when told to run the system at 60%? Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention. They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). Adjusting the values individually to me very much looks like the userspace governor. I think we can follow the kernel implementation regarding this point, what do you think? Not sure - I'm not always convinced that what Linux does is the one and only and best way. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] libxl: remove duplicate check for pci subsystem type
Both attach and detach functions get called only if the type matches. Signed-off-by: Olaf Hering o...@aepfle.de Cc: Jim Fehlig jfeh...@suse.com
---
 src/libxl/libxl_driver.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index d76efda..f915f91 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -2901,9 +2901,6 @@ libxlDomainAttachHostPCIDevice(libxlDriverPrivatePtr driver,
     libxl_device_pci_init(&pcidev);
 
-    if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
-        goto cleanup;
-
     if (virDomainHostdevFind(vm->def, hostdev, &found) >= 0) {
         virReportError(VIR_ERR_OPERATION_FAILED,
                        _("target pci device %.4x:%.2x:%.2x.%.1x already exists"),
@@ -3239,9 +3236,6 @@ libxlDomainDetachHostPCIDevice(libxlDriverPrivatePtr driver,
     libxl_device_pci_init(&pcidev);
 
-    if (subsys->type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
-        goto cleanup;
-
     idx = virDomainHostdevFind(vm->def, hostdev, &detach);
     if (idx < 0) {
         virReportError(VIR_ERR_OPERATION_FAILED,
Re: [Xen-devel] [PATCH v3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
On 24.04.15 at 17:54, andrew.coop...@citrix.com wrote: On 24/04/15 16:07, Jan Beulich wrote: On 24.04.15 at 16:57, andrew.coop...@citrix.com wrote: On 24/04/15 15:31, Jan Beulich wrote: Xen L4 entries being uniformly installed into any L4 table and 64-bit PV kernels running in ring 3 means that user mode was able to see the read-only M2P presented by Xen to the guests. While apparently not really representing an exploitable information leak, this still very certainly was never meant to be that way. Building on the fact that these guests already have separate kernel and user mode page tables we can allow guest kernels to tell Xen that they don't want user mode to see this table. We can't, however, do this by default: There is no ABI requirement that kernel and user mode page tables be separate. Therefore introduce a new VM-assist flag allowing the guest to control respective hypervisor behavior: - when not set, L4 tables get created with the respective slot blank, and whenever the L4 table gets used as a kernel one the missing mapping gets inserted, - when set, L4 tables get created with the respective slot initialized as before, and whenever the L4 table gets used as a user one the mapping gets zapped. Is this complete? Yes. For backwards compatibility, older kernels will not have m2p_strict set, and the m2p should unconditionally appear in all L4s. No. L4s _only_ used for user mode have no need for this entry to be non-zero. There is only ever a single L4 in a particular virtual address space. The M2P is part of the Xen ABI range. Previously, the M2P was present in all L4s, which is why they leaked into user context. If we do not wish to break backwards compatibility with this change, then in relaxed mode the M2P must remain in all tables, or userspace which is actually making use of the mappings will suddenly start faulting because of a change in Xen, not a kernel change (and an unknowing kernel might not be prepared to handle this case).
Userspace was never supposed to access this table, and hence revoking access to the table cannot be a problem there. By your description, in the relaxed case a newly created L4 which is first used as user table will have the mapping clear. The ordering doesn't matter. In relaxed mode, the first time a table gets used for kernel purposes the entry will be inserted (and hence potentially become visible to user mode). But anyway I'm confused that you're starting this discussion now, when the patch had already gone in and only needed to be reverted because of breaking migration. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24/04/2015 23:54, Jan Beulich wrote: On 24.04.15 at 17:42, wei.w.w...@intel.com wrote: On 24/04/2015 23:04, Jan Beulich wrote: On 24.04.15 at 16:56, wei.w.w...@intel.com wrote: On 24/04/2015 20:57, Jan Beulich wrote: I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not "internal"), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct = max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running at 60%. So I must be misunderstanding something then: How can the driver do anything at all when told to run the system at 60%? The {min,max}_pct is a limit. That's clear. The question is whether these are hardware-imposed limits or set by the user. In the latter case ... The hardware CPU has its own minimal performance limit, let's call it hw_min_pct. For example, on my machine, hw_min_pct=32% (corresponding to 1.2GHz, the minimal frequency). The {min,max}_pct in our discussion is set by the user via xenpm. If min_pct is set to a value less than 32%, then it will be adjusted to hw_min_pct. Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention.
They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). Adjusting the values individually to me very much looks like the userspace governor. Yeah, that example was like userspace. Please take a look at this example: [min_pct=60%, max_pct=80%], the timer generates 45%, 55%, 65%, 70%, 75%, 90%, then the final target values will not be constant. (The 45%, 55%, and 90% here shouldn't happen in any case.) Yes. The final values will be 60%, 60%, 65%, 70%, 75%, 80%. This is neither a userspace governor nor an ondemand governor. The ones (65%, 70%, 75%) falling into the limit interval behave like ondemand; the others do not. ... restricting to these would be the job of the driver, and - ondemand would be 0...100 - powersave would be 0 - performance would be 100 Everything else would be userspace. Please see the example above; the internal governor in the new driver is different from these conventional governors. Best, Wei
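Jan's proposed mapping from the classic governor names to {min,max}_pct limits (and back, for reporting) can be sketched in plain C. This is purely illustrative of the discussion — the struct and function names are hypothetical, not Xen or xenpm code:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical representation of the user-visible percentage limits. */
struct pct_range { unsigned int min_pct, max_pct; };

/* Map a classic governor name onto limits, per Jan's suggestion:
 * powersave -> pinned at 0, performance -> pinned at 100,
 * ondemand -> free to roam 0...100.  Returns -1 for unknown names. */
static int governor_to_range(const char *name, struct pct_range *r)
{
    if ( strcmp(name, "powersave") == 0 )
        *r = (struct pct_range){ 0, 0 };
    else if ( strcmp(name, "performance") == 0 )
        *r = (struct pct_range){ 100, 100 };
    else if ( strcmp(name, "ondemand") == 0 )
        *r = (struct pct_range){ 0, 100 };
    else
        return -1;
    return 0;
}

/* Reverse direction: report which governor a given range corresponds
 * to; anything else would show up as "userspace". */
static const char *range_to_governor(struct pct_range r)
{
    if ( r.min_pct == 0 && r.max_pct == 0 )
        return "powersave";
    if ( r.min_pct == 100 && r.max_pct == 100 )
        return "performance";
    if ( r.min_pct == 0 && r.max_pct == 100 )
        return "ondemand";
    return "userspace";
}
```

With this shape, set-scaling-governor followed by a query round-trips to the same name, which is exactly the user-interface property Jan is asking for.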
Re: [Xen-devel] [PATCH v7 2/5] sysctl: Add sysctl interface for querying PCI topology
On 04/24/2015 10:09 AM, Boris Ostrovsky wrote: On 04/24/2015 03:19 AM, Jan Beulich wrote: On 24.04.15 at 00:20, boris.ostrov...@oracle.com wrote: On 04/21/2015 03:01 AM, Jan Beulich wrote:

+             ((++dev_cnt & 0x3f) && hypercall_preempt_check()) )
+            break;
+    }
+
+    if ( (!ret || (ret == -ENODEV)) &&
+         __copy_field_to_guest(u_sysctl, op, u.pcitopoinfo.first_dev) )
+        ret = -EFAULT;
+}
+break;
+#endif

With the continuation-less model now used I don't think it makes sense to have first_dev and num_devs - for re-invocation all the caller needs to do is increment the buffer pointer suitably. I.e. you can get away with just a count of devices afaict. This would require walking xc_hypercall_buffer_t->hbuf. Would something like set_xen_guest_handle_raw(sysctl..., (void *)HYPERCALL_BUFFER_AS_ARG(foo) + offset) be acceptable? I don't think I see anything better. I thought of adding set_xen_guest_handle_offset() that would look similar to set_xen_guest_handle() but then I felt that having this in the API may not be a good idea since xc_hypercall_buffer_t->hbuf would end up pointing to memory that is not allocated for the full xc_hypercall_buffer_t->sz. There ought to be a way to create a guest handle from other than the start of an allocated hypercall buffer, but that's really more a question for the tool stack folks. Yes, this was a question for toolstack people. (Adjusted TO/CC to reflect this). (And my second paragraph was not stated correctly, now that I re-read it. I meant to say that my understanding is that the API is expected to make all safety checks on buffers, and with the set_xen_guest_handle_offset() that I was picturing in my head we could pass in pretty much any pointer. I suppose we could check 'hbuf+offset < sz'.) -boris
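The continuation-less model Jan suggests — drop first_dev/num_devs and have the caller re-invoke with the buffer pointer advanced by the count already returned — can be sketched in plain C. This is an illustration only; fake_query and fetch_all are hypothetical stand-ins, not the real sysctl or libxc interfaces under discussion:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOTAL_DEVS 5  /* pretend the host has 5 PCI devices */

/* Stand-in for the hypercall: fills at most 'batch' entries for devices
 * starting at index 'done', and returns how many it actually filled
 * (as if it had been preempted after a partial batch). */
static size_t fake_query(uint32_t *buf, size_t batch, size_t done)
{
    size_t n = 0;

    while ( n < batch && done + n < TOTAL_DEVS )
    {
        buf[n] = (uint32_t)(done + n); /* pretend NUMA node of device */
        n++;
    }
    return n;
}

/* Caller side of the model: no first_dev runvar, just advance the
 * buffer pointer by the count each invocation reports. */
static size_t fetch_all(uint32_t *buf)
{
    size_t done = 0, got;

    while ( (got = fake_query(buf + done, 2 /* small batch */, done)) > 0 )
        done += got;
    return done;
}
```

The open question in the thread is only how to express `buf + done` as a guest handle in libxc; the loop structure itself is this simple.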
Re: [Xen-devel] [RFC v1 00/15] Add VT-d Posted-Interrupts support
Ping.. Thanks, Feng -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Monday, April 13, 2015 5:13 AM To: Wu, Feng Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org Subject: RE: [RFC v1 00/15] Add VT-d Posted-Interrupts support On 01.04.15 at 15:21, feng...@intel.com wrote: Hi Jan, Any more comments about this series? Thanks a lot! I think it makes little sense to wait for my review before posting v2 with issues addressed which others have pointed out. I just returned from vacation, and would have time to look at v1 next week the earliest (with the week after being occupied by travel). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 03/14] x86: detect and initialize Intel CAT feature
On 23.04.15 at 11:55, chao.p.p...@linux.intel.com wrote:

 struct psr_cmt *__read_mostly psr_cmt;
+
+static unsigned long __read_mostly * cat_socket_init_bitmap;
+static unsigned long __read_mostly * cat_socket_enable_bitmap;
+static struct psr_cat_socket_info *__read_mostly cat_socket_info;

So the last line looks right, and the line of context left in place is well formed too. Why did you not follow these good examples for the two newly added lines? And what is cat_socket_enable_bitmap needed for anyway?

@@ -194,8 +210,67 @@ void psr_ctxt_switch_to(struct domain *d)
     }
 }
 
+static void cat_cpu_init(void)
+{
+    unsigned int eax, ebx, ecx, edx;
+    struct psr_cat_socket_info *info;
+    unsigned int socket;
+    unsigned int cpu = smp_processor_id();
+    const struct cpuinfo_x86 *c = cpu_data + cpu;
+
+    if ( !cpu_has(c, X86_FEATURE_CAT) )
+        return;
+
+    socket = cpu_to_socket(cpu);
+    ASSERT(socket < nr_sockets);
+
+    info = cat_socket_info + socket;
+
+    /* Avoid initializing more than one times for the same socket. */
+    if ( test_and_set_bit(socket, cat_socket_init_bitmap) )
+        return;

So what if a package got offlined and then is being brought back online (perhaps after having got replaced)?

+static void __init init_psr_cat(void)
+{
+    size_t nlongs = BITS_TO_LONGS(nr_sockets);
+
+    cat_socket_init_bitmap = xzalloc_array(unsigned long, nlongs);
+    if ( !cat_socket_init_bitmap )
+        return;
+
+    cat_socket_enable_bitmap = xzalloc_array(unsigned long, nlongs);
+    if ( !cat_socket_enable_bitmap )
+    {
+        xfree(cat_socket_init_bitmap);
+        return;
+    }
+
+    cat_socket_info = xzalloc_array(struct psr_cat_socket_info, nr_sockets);
+    if ( !cat_socket_info )
+    {
+        xfree(cat_socket_init_bitmap);
+        xfree(cat_socket_enable_bitmap);
+    }
+}

Please do all three allocs in a row, then check if any of them failed, and on the error path free and clear all three pointers. This will especially pay off if later further allocations get added here. Jan
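The pattern Jan asks for ("do all three allocs in a row, then check if any of them failed, and on the error path free and clear all three pointers") can be shown standalone. This sketch uses calloc in place of Xen's xzalloc_array and loosely mirrors the names in the patch; nothing here is actual Xen code:

```c
#include <assert.h>
#include <stdlib.h>

/* Three module-scope allocations, loosely mirroring the patch. */
static unsigned long *init_bm, *enable_bm;
static int *socket_info;

/* Allocate everything first, check once, and on failure free and clear
 * all pointers on a single error path. */
static int init_all(size_t nlongs, size_t nsockets)
{
    init_bm     = calloc(nlongs, sizeof(*init_bm));
    enable_bm   = calloc(nlongs, sizeof(*enable_bm));
    socket_info = calloc(nsockets, sizeof(*socket_info));

    if ( init_bm && enable_bm && socket_info )
        return 0;

    /* free(NULL) is a no-op, so no per-pointer checks are needed. */
    free(init_bm);     init_bm = NULL;
    free(enable_bm);   enable_bm = NULL;
    free(socket_info); socket_info = NULL;
    return -1;
}
```

The payoff Jan mentions is that adding a fourth allocation later only means one more line in each of the two places, rather than another nested cleanup branch.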
Re: [Xen-devel] [PATCH v3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
On 24.04.15 at 16:57, andrew.coop...@citrix.com wrote: On 24/04/15 15:31, Jan Beulich wrote: Xen L4 entries being uniformly installed into any L4 table and 64-bit PV kernels running in ring 3 means that user mode was able to see the read-only M2P presented by Xen to the guests. While apparently not really representing an exploitable information leak, this still very certainly was never meant to be that way. Building on the fact that these guests already have separate kernel and user mode page tables we can allow guest kernels to tell Xen that they don't want user mode to see this table. We can't, however, do this by default: There is no ABI requirement that kernel and user mode page tables be separate. Therefore introduce a new VM-assist flag allowing the guest to control respective hypervisor behavior: - when not set, L4 tables get created with the respective slot blank, and whenever the L4 table gets used as a kernel one the missing mapping gets inserted, - when set, L4 tables get created with the respective slot initialized as before, and whenever the L4 table gets used as a user one the mapping gets zapped. Is this complete? Yes. For backwards compatibility, older kernels will not have m2p_strict set, and the m2p should unconditionally appear in all L4s. No. L4s _only_ used for user mode have no need for this entry to be non-zero. The difference between strict and relaxed mode is as described - in strict mode, L4s default to have the entry populated and tables used for user mode get it stripped, while in relaxed mode L4s default to the empty and get the entry populated when used for kernel mode. This guarantees that in non-strict mode all kernel page tables have the entry filled, while in strict mode it guarantees that all user page tables have it cleared. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
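The strict/relaxed behaviour Jan spells out can be modelled as a tiny state machine. This is a hypothetical model of the rules only — not the hypervisor's actual fill_ro_mpt()/zap_ro_mpt() plumbing:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the RO M2P slot in a 64-bit PV guest's L4, per the
 * VM-assist semantics described in the thread. */
typedef enum { L4_USER, L4_KERNEL } l4_use;

/* Slot state when the L4 is created:
 * strict mode -> created populated; relaxed mode -> created blank. */
static bool slot_at_creation(bool m2p_strict)
{
    return m2p_strict;
}

/* Slot state after the L4 has been installed for the given mode. */
static bool slot_after_use(bool m2p_strict, bool present, l4_use use)
{
    if ( !m2p_strict && use == L4_KERNEL )
        present = true;   /* relaxed: missing mapping gets inserted */
    if ( m2p_strict && use == L4_USER )
        present = false;  /* strict: mapping gets zapped */
    return present;
}
```

This encodes exactly the two guarantees Jan states: in relaxed mode every kernel page table ends up with the entry filled, and in strict mode every user page table ends up with it cleared — while a user-only L4 in relaxed mode simply stays blank.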
Re: [Xen-devel] [PATCH v6 08/14] x86: add scheduling support for Intel CAT
On 23.04.15 at 11:55, chao.p.p...@linux.intel.com wrote: On context switch, write the domain's Class of Service (COS) to MSR IA32_PQR_ASSOC, to notify hardware to use the new COS. For performance reasons, the COS mask for the current cpu is also cached in the local per-CPU variable. Signed-off-by: Chao Peng chao.p.p...@linux.intel.com Acked-by: Jan Beulich jbeul...@suse.com
Re: [Xen-devel] [PATCH v3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
On 24/04/15 15:31, Jan Beulich wrote: Xen L4 entries being uniformly installed into any L4 table and 64-bit PV kernels running in ring 3 means that user mode was able to see the read-only M2P presented by Xen to the guests. While apparently not really representing an exploitable information leak, this still very certainly was never meant to be that way. Building on the fact that these guests already have separate kernel and user mode page tables we can allow guest kernels to tell Xen that they don't want user mode to see this table. We can't, however, do this by default: There is no ABI requirement that kernel and user mode page tables be separate. Therefore introduce a new VM-assist flag allowing the guest to control respective hypervisor behavior: - when not set, L4 tables get created with the respective slot blank, and whenever the L4 table gets used as a kernel one the missing mapping gets inserted, - when set, L4 tables get created with the respective slot initialized as before, and whenever the L4 table gets used as a user one the mapping gets zapped. Is this complete? For backwards compatibility, older kernels will not have m2p_strict set, and the m2p should unconditionally appear in all L4s. If m2p_strict is set then the mapping should be zapped for user L4s. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen on CentoOS7
On 24/04/15 12:26, Benoit L'hoest wrote: Hello, I'm a student in computer science; during my internship I have to explore multipathing solutions for several hypervisors. I'm looking into MPTCP (multipath TCP), an extension of the TCP protocol that allows aggregating the bandwidth of several connections. I'd like to use MPTCP to get good bandwidth between the hypervisor and the NAS. That would be a nice software solution to avoid buying expensive 10 Gig or 100 Gig NICs. In order to make some tests I'd like to install MPTCP on XenServer. There is an .rpm for CentOS 7, but XenServer is on CentOS 5 and some dependencies are missing. I'm not confident enough to build my own supplemental pack for XenServer 6.5. I'd like to know if there is an easy way to add MPTCP to XenServer? Is there a CentOS 7 version that I can get? Thanks for your help. Ben. PS: If you feel like helping me a bit more, here is a link to my support topic where I explain the problem I have when I try to boot my recompiled kernel: http://discussions.citrix.com/topic/364033-change-dom0-kernel/#entry1870428 Technical discussions specifically related to XenServer are best posted to xs-de...@lists.xenserver.org. There will be a XenServer alpha release based on CentOS7 coming out shortly, Simon
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24/04/2015 23:04, Jan Beulich wrote On 24.04.15 at 16:56, wei.w.w...@intel.com wrote: On 24/04/2015 20:57, Jan Beulich wrote I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not internal), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct=max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running with 60%. So I must be misunderstanding something then: How can the driver do anything at all when told to run the system at 60%? The {min,max}_pct is a limit. The timer function figures out a proper value based on the sampled statistics, then this value is clamped into [min_pct, max_pct]. When we have [60%, 60%], whatever the value from the timer function is, it will be finally adjusted to 60%, and set to the perf_ctl register. Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention. They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). Adjusting the values individually to me very much looks like the userspace governor. Yeah, that example was like userspace. 
Please take a look at this example: [min_pct=60%, max_pct=80%], the timer generates 45%, 55%, 65%, 70%, 75%, 90%, then the final target values will not be constant. The ones (65%, 70%, 75%) falling into the limit interval behaves like ondemand, others are not. I think we can follow the kernel implementation regarding this point, what do you think? Not sure - I'm not always convinced that what Linux does is the one and only and best way. Understand it. But I think that usage is good, in terms of supporting future intel processors (e.g. the hardware controlled P-states on Skylake+). The {min,max}_pct needs to be exposed to users to set the limits. Best, Wei ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
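The clamping behaviour Wei describes — the sampled target is bounded by the user-set min/max percentages, which are themselves bounded below by the hardware minimum — is easy to show as a standalone function. A sketch only, with hypothetical names (Wei's hw_min_pct example value of 32% is used in the comments):

```c
#include <assert.h>

/* Clamp a sampled target percentage into the user-set [min,max] window,
 * after raising min_pct to the hardware floor (e.g. 32% on Wei's box). */
static unsigned int clamp_pct(unsigned int target,
                              unsigned int min_pct, unsigned int max_pct,
                              unsigned int hw_min_pct)
{
    if ( min_pct < hw_min_pct )
        min_pct = hw_min_pct;
    if ( target < min_pct )
        target = min_pct;
    if ( target > max_pct )
        target = max_pct;
    return target;
}
```

With Wei's [60%, 80%] window, the sampled sequence 45, 55, 65, 70, 75, 90 comes out as 60, 60, 65, 70, 75, 80 — exactly the mixed behaviour he describes: in-window values pass through like ondemand, out-of-window values are pinned like userspace.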
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On Fri, Apr 24, 2015 at 03:42:40PM +, Wang, Wei W wrote: On 24/04/2015 23:04, Jan Beulich wrote On 24.04.15 at 16:56, wei.w.w...@intel.com wrote: On 24/04/2015 20:57, Jan Beulich wrote I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not internal), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct=max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running with 60%. So I must be misunderstanding something then: How can the driver do anything at all when told to run the system at 60%? The {min,max}_pct is a limit. The timer function figures out a proper value based on the sampled statistics, then this value is clamped into [min_pct, max_pct]. When we have [60%, 60%], whatever the value from the timer function is, it will be finally adjusted to 60%, and set to the perf_ctl register. Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention. They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). Adjusting the values individually to me very much looks like the userspace governor. Yeah, that example was like userspace. 
Please take a look at this example: [min_pct=60%, max_pct=80%], the timer generates 45%, 55%, 65%, 70%, 75%, 90%, then the final target values will not be constant. The ones (65%, 70%, 75%) falling into the limit interval behaves like ondemand, others are not. I think we can follow the kernel implementation regarding this point, what do you think? Not sure - I'm not always convinced that what Linux does is the one and only and best way. Understand it. But I think that usage is good, in terms of supporting future intel processors (e.g. the hardware controlled P-states on Skylake+). The {min,max}_pct needs to be exposed to users to set the limits. How will this affect AMD processors which can use the cpufreq? Would the ondemand feature go away? Best, Wei ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3 01/10] vmx: add new boot parameter to control PML enabling
On 24.04.15 at 10:19, kai.hu...@linux.intel.com wrote:

--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -64,6 +64,36 @@ integer_param("ple_gap", ple_gap);
 static unsigned int __read_mostly ple_window = 4096;
 integer_param("ple_window", ple_window);
 
+static bool_t __read_mostly opt_pml_enabled = 0;
+
+/*
+ * The 'ept' parameter controls functionalities that depend on, or impact the
+ * EPT mechanism. Optional comma separated value may contain:
+ *
+ *  pml   Enable PML
+ */
+static void __init parse_ept_param(char *s)
+{
+    char *ss;
+
+    do {
+        bool_t val = !!strncmp(s, "no-", 3);
+        if ( !val )

In case another round is needed, a blank line is missing above.

+            s += 3;
+
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        if ( !strcmp(s, "pml") )
+            opt_pml_enabled = val;
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+custom_param("ept", parse_ept_param);

And a superfluous blank line would want to be dropped here. Jan
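The "no-" prefix / comma-separated option style being reviewed can be exercised standalone. This sketch mirrors the quoted parser but is self-contained (it copies the option string so literals can be passed, and guards the final pointer advance); only the "pml" token exists here:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Parse an "ept="-style option list, e.g. "pml", "no-pml", "foo,pml".
 * A "no-" prefix clears the named flag.  Returns the resulting pml
 * setting; unknown tokens are ignored, matching the reviewed code. */
static bool parse_ept(const char *opt)
{
    char buf[64];
    bool pml = false;
    char *s = buf, *ss;

    strncpy(buf, opt, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    do {
        /* Token negated iff it starts with "no-". */
        bool val = strncmp(s, "no-", 3) != 0;

        if ( !val )
            s += 3;

        /* Split off the current comma-separated token. */
        ss = strchr(s, ',');
        if ( ss )
            *ss = '\0';

        if ( strcmp(s, "pml") == 0 )
            pml = val;

        if ( ss )
            s = ss + 1;
    } while ( ss );

    return pml;
}
```

Later tokens win, so "no-pml,pml" enables PML — the same last-one-wins behaviour the do/while structure in the patch gives.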
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24/04/2015 20:57, Jan Beulich wrote On 24.04.15 at 13:53, wei.w.w...@intel.com wrote: On 24/04/2015 18:20, Jan Beulich wrote On 24.04.15 at 12:07, wei.w.w...@intel.com wrote: On 24/04/2015 17:56, Jan Beulich wrote: On 24.04.15 at 11:46, wei.w.w...@intel.com wrote: On 24/04/2015 17:11, Jan Beulich wrote: On 24.04.15 at 10:32, wei.w.w...@intel.com wrote: I think at the very least from a user interface perspective (e.g. the xenpm tool) the meaning of the old governor names should be retained as much as possible. Ok. I am simply using the name internal for user tools. Please see the example below: scaling_driver : intel_pstate scaling_avail_gov: internal current_governor: internal But xenpm's set-scaling-governor command should still do something useful for the currently available governor options. And with that, showing internal in its output may also end up being confusing (at least I am already being confused). The case is similar to that in the kernel. Xen has two pstate driver, but only one can be loaded. When the intel_pstate driver is used (scaling_driver : intel_pstate ), xenpm's set-scaling-governor will not take effect, since the intel_pstate only implements its internal governor. But that's precisely what I'm asking to be changed: Even if internally is uses its own governor, the user interface of the tool should still be usable to achieve similar effects. Yes, we should change that if we remain two governors in the intel_pstate driver. But as we discussed before, the internal Performance governor seems redundant and can be removed. In that case, there is no other option of governors that be selected by the user via set-scaling-governor. I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). 
xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not internal), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct=max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running with 60%. Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention. They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). I think we can follow the kernel implementation regarding this point, what do you think? Best, Wei ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 06/14] x86: expose CBM length and COS number information
On 23.04.15 at 11:55, chao.p.p...@linux.intel.com wrote: --- a/xen/arch/x86/sysctl.c +++ b/xen/arch/x86/sysctl.c @@ -171,6 +171,24 @@ long arch_do_sysctl( break; +case XEN_SYSCTL_psr_cat_op: +switch ( sysctl->u.psr_cat_op.cmd ) +{ +case XEN_SYSCTL_PSR_CAT_get_l3_info: +ret = psr_get_cat_l3_info(sysctl->u.psr_cat_op.target, + &sysctl->u.psr_cat_op.u.l3_info.cbm_len, + &sysctl->u.psr_cat_op.u.l3_info.cos_max); + +if ( !ret && __copy_to_guest(u_sysctl, sysctl, 1) ) Can't this be __copy_field_to_guest() for just the u.psr_cat_op part? Jan
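Jan's suggestion is to copy back only the union member the hypercall actually filled in, rather than the whole sysctl struct. A minimal stand-alone C sketch of the idea; the struct layout and the `copy_field()` helper here are invented stand-ins, not Xen's real `__copy_field_to_guest()` machinery:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for Xen's sysctl structures. */
struct l3_info { uint32_t cbm_len; uint32_t cos_max; };
struct sysctl { uint32_t cmd; uint32_t target; struct l3_info l3_info; };

/* Copy only one member back to the caller's buffer. Compared with
 * copying the whole struct, less data crosses the boundary and
 * unrelated fields in the destination are left untouched. */
static void copy_field(struct sysctl *dst, const struct sysctl *src)
{
    memcpy(&dst->l3_info, &src->l3_info, sizeof(src->l3_info));
}
```

The point of the review comment is exactly this: fields the hypervisor did not modify (and whatever the guest already had there) are not overwritten wholesale.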
[Xen-devel] [PATCH] OSSTEST: introduce a raisin build test
Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- ap-common |5 +++ mfi-common | 19 ++ sg-run-job |5 +++ ts-raisin-build | 104 +++ 4 files changed, 133 insertions(+) create mode 100755 ts-raisin-build diff --git a/ap-common b/ap-common index 64749e3..985eeec 100644 --- a/ap-common +++ b/ap-common @@ -47,13 +47,18 @@ # rumpsrc-related runvars needed only for old rumpuser-xen # (ie ones which need $bodges=1 in ts-rumpuserxen-build) +: ${TREE_RAISIN:=git://xenbits.xen.org/people/sstabellini/raisin.git} +: ${DEFAULT_REVISION_RAISIN:=master} + : ${TREE_SEABIOS_UPSTREAM:=git://git.seabios.org/seabios.git} : ${PUSH_TREE_SEABIOS:=$XENBITS:/home/xen/git/osstest/seabios.git} : ${BASE_TREE_SEABIOS:=git://xenbits.xen.org/osstest/seabios.git} +: ${TREE_SEABIOS:=$TREE_SEABIOS_UPSTREAM} : ${TREE_OVMF_UPSTREAM:=https://github.com/tianocore/edk2.git} : ${PUSH_TREE_OVMF:=$XENBITS:/home/xen/git/osstest/ovmf.git} : ${BASE_TREE_OVMF:=git://xenbits.xen.org/osstest/ovmf.git} +: ${TREE_OVMF:=$TREE_OVMF_UPSTREAM} : ${TREE_LINUXFIRMWARE:=git://xenbits.xen.org/osstest/linux-firmware.git} : ${PUSH_TREE_LINUXFIRMWARE:=$XENBITS:/home/osstest/ext/linux-firmware.git} diff --git a/mfi-common b/mfi-common index 16fc8c5..051c9fc 100644 --- a/mfi-common +++ b/mfi-common @@ -215,6 +215,25 @@ create_build_jobs () { fi +if [ x$REVISION_RAISIN != xdisable ]; then + +./cs-job-create $flight build-$arch-raisin build-raisin \ +arch=$arch \ +tree_xen=$TREE_XEN \ +$RUNVARS $BUILD_RUNVARS $BUILD_RAISIN_RUNVARS $arch_runvars \ +$suite_runvars \ +host_hostflags=$build_hostflags \ +buildjob=${bfi}build-$arch \ +tree_raisin=$TREE_RAISIN \ +tree_qemuu=$TREE_QEMU_UPSTREAM \ +tree_qemu=$TREE_QEMU \ +tree_seabios=$TREE_SEABIOS \ +tree_libvirt=$TREE_LIBVIRT \ +tree_ovmf=$TREE_OVMF \ +revision_raisin=${REVISION_RAISIN:-${DEFAULT_REVISION_RAISIN}}\ + +fi + if branch_wants_rumpkernel_tests; then case $arch in diff --git a/sg-run-job b/sg-run-job index eae159d..449118d 100755 --- a/sg-run-job +++ 
b/sg-run-job @@ -346,6 +346,7 @@ proc need-hosts/build {} { return BUILD } proc need-hosts/build-kern {} { return BUILD } proc need-hosts/build-libvirt {} { return BUILD } proc need-hosts/build-rumpuserxen {} { return BUILD } +proc need-hosts/build-raisin {} { return BUILD } proc run-job/build {} { run-ts . = ts-xen-build @@ -364,6 +365,10 @@ proc run-job/build-rumpuserxen {} { run-ts . = ts-xen-build + host tools } +proc run-job/build-raisin {} { +run-ts . = ts-raisin-build +} + proc prepare-build-host {} { global jobinfo run-ts broken = ts-hosts-allocate + host diff --git a/ts-raisin-build b/ts-raisin-build new file mode 100755 index 000..f9b8ed6 --- /dev/null +++ b/ts-raisin-build @@ -0,0 +1,104 @@ +#!/usr/bin/perl -w +# This is part of osstest, an automated testing framework for Xen. +# Copyright (C) 2009-2013 Citrix Inc. +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. 
+ +use strict qw(vars); +use DBI; +use Osstest; +use File::Path; +use POSIX; +use Osstest::TestSupport; +use Osstest::BuildSupport; + +tsreadconfig(); +selectbuildhost(\@ARGV); +# remaining arguments are passed as targets to make +builddirsprops(); + +sub checkout () { +prepbuilddirs(); +build_clone($ho, 'raisin', $builddir, 'raisin'); + +target_cmd_build($ho, 100, $builddir, END); +cd $builddir/raisin + config + +echo config ENABLED_COMPONENTS=\\seabios ovmf xen qemu qemu_traditional libvirt\\ +echo config MAKE=\\make $makeflags\\ +echo config PREFIX=\\/usr\\ +echo config DESTDIR=dist + +echo config
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 25/04/2015 00:15, Konrad Rzeszutek Wilk wrote On Fri, Apr 24, 2015 at 03:42:40PM +, Wang, Wei W wrote: On 24/04/2015 23:04, Jan Beulich wrote On 24.04.15 at 16:56, wei.w.w...@intel.com wrote: On 24/04/2015 20:57, Jan Beulich wrote I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not internal), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct=max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running with 60%. So I must be misunderstanding something then: How can the driver do anything at all when told to run the system at 60%? The {min,max}_pct is a limit. The timer function figures out a proper value based on the sampled statistics, then this value is clamped into [min_pct, max_pct]. When we have [60%, 60%], whatever the value from the timer function is, it will be finally adjusted to 60%, and set to the perf_ctl register. Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention. They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). Adjusting the values individually to me very much looks like the userspace governor. 
Yeah, that example was like userspace. Please take a look at this example: [min_pct=60%, max_pct=80%], the timer generates 45%, 55%, 65%, 70%, 75%, 90%, then the final target values will not be constant. The ones (65%, 70%, 75%) falling into the limit interval behave like ondemand; the others do not. I think we can follow the kernel implementation regarding this point, what do you think? Not sure - I'm not always convinced that what Linux does is the one and only and best way. Understood. But I think that usage is good, in terms of supporting future Intel processors (e.g. the hardware controlled P-states on Skylake+). The {min,max}_pct needs to be exposed to users to set the limits. How will this affect AMD processors which can use the cpufreq? Would the ondemand feature go away? No, this won't affect them. When intel_pstate=disable is added to the boot parameter list, the old cpufreq driver will be used, and everything, including xenpm, will work in the old style. The new driver, intel_pstate, actually works in a mode similar to ondemand (probably can be called enhanced ondemand - the user can set a limit range for the ondemand). Best, Wei
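The clamping behaviour under discussion can be sketched in a few lines of C (a simplified model, not the driver's actual code): with min_pct == max_pct every sampled target collapses to one fixed value, which behaves like the userspace governor, while a wider window lets samples inside the interval through unchanged, which behaves like ondemand.

```c
/* Clamp a sampled target percentage into [min_pct, max_pct], as the
 * thread describes the intel_pstate timer function doing before the
 * value is written to the perf_ctl register. */
static int clamp_pct(int target, int min_pct, int max_pct)
{
    if (target < min_pct)
        return min_pct;
    if (target > max_pct)
        return max_pct;
    return target;
}
```

With [60%, 60%] every input maps to 60% (Wei's first example); with [60%, 80%] the samples 65/70/75 pass through while 45 and 90 are pinned to the bounds (the second example).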
[Xen-devel] [PATCH 2/3] x86/hvm: introduce functions for HVMOP_get/set_param allowance checks
Some parameters can only (validly) be set once. Some cannot be set by a guest for its own domain. Consolidate these checks, along with the XSM check, in a new hvm_allow_set_param() function for clarity. Also, introduce hvm_allow_get_param() for similar reasons. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com --- xen/arch/x86/hvm/hvm.c | 141 +++- 1 file changed, 79 insertions(+), 62 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 87c47b1..23c604d 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -5640,6 +5640,57 @@ static int hvmop_set_evtchn_upcall_vector( return 0; } +static int hvm_allow_set_param(struct domain *d, + struct xen_hvm_param *a) +{ +uint64_t value = d->arch.hvm_domain.params[a->index]; +int rc = 0; + +rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); +if ( rc ) +return rc; + +/* The following parameters cannot be set by the guest */ +switch ( a->index ) +{ +case HVM_PARAM_VIRIDIAN: +case HVM_PARAM_IDENT_PT: +case HVM_PARAM_DM_DOMAIN: +case HVM_PARAM_ACPI_S_STATE: +case HVM_PARAM_MEMORY_EVENT_CR0: +case HVM_PARAM_MEMORY_EVENT_CR3: +case HVM_PARAM_MEMORY_EVENT_CR4: +case HVM_PARAM_MEMORY_EVENT_INT3: +case HVM_PARAM_MEMORY_EVENT_SINGLE_STEP: +case HVM_PARAM_MEMORY_EVENT_MSR: +case HVM_PARAM_IOREQ_SERVER_PFN: +case HVM_PARAM_NR_IOREQ_SERVER_PAGES: +if ( d == current->domain ) +rc = -EPERM; +break; +default: +break; +} + +if ( rc ) +return rc; + +/* The following parameters can only be changed once */ +switch ( a->index ) +{ +case HVM_PARAM_VIRIDIAN: +case HVM_PARAM_IOREQ_SERVER_PFN: +case HVM_PARAM_NR_IOREQ_SERVER_PAGES: +if ( value != 0 && a->value != value ) +rc = -EEXIST; +break; +default: +break; +} + +return rc; +} + static int hvmop_set_param( XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg) { @@ -5666,7 +5717,7 @@ static int hvmop_set_param( (a.index != HVM_PARAM_CALLBACK_IRQ) ) goto out; -rc = 
xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); +rc = hvm_allow_set_param(d, a); if ( rc ) goto out; @@ -5681,31 +5732,11 @@ static int hvmop_set_param( rc = -EINVAL; break; case HVM_PARAM_VIRIDIAN: -/* This should only ever be set once by the tools and read by the guest. */ -rc = -EPERM; -if ( d == current-domain ) -break; - -if ( a.value != d-arch.hvm_domain.params[a.index] ) -{ -rc = -EEXIST; -if ( d-arch.hvm_domain.params[a.index] != 0 ) -break; - +if ( (a.value ~HVMPV_feature_mask) || + !(a.value HVMPV_base_freq) ) rc = -EINVAL; -if ( (a.value ~HVMPV_feature_mask) || - !(a.value HVMPV_base_freq) ) -break; -} - -rc = 0; break; case HVM_PARAM_IDENT_PT: -/* Not reflexive, as we must domain_pause(). */ -rc = -EPERM; -if ( d == current-domain ) -break; - rc = -EINVAL; if ( d-arch.hvm_domain.params[a.index] != 0 ) break; @@ -5733,22 +5764,12 @@ static int hvmop_set_param( domctl_lock_release(); break; case HVM_PARAM_DM_DOMAIN: -/* Not reflexive, as we may need to domain_pause(). */ -rc = -EPERM; -if ( d == current-domain ) -break; - if ( a.value == DOMID_SELF ) a.value = current-domain-domain_id; rc = hvm_set_dm_domain(d, a.value); break; case HVM_PARAM_ACPI_S_STATE: -/* Not reflexive, as we must domain_pause(). */ -rc = -EPERM; -if ( current-domain == d ) -break; - rc = 0; if ( a.value == 3 ) hvm_s3_suspend(d); @@ -5761,20 +5782,9 @@ static int hvmop_set_param( case HVM_PARAM_ACPI_IOPORTS_LOCATION: rc = pmtimer_change_ioport(d, a.value); break; -case HVM_PARAM_MEMORY_EVENT_CR0: -case HVM_PARAM_MEMORY_EVENT_CR3: -case HVM_PARAM_MEMORY_EVENT_CR4: -if ( d == current-domain ) -rc = -EPERM; -break; case HVM_PARAM_MEMORY_EVENT_INT3: case HVM_PARAM_MEMORY_EVENT_SINGLE_STEP: case HVM_PARAM_MEMORY_EVENT_MSR: -if ( d == current-domain ) -{ -rc = -EPERM; -break; -} if ( a.value HVMPME_onchangeonly ) rc = -EINVAL; break; @@ -5807,22 +5817,12 @@ static int hvmop_set_param( rc = -EINVAL; break; case HVM_PARAM_IOREQ_SERVER_PFN: -if ( d == current-domain ) -{ -
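The change-once rule that hvm_allow_set_param() consolidates can be modelled in isolation. This is a hypothetical sketch with a plain array standing in for the domain's parameter block (and no locking): a parameter that holds a non-zero value may only be re-written with that same value, otherwise -EEXIST is returned.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for d->arch.hvm_domain.params[]. */
static uint64_t params[4];

/* Enforce change-once semantics: once set non-zero, only an identical
 * re-write is allowed, mirroring the -EEXIST check in the patch. */
static int set_once(unsigned int idx, uint64_t value)
{
    uint64_t cur = params[idx];

    if (cur != 0 && value != cur)
        return -EEXIST;
    params[idx] = value;
    return 0;
}
```

Note that, as in the patch, re-writing the identical value is deliberately permitted; only a *change* after the first set is rejected.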
Re: [Xen-devel] [PATCH 1/3] x86/hvm: give HVMOP_set_param and HVMOP_get_param their own functions
On 24/04/15 17:21, Paul Durrant wrote: The level of switch nesting in those ops is getting unreadable. Giving them their own functions does introduce some code duplication in the the pre-op checks but the overall result is easier to follow. This patch is code movement. There is no functional change. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com Please latch current at the top of the function, and fix Xen style during motion. (newlines between break and case lines, drop superfluous braces, brace layout around get bufiorq_evtchn). With that fixed, Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- xen/arch/x86/hvm/hvm.c | 562 ++-- 1 file changed, 300 insertions(+), 262 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 3c62b80..87c47b1 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -5640,6 +5640,300 @@ static int hvmop_set_evtchn_upcall_vector( return 0; } +static int hvmop_set_param( +XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg) +{ +struct xen_hvm_param a; +struct domain *d; +struct vcpu *v; +int rc = 0; + +if ( copy_from_guest(a, arg, 1) ) +return -EFAULT; + +if ( a.index = HVM_NR_PARAMS ) +return -EINVAL; + +d = rcu_lock_domain_by_any_id(a.domid); +if ( d == NULL ) +return -ESRCH; + +rc = -EINVAL; +if ( !has_hvm_container_domain(d) ) +goto out; + +if ( is_pvh_domain(d) + (a.index != HVM_PARAM_CALLBACK_IRQ) ) +goto out; + +rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); +if ( rc ) +goto out; + +switch ( a.index ) +{ +case HVM_PARAM_CALLBACK_IRQ: +hvm_set_callback_via(d, a.value); +hvm_latch_shinfo_size(d); +break; +case HVM_PARAM_TIMER_MODE: +if ( a.value HVMPTM_one_missed_tick_pending ) +rc = -EINVAL; +break; +case HVM_PARAM_VIRIDIAN: +/* This should only ever be set once by the tools and read by the guest. 
*/ +rc = -EPERM; +if ( d == current-domain ) +break; + +if ( a.value != d-arch.hvm_domain.params[a.index] ) +{ +rc = -EEXIST; +if ( d-arch.hvm_domain.params[a.index] != 0 ) +break; + +rc = -EINVAL; +if ( (a.value ~HVMPV_feature_mask) || + !(a.value HVMPV_base_freq) ) +break; +} + +rc = 0; +break; +case HVM_PARAM_IDENT_PT: +/* Not reflexive, as we must domain_pause(). */ +rc = -EPERM; +if ( d == current-domain ) +break; + +rc = -EINVAL; +if ( d-arch.hvm_domain.params[a.index] != 0 ) +break; + +rc = 0; +if ( !paging_mode_hap(d) ) +break; + +/* + * Update GUEST_CR3 in each VMCS to point at identity map. + * All foreign updates to guest state must synchronise on + * the domctl_lock. + */ +rc = -ERESTART; +if ( !domctl_lock_acquire() ) +break; + +rc = 0; +domain_pause(d); +d-arch.hvm_domain.params[a.index] = a.value; +for_each_vcpu ( d, v ) +paging_update_cr3(v); +domain_unpause(d); + +domctl_lock_release(); +break; +case HVM_PARAM_DM_DOMAIN: +/* Not reflexive, as we may need to domain_pause(). */ +rc = -EPERM; +if ( d == current-domain ) +break; + +if ( a.value == DOMID_SELF ) +a.value = current-domain-domain_id; + +rc = hvm_set_dm_domain(d, a.value); +break; +case HVM_PARAM_ACPI_S_STATE: +/* Not reflexive, as we must domain_pause(). */ +rc = -EPERM; +if ( current-domain == d ) +break; + +rc = 0; +if ( a.value == 3 ) +hvm_s3_suspend(d); +else if ( a.value == 0 ) +hvm_s3_resume(d); +else +rc = -EINVAL; + +break; +case HVM_PARAM_ACPI_IOPORTS_LOCATION: +rc = pmtimer_change_ioport(d, a.value); +break; +case HVM_PARAM_MEMORY_EVENT_CR0: +case HVM_PARAM_MEMORY_EVENT_CR3: +case HVM_PARAM_MEMORY_EVENT_CR4: +if ( d == current-domain ) +rc = -EPERM; +break; +case HVM_PARAM_MEMORY_EVENT_INT3: +case HVM_PARAM_MEMORY_EVENT_SINGLE_STEP: +case HVM_PARAM_MEMORY_EVENT_MSR: +if ( d == current-domain ) +{ +rc = -EPERM; +break; +} +if ( a.value HVMPME_onchangeonly ) +rc = -EINVAL; +break; +case HVM_PARAM_NESTEDHVM: +
Re: [Xen-devel] [PATCH 0/2] x86/hvm: HVMOP_get/set_param improvements
The subject line should obviously be prefixed [PATCH 0/3], but the content is at least correct. Apologies for that, Paul -Original Message- From: Paul Durrant [mailto:paul.durr...@citrix.com] Sent: 24 April 2015 17:21 To: xen-de...@lists.xenproject.org Cc: Paul Durrant Subject: [PATCH 0/2] x86/hvm: HVMOP_get/set_param improvements The following 3 patches re-structure the code implementing HVMOP_set_param and HVMOP_get_param. Patch #1 gives each operation its own function Patch #2 splits out checks for getting/setting non-reflexive params and setting params with change-once semantics, as well as the XSM check into separate functions Patch #3 adds all remaining ioreq server params to non-reflexive lists ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/3] x86/hvm: give HVMOP_set_param and HVMOP_get_param their own functions
The level of switch nesting in those ops is getting unreadable. Giving them their own functions does introduce some code duplication in the the pre-op checks but the overall result is easier to follow. This patch is code movement. There is no functional change. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com --- xen/arch/x86/hvm/hvm.c | 562 ++-- 1 file changed, 300 insertions(+), 262 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 3c62b80..87c47b1 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -5640,6 +5640,300 @@ static int hvmop_set_evtchn_upcall_vector( return 0; } +static int hvmop_set_param( +XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg) +{ +struct xen_hvm_param a; +struct domain *d; +struct vcpu *v; +int rc = 0; + +if ( copy_from_guest(a, arg, 1) ) +return -EFAULT; + +if ( a.index = HVM_NR_PARAMS ) +return -EINVAL; + +d = rcu_lock_domain_by_any_id(a.domid); +if ( d == NULL ) +return -ESRCH; + +rc = -EINVAL; +if ( !has_hvm_container_domain(d) ) +goto out; + +if ( is_pvh_domain(d) + (a.index != HVM_PARAM_CALLBACK_IRQ) ) +goto out; + +rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); +if ( rc ) +goto out; + +switch ( a.index ) +{ +case HVM_PARAM_CALLBACK_IRQ: +hvm_set_callback_via(d, a.value); +hvm_latch_shinfo_size(d); +break; +case HVM_PARAM_TIMER_MODE: +if ( a.value HVMPTM_one_missed_tick_pending ) +rc = -EINVAL; +break; +case HVM_PARAM_VIRIDIAN: +/* This should only ever be set once by the tools and read by the guest. */ +rc = -EPERM; +if ( d == current-domain ) +break; + +if ( a.value != d-arch.hvm_domain.params[a.index] ) +{ +rc = -EEXIST; +if ( d-arch.hvm_domain.params[a.index] != 0 ) +break; + +rc = -EINVAL; +if ( (a.value ~HVMPV_feature_mask) || + !(a.value HVMPV_base_freq) ) +break; +} + +rc = 0; +break; +case HVM_PARAM_IDENT_PT: +/* Not reflexive, as we must domain_pause(). 
*/ +rc = -EPERM; +if ( d == current-domain ) +break; + +rc = -EINVAL; +if ( d-arch.hvm_domain.params[a.index] != 0 ) +break; + +rc = 0; +if ( !paging_mode_hap(d) ) +break; + +/* + * Update GUEST_CR3 in each VMCS to point at identity map. + * All foreign updates to guest state must synchronise on + * the domctl_lock. + */ +rc = -ERESTART; +if ( !domctl_lock_acquire() ) +break; + +rc = 0; +domain_pause(d); +d-arch.hvm_domain.params[a.index] = a.value; +for_each_vcpu ( d, v ) +paging_update_cr3(v); +domain_unpause(d); + +domctl_lock_release(); +break; +case HVM_PARAM_DM_DOMAIN: +/* Not reflexive, as we may need to domain_pause(). */ +rc = -EPERM; +if ( d == current-domain ) +break; + +if ( a.value == DOMID_SELF ) +a.value = current-domain-domain_id; + +rc = hvm_set_dm_domain(d, a.value); +break; +case HVM_PARAM_ACPI_S_STATE: +/* Not reflexive, as we must domain_pause(). */ +rc = -EPERM; +if ( current-domain == d ) +break; + +rc = 0; +if ( a.value == 3 ) +hvm_s3_suspend(d); +else if ( a.value == 0 ) +hvm_s3_resume(d); +else +rc = -EINVAL; + +break; +case HVM_PARAM_ACPI_IOPORTS_LOCATION: +rc = pmtimer_change_ioport(d, a.value); +break; +case HVM_PARAM_MEMORY_EVENT_CR0: +case HVM_PARAM_MEMORY_EVENT_CR3: +case HVM_PARAM_MEMORY_EVENT_CR4: +if ( d == current-domain ) +rc = -EPERM; +break; +case HVM_PARAM_MEMORY_EVENT_INT3: +case HVM_PARAM_MEMORY_EVENT_SINGLE_STEP: +case HVM_PARAM_MEMORY_EVENT_MSR: +if ( d == current-domain ) +{ +rc = -EPERM; +break; +} +if ( a.value HVMPME_onchangeonly ) +rc = -EINVAL; +break; +case HVM_PARAM_NESTEDHVM: +rc = xsm_hvm_param_nested(XSM_PRIV, d); +if ( rc ) +break; +if ( a.value 1 ) +rc = -EINVAL; +/* Remove the check below once we have + * shadow-on-shadow. + */ +if ( cpu_has_svm !paging_mode_hap(d) a.value ) +rc = -EINVAL; +/* Set up NHVM state for any vcpus that are already up */ +if ( a.value +
[Xen-devel] [PATCH 3/3] x86/hvm: disallow guest get and set of all ioreq server HVM params
A guest has no need to touch these parameters and reading HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or HVM_PARAM_BUFIOREQ_EVTCHN may cause Xen to create a default ioreq server where one did not already exist. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com --- xen/arch/x86/hvm/hvm.c |7 +++ 1 file changed, 7 insertions(+) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 23c604d..b51c1d5 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -5665,6 +5665,9 @@ static int hvm_allow_set_param(struct domain *d, case HVM_PARAM_MEMORY_EVENT_MSR: case HVM_PARAM_IOREQ_SERVER_PFN: case HVM_PARAM_NR_IOREQ_SERVER_PAGES: +case HVM_PARAM_IOREQ_PFN: +case HVM_PARAM_BUFIOREQ_PFN: +case HVM_PARAM_BUFIOREQ_EVTCHN: if ( d == current->domain ) rc = -EPERM; break; @@ -5880,6 +5883,10 @@ static int hvm_allow_get_param(struct domain *d, { case HVM_PARAM_IOREQ_SERVER_PFN: case HVM_PARAM_NR_IOREQ_SERVER_PAGES: +case HVM_PARAM_IOREQ_PFN: +case HVM_PARAM_BUFIOREQ_PFN: +case HVM_PARAM_BUFIOREQ_EVTCHN: +case HVM_PARAM_DM_DOMAIN: if ( d == current->domain ) rc = -EPERM; break; -- 1.7.10.4
Re: [Xen-devel] Archiving Xen on ARM and PVOPS subprojects
On 24 Apr 2015, at 16:22, Lars Kurth lars.kurth@gmail.com wrote: On 22 Apr 2015, at 14:04, Ian Jackson ian.jack...@eu.citrix.com wrote: Lars Kurth writes (Re: Archiving Xen on ARM and PVOPS subprojects): Any other votes by committers (in the TO list) before I tally the votes? +1 to both. We had several +1's and no -1's. So this vote carries Lars Just as a note, I removed the PVOPS and ARM pages from the Developer menu and http://www.xenproject.org/developers/teams.html and merged relevant content into http://www.xenproject.org/developers/teams/hypervisor.html I will do the PVOPS wiki stuff at some point when back from Shanghai Regards Lars ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] x86/hvm: introduce functions for HVMOP_get/set_param allowance checks
On 24/04/15 17:21, Paul Durrant wrote: Some parameters can only (validly) be set once. Some cannot be set by a guest for its own domain. Consolidate these checks, along with the XSM check, in a new hvm_allow_set_param() function for clarity. Also, introduce hvm_allow_get_param() for similar reasons. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com --- xen/arch/x86/hvm/hvm.c | 141 +++- 1 file changed, 79 insertions(+), 62 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 87c47b1..23c604d 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -5640,6 +5640,57 @@ static int hvmop_set_evtchn_upcall_vector( return 0; } +static int hvm_allow_set_param(struct domain *d, + struct xen_hvm_param *a) const +{ +uint64_t value = d-arch.hvm_domain.params[a-index]; +int rc = 0; No need for the 0 initialiser (and I know Coverity will complain about it). + +rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); +if ( rc ) +return rc; + +/* The following parameters cannot be set by the guest */ +switch (a-index) style +{ +case HVM_PARAM_VIRIDIAN: +case HVM_PARAM_IDENT_PT: +case HVM_PARAM_DM_DOMAIN: +case HVM_PARAM_ACPI_S_STATE: +case HVM_PARAM_MEMORY_EVENT_CR0: +case HVM_PARAM_MEMORY_EVENT_CR3: +case HVM_PARAM_MEMORY_EVENT_CR4: +case HVM_PARAM_MEMORY_EVENT_INT3: +case HVM_PARAM_MEMORY_EVENT_SINGLE_STEP: +case HVM_PARAM_MEMORY_EVENT_MSR: +case HVM_PARAM_IOREQ_SERVER_PFN: +case HVM_PARAM_NR_IOREQ_SERVER_PAGES: +if ( d == current-domain ) +rc = -EPERM; +break; +default: +break; Please can we turn the guest-settable list into a whitelist rather than a blacklist. We have had several XSAs in the past where a guest has been able to modify a value it shouldn't have because new params were default writable by everyone. 
+} + +if ( rc ) +return rc; + +/* The following parameters can only be changed once */ +switch (a-index) +{ +case HVM_PARAM_VIRIDIAN: +case HVM_PARAM_IOREQ_SERVER_PFN: +case HVM_PARAM_NR_IOREQ_SERVER_PAGES: +if (value != 0 a-value != value) +rc = -EEXIST; +break; +default: +break; +} + +return rc; +} + static int hvmop_set_param( XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg) { @@ -5666,7 +5717,7 @@ static int hvmop_set_param( (a.index != HVM_PARAM_CALLBACK_IRQ) ) goto out; -rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param); +rc = hvm_allow_set_param(d, a); if ( rc ) goto out; @@ -5681,31 +5732,11 @@ static int hvmop_set_param( rc = -EINVAL; break; case HVM_PARAM_VIRIDIAN: -/* This should only ever be set once by the tools and read by the guest. */ -rc = -EPERM; -if ( d == current-domain ) -break; - -if ( a.value != d-arch.hvm_domain.params[a.index] ) -{ -rc = -EEXIST; -if ( d-arch.hvm_domain.params[a.index] != 0 ) -break; - +if ( (a.value ~HVMPV_feature_mask) || + !(a.value HVMPV_base_freq) ) rc = -EINVAL; -if ( (a.value ~HVMPV_feature_mask) || - !(a.value HVMPV_base_freq) ) -break; -} - -rc = 0; break; case HVM_PARAM_IDENT_PT: -/* Not reflexive, as we must domain_pause(). */ -rc = -EPERM; -if ( d == current-domain ) -break; - rc = -EINVAL; if ( d-arch.hvm_domain.params[a.index] != 0 ) break; @@ -5733,22 +5764,12 @@ static int hvmop_set_param( domctl_lock_release(); break; case HVM_PARAM_DM_DOMAIN: -/* Not reflexive, as we may need to domain_pause(). */ -rc = -EPERM; -if ( d == current-domain ) -break; - if ( a.value == DOMID_SELF ) a.value = current-domain-domain_id; rc = hvm_set_dm_domain(d, a.value); break; case HVM_PARAM_ACPI_S_STATE: -/* Not reflexive, as we must domain_pause(). 
*/ -rc = -EPERM; -if ( current-domain == d ) -break; - rc = 0; if ( a.value == 3 ) hvm_s3_suspend(d); @@ -5761,20 +5782,9 @@ static int hvmop_set_param( case HVM_PARAM_ACPI_IOPORTS_LOCATION: rc = pmtimer_change_ioport(d, a.value); break; -case HVM_PARAM_MEMORY_EVENT_CR0: -case HVM_PARAM_MEMORY_EVENT_CR3: -case HVM_PARAM_MEMORY_EVENT_CR4: -if ( d == current-domain ) -rc =
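The whitelist shape Andrew asks for might look like the sketch below (illustrative parameter names, not Xen's real values or its actual code): anything not explicitly listed is denied to the guest by default, so a newly added parameter starts out protected instead of guest-writable — which is the failure mode behind the XSAs he mentions.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative parameter numbers, not Xen's. */
enum {
    PARAM_CALLBACK_IRQ,
    PARAM_TIMER_MODE,
    PARAM_VIRIDIAN,
    PARAM_IOREQ_PFN,
    NR_PARAMS,
};

/* Whitelist check: the guest (self == true) may set only the params
 * explicitly listed; the default case denies everything else.  The
 * toolstack path (self == false) is not restricted here. */
static int allow_guest_set(bool self, unsigned int index)
{
    if (!self)
        return 0;
    switch (index)
    {
    case PARAM_CALLBACK_IRQ:
    case PARAM_TIMER_MODE:
        return 0;
    default:
        return -EPERM;
    }
}
```

Contrast with a blacklist, where forgetting to list a new parameter silently makes it guest-settable; here forgetting merely makes it too restrictive, which is the safe direction to fail in.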
[Xen-devel] [seabios test] 51929: regressions - FAIL
flight 51929 seabios real [real] http://logs.test-lab.xenproject.org/osstest/logs/51929/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-freebsd10-amd64 15 guest-localmigrate.2 fail REGR. vs. 36656 test-amd64-i386-freebsd10-i386 16 guest-localmigrate/x10 fail REGR. vs. 36656 test-amd64-i386-pair 21 guest-migrate/src_host/dst_host fail REGR. vs. 36656 test-amd64-amd64-xl-win7-amd64 9 windows-install fail REGR. vs. 36656 test-amd64-i386-xl-qemuu-win7-amd64 9 windows-installfail REGR. vs. 36656 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass test-amd64-amd64-libvirt-xsm 11 guest-start fail never pass test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-amd64-i386-xl-xsm 11 guest-start fail never pass test-amd64-amd64-xl-xsm 11 guest-start fail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 16 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3 16 guest-stopfail never pass test-amd64-i386-xl-qemuu-winxpsp3 16 guest-stopfail never pass test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass test-amd64-amd64-xl-qemut-winxpsp3 16 guest-stop fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 16 guest-stop fail never pass test-amd64-i386-libvirt-xsm 11 guest-start 
fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 16 guest-stop fail never pass test-amd64-i386-xl-winxpsp3 16 guest-stop fail never pass version targeted for testing: seabios 0b2165d191f51ed2a522142c3ed3db55bc852627 baseline version: seabios b4581224824871ad2909f84fc4a9e067cda663f2 People who touched revisions under test: Bruce Rogers brog...@suse.com Kevin O'Connor ke...@koconnor.net jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl pass test-amd64-i386-xl pass test-amd64-amd64-xl-qemut-debianhvm-amd64-xsmfail test-amd64-i386-xl-qemut-debianhvm-amd64-xsm fail test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmfail test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm fail test-amd64-amd64-libvirt-xsm fail test-amd64-i386-libvirt-xsm fail test-amd64-amd64-xl-xsm fail test-amd64-i386-xl-xsm fail test-amd64-amd64-xl-pvh-amd fail test-amd64-i386-rhel6hvm-amd pass test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-freebsd10-amd64 fail
Re: [Xen-devel] [PATCH v2] xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
On Fri, 24 Apr 2015, Konrad Rzeszutek Wilk wrote: On Fri, Apr 24, 2015 at 10:16:40AM +0100, Stefano Stabellini wrote: Make sure that xen_swiotlb_init allocates buffers that are DMA capable when at least one memblock is available below 4G. Otherwise we assume that all devices on the SoC can cope with 4G addresses. We do this on ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case. No functional changes on x86. From: Chen Baozi baoz...@gmail.com Signed-off-by: Chen Baozi baoz...@gmail.com Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Tested-by: Chen Baozi baoz...@gmail.com Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Thanks! We are still early in the release cycle, should I add it to xentip/stable/for-linus-4.1? diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h index 2f7e6ff..0b579b2 100644 --- a/arch/arm/include/asm/xen/page.h +++ b/arch/arm/include/asm/xen/page.h @@ -110,5 +110,6 @@ static inline bool set_phys_to_machine(unsigned long pfn, unsigned long mfn) bool xen_arch_need_swiotlb(struct device *dev, unsigned long pfn, unsigned long mfn); +unsigned long xen_get_swiotlb_free_pages(unsigned int order); #endif /* _ASM_ARM_XEN_PAGE_H */ diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c index 793551d..4983250 100644 --- a/arch/arm/xen/mm.c +++ b/arch/arm/xen/mm.c @@ -4,6 +4,7 @@ #include <linux/gfp.h> #include <linux/highmem.h> #include <linux/export.h> +#include <linux/memblock.h> #include <linux/of_address.h> #include <linux/slab.h> #include <linux/types.h> @@ -21,6 +22,20 @@ #include <asm/xen/hypercall.h> #include <asm/xen/interface.h> +unsigned long xen_get_swiotlb_free_pages(unsigned int order) +{ + struct memblock_region *reg; + gfp_t flags = __GFP_NOWARN; + + for_each_memblock(memory, reg) { + if (reg->base < (phys_addr_t)0xffffffff) { + flags |= __GFP_DMA; + break; + } + } + return __get_free_pages(flags, order); +} + enum dma_cache_op { DMA_UNMAP, DMA_MAP, diff --git a/arch/x86/include/asm/xen/page.h 
b/arch/x86/include/asm/xen/page.h index 358dcd3..c44a5d5 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -269,4 +269,9 @@ static inline bool xen_arch_need_swiotlb(struct device *dev, return false; } +static inline unsigned long xen_get_swiotlb_free_pages(unsigned int order) +{ + return __get_free_pages(__GFP_NOWARN, order); +} + #endif /* _ASM_X86_XEN_PAGE_H */ diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 810ad41..4c54932 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -235,7 +235,7 @@ retry: #define SLABS_PER_PAGE (1 (PAGE_SHIFT - IO_TLB_SHIFT)) #define IO_TLB_MIN_SLABS ((120) IO_TLB_SHIFT) while ((SLABS_PER_PAGE order) IO_TLB_MIN_SLABS) { - xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN, order); + xen_io_tlb_start = (void *)xen_get_swiotlb_free_pages(order); if (xen_io_tlb_start) break; order--; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 01/14] x86: add socket_to_cpumask
On 23.04.15 at 11:55, chao.p.p...@linux.intel.com wrote:
> @@ -301,6 +304,8 @@ static void set_cpu_sibling_map(int cpu)
>              }
>          }
>      }
> +
> +    cpumask_set_cpu(cpu, socket_to_cpumask[cpu_to_socket(cpu)]);
>  }

There is an early return path in this function, which you need to deal with.

> @@ -717,6 +722,14 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>      stack_base[0] = stack_start;
>
> +    nr_sockets = DIV_ROUND_UP(nr_cpu_ids, boot_cpu_data.x86_max_cores *
> +                              boot_cpu_data.x86_num_siblings);

I think this is going to be problematic when there are more CPUs in the system than Xen can (or is told to) handle.

> +    ASSERT(nr_sockets > 0);

Due to the DIV_ROUND_UP() I think this can never trigger.

> +    socket_to_cpumask = xzalloc_array(cpumask_t, nr_sockets);

This is going to be very wasteful when Xen was built for very many CPUs. I think the array should be of cpumask_var_t type, and the array elements be initialized individually.

> --- a/xen/include/asm-x86/smp.h
> +++ b/xen/include/asm-x86/smp.h
> @@ -58,6 +58,14 @@ int hard_smp_processor_id(void);
>
>  void __stop_this_cpu(void);
>
> +/*
> + * This value is considered to not change from the initial startup.
> + * Otherwise all the relevant places need to be retrofitted.
> + */
> +extern unsigned int nr_sockets;
> +
> +/* Representing HT and core siblings in each socket */
> +extern cpumask_t *socket_to_cpumask;
>  #endif /* !__ASSEMBLY__ */

Please retain the blank line before the #endif.

Jan
Re: [Xen-devel] Archiving Xen on ARM and PVOPS subprojects
On 22 Apr 2015, at 14:04, Ian Jackson ian.jack...@eu.citrix.com wrote: Lars Kurth writes (Re: Archiving Xen on ARM and PVOPS subprojects): Any other votes by committers (in the TO list) before I tally the votes? +1 to both. We had several +1's and no -1's, so this vote carries.

Lars
Re: [Xen-devel] [PATCH v6 07/14] x86: dynamically get/set CBM for a domain
On 23.04.15 at 11:55, chao.p.p...@linux.intel.com wrote:
> +int psr_set_l3_cbm(struct domain *d, unsigned int socket, uint64_t cbm)
> +{
> +    unsigned int old_cos, cos;
> +    struct psr_cat_cbm *map, *find;

I think the variable would better be named "found" and ...

> +    struct psr_cat_socket_info *info;
> +    int ret = get_cat_socket_info(socket, &info);
> +
> +    if ( ret )
> +        return ret;
> +
> +    if ( !psr_check_cbm(info->cbm_len, cbm) )
> +        return -EINVAL;
> +
> +    old_cos = d->arch.psr_cos_ids[socket];
> +    map = info->cos_to_cbm;
> +    find = NULL;

... NULL could be its initializer.

Jan
Re: [Xen-devel] [PATCH v6 07/14] x86: dynamically get/set CBM for a domain
On 23.04.15 at 11:55, chao.p.p...@linux.intel.com wrote:
> Changes in v5:
> * Add spin_lock to protect cbm_map.
> * Add spin_lock to read path as well.

I don't think the second bullet point is true - there's only a single spin_lock() invocation being added here afaics.

Jan
Re: [Xen-devel] [PATCH v3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
On 24/04/15 16:07, Jan Beulich wrote:
> On 24.04.15 at 16:57, andrew.coop...@citrix.com wrote:
>> On 24/04/15 15:31, Jan Beulich wrote:
>>> Xen L4 entries being uniformly installed into any L4 table and 64-bit
>>> PV kernels running in ring 3 means that user mode was able to see the
>>> read-only M2P presented by Xen to the guests. While apparently not
>>> really representing an exploitable information leak, this still very
>>> certainly was never meant to be that way.
>>>
>>> Building on the fact that these guests already have separate kernel
>>> and user mode page tables, we can allow guest kernels to tell Xen that
>>> they don't want user mode to see this table. We can't, however, do
>>> this by default: There is no ABI requirement that kernel and user mode
>>> page tables be separate. Therefore introduce a new VM-assist flag
>>> allowing the guest to control respective hypervisor behavior:
>>> - when not set, L4 tables get created with the respective slot blank,
>>>   and whenever the L4 table gets used as a kernel one the missing
>>>   mapping gets inserted,
>>> - when set, L4 tables get created with the respective slot initialized
>>>   as before, and whenever the L4 table gets used as a user one the
>>>   mapping gets zapped.
>> Is this complete?
> Yes.
>> For backwards compatibility, older kernels will not have m2p_strict
>> set, and the m2p should unconditionally appear in all L4s.
> No. L4s _only_ used for user mode have no need for this entry to be
> non-zero.

There is only ever a single L4 in a particular virtual address space. The M2P is part of the Xen ABI range. Previously, the M2P was present in all L4s, which is why it leaked into user context.

If we do not wish to break backwards compatibility with this change, then in relaxed mode the M2P must remain in all tables, or userspace which is actually making use of the mappings will suddenly start faulting because of a change in Xen, not a kernel change (and an unknowing kernel might not be prepared to handle this case).

By your description, in the relaxed case a newly created L4 which is first used as a user table will have the mapping clear. Or am I missing something?

~Andrew
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen
On 24.04.15 at 17:42, wei.w.w...@intel.com wrote: On 24/04/2015 23:04, Jan Beulich wrote On 24.04.15 at 16:56, wei.w.w...@intel.com wrote: On 24/04/2015 20:57, Jan Beulich wrote I'm not sure how else to express what I want (no matter how many internal governors the intel_pstate driver has). xenpm set-scaling-governor powersave xenpm set-scaling-governor ondemand xenpm set-scaling-governor performance each should switch the system into a respective state, no matter whether internally to the driver this means a change of governors or just a modification to {min,max}_pct. And obtaining the current state after any of the above should show the same governor in use that was set (and not internal), again no matter how this is being achieved internally to the driver. Thanks Jan, that's clear. But this will have another issue. For example, we set-scaling-governor to ondemand, then we adjust min_pct=max_pct = 60%. The timer function may generate results like 35%, 55%, 45%..., but the CPU just keeps running with 60%. So I must be misunderstanding something then: How can the driver do anything at all when told to run the system at 60%? The {min,max}_pct is a limit. That's clear. The question is whether these are hardware imposed limits or set by the user. In the latter case ... Then, this is not ondemand at all (I think this should be another reason why the intel_pstate driver does not call its governor ondemand). The intel_pstate driver in the kernel has already got rid of the old governor convention. They let the user get what they want through simply adjusting the {min,max}_pct (the {min,max}_pct actually limits how the performance is selected). Adjusting the values individually to me very much looks like the userspace governor. Yeah, that example was like userspace. Please take a look at this example: [min_pct=60%, max_pct=80%], the timer generates 45%, 55%, 65%, 70%, 75%, 90%, then the final target values will not be constant. 
(the 45%, 55%, and 90% here shouldn't happen in any case) The ones (65%, 70%, 75%) falling into the limit interval behave like ondemand; the others do not.

... restricting to these would be the job of the driver, and
- ondemand would be 0...100
- powersave would be 0
- performance would be 100
Everything else would be userspace.

Jan
[Xen-devel] [PATCH v16 04/14] qspinlock: Extract out code snippets for the next patch
This is a preparatory patch that extracts out the following 2 code snippets to prepare for the next performance optimization patch. 1) the logic for the exchange of new and previous tail code words into a new xchg_tail() function. 2) the logic for clearing the pending bit and setting the locked bit into a new clear_pending_set_locked() function. This patch also simplifies the trylock operation before queuing by calling queue_spin_trylock() directly. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org --- include/asm-generic/qspinlock_types.h |2 + kernel/locking/qspinlock.c| 79 - 2 files changed, 50 insertions(+), 31 deletions(-) diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h index 9c3f5c2..ef36613 100644 --- a/include/asm-generic/qspinlock_types.h +++ b/include/asm-generic/qspinlock_types.h @@ -58,6 +58,8 @@ typedef struct qspinlock { #define _Q_TAIL_CPU_BITS (32 - _Q_TAIL_CPU_OFFSET) #define _Q_TAIL_CPU_MASK _Q_SET_MASK(TAIL_CPU) +#define _Q_TAIL_MASK (_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK) + #define _Q_LOCKED_VAL (1U _Q_LOCKED_OFFSET) #define _Q_PENDING_VAL (1U _Q_PENDING_OFFSET) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 0351f78..11f6ad9 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -97,6 +97,42 @@ static inline struct mcs_spinlock *decode_tail(u32 tail) #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) /** + * clear_pending_set_locked - take ownership and clear the pending bit. 
+ * @lock: Pointer to queue spinlock structure + * + * *,1,0 - *,0,1 + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, lock-val); +} + +/** + * xchg_tail - Put in the new queue tail code word retrieve previous one + * @lock : Pointer to queue spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail) + * + * p,*,* - n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + u32 old, new, val = atomic_read(lock-val); + + for (;;) { + new = (val _Q_LOCKED_PENDING_MASK) | tail; + old = atomic_cmpxchg(lock-val, val, new); + if (old == val) + break; + + val = old; + } + return old; +} + +/** * queue_spin_lock_slowpath - acquire the queue spinlock * @lock: Pointer to queue spinlock structure * @val: Current value of the queue spinlock 32-bit word @@ -178,15 +214,7 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) * * *,1,0 - *,0,1 */ - for (;;) { - new = (val ~_Q_PENDING_MASK) | _Q_LOCKED_VAL; - - old = atomic_cmpxchg(lock-val, val, new); - if (old == val) - break; - - val = old; - } + clear_pending_set_locked(lock); return; /* @@ -203,37 +231,26 @@ queue: node-next = NULL; /* -* We have already touched the queueing cacheline; don't bother with -* pending stuff. -* -* trylock || xchg(lock, node) -* -* 0,0,0 - 0,0,1 ; no tail, not locked - no tail, locked. -* p,y,x - n,y,x ; tail was p - tail is n; preserving locked. +* We touched a (possibly) cold cacheline in the per-cpu queue node; +* attempt the trylock once more in the hope someone let go while we +* weren't watching. */ - for (;;) { - new = _Q_LOCKED_VAL; - if (val) - new = tail | (val _Q_LOCKED_PENDING_MASK); - - old = atomic_cmpxchg(lock-val, val, new); - if (old == val) - break; - - val = old; - } + if (queue_spin_trylock(lock)) + goto release; /* -* we won the trylock; forget about queueing. 
+* We have already touched the queueing cacheline; don't bother with +* pending stuff. +* +* p,*,* - n,*,* */ - if (new == _Q_LOCKED_VAL) - goto release; + old = xchg_tail(lock, tail); /* * if there was a previous node; link it and wait until reaching the * head of the waitqueue. */ - if (old ~_Q_LOCKED_PENDING_MASK) { + if (old _Q_TAIL_MASK) { prev = decode_tail(old); WRITE_ONCE(prev-next, node); -- 1.7.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v16 01/14] qspinlock: A simple generic 4-byte queue spinlock
This patch introduces a new generic queue spinlock implementation that can serve as an alternative to the default ticket spinlock. Compared with the ticket spinlock, this queue spinlock should be almost as fair. It has about the same speed in the single-thread case and can be much faster in high contention situations, especially when the spinlock is embedded within the data structure to be protected. Only in light to moderate contention, where the average queue depth is around 1-3, may this queue spinlock be a bit slower due to the higher slowpath overhead.

This queue spinlock is especially suited to NUMA machines with a large number of cores, as the chance of spinlock contention is much higher in those machines. The cost of contention is also higher because of slower inter-node memory traffic.

Due to the fact that spinlocks are acquired with preemption disabled, the process will not be migrated to another CPU while it is trying to get a spinlock. Ignoring interrupt handling, a CPU can only be contending on one spinlock at any one time. Counting soft IRQ, hard IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting activities. By allocating a set of per-cpu queue nodes and using them to form a waiting queue, we can encode the queue node address into a much smaller 24-bit size (including CPU number and queue node index), leaving one byte for the lock. Please note that the queue node is only needed when waiting for the lock. Once the lock is acquired, the queue node can be released to be used later.
Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org --- include/asm-generic/qspinlock.h | 132 + include/asm-generic/qspinlock_types.h | 58 + kernel/Kconfig.locks |7 + kernel/locking/Makefile |1 + kernel/locking/mcs_spinlock.h |1 + kernel/locking/qspinlock.c| 209 + 6 files changed, 408 insertions(+), 0 deletions(-) create mode 100644 include/asm-generic/qspinlock.h create mode 100644 include/asm-generic/qspinlock_types.h create mode 100644 kernel/locking/qspinlock.c diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h new file mode 100644 index 000..315d6dc --- /dev/null +++ b/include/asm-generic/qspinlock.h @@ -0,0 +1,132 @@ +/* + * Queue spinlock + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. + * + * Authors: Waiman Long waiman.l...@hp.com + */ +#ifndef __ASM_GENERIC_QSPINLOCK_H +#define __ASM_GENERIC_QSPINLOCK_H + +#include asm-generic/qspinlock_types.h + +/** + * queue_spin_is_locked - is the spinlock locked? + * @lock: Pointer to queue spinlock structure + * Return: 1 if it is locked, 0 otherwise + */ +static __always_inline int queue_spin_is_locked(struct qspinlock *lock) +{ + return atomic_read(lock-val); +} + +/** + * queue_spin_value_unlocked - is the spinlock structure unlocked? + * @lock: queue spinlock structure + * Return: 1 if it is unlocked, 0 otherwise + * + * N.B. 
Whenever there are tasks waiting for the lock, it is considered + * locked wrt the lockref code to avoid lock stealing by the lockref + * code and change things underneath the lock. This also allows some + * optimizations to be applied without conflict with lockref. + */ +static __always_inline int queue_spin_value_unlocked(struct qspinlock lock) +{ + return !atomic_read(lock.val); +} + +/** + * queue_spin_is_contended - check if the lock is contended + * @lock : Pointer to queue spinlock structure + * Return: 1 if lock contended, 0 otherwise + */ +static __always_inline int queue_spin_is_contended(struct qspinlock *lock) +{ + return atomic_read(lock-val) ~_Q_LOCKED_MASK; +} +/** + * queue_spin_trylock - try to acquire the queue spinlock + * @lock : Pointer to queue spinlock structure + * Return: 1 if lock acquired, 0 if failed + */ +static __always_inline int queue_spin_trylock(struct qspinlock *lock) +{ + if (!atomic_read(lock-val) + (atomic_cmpxchg(lock-val, 0, _Q_LOCKED_VAL) == 0)) + return 1; + return 0; +} + +extern void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val); + +/** + * queue_spin_lock - acquire a queue spinlock + * @lock: Pointer to queue spinlock structure + */ +static __always_inline void queue_spin_lock(struct qspinlock *lock) +{ + u32
[Xen-devel] [PATCH v16 00/14] qspinlock: a 4-byte queue spinlock with PV support
v15-v16: - Remove the lfsr patch and use linear probing as lfsr is not really necessary in most cases. - Move the paravirt PV_CALLEE_SAVE_REGS_THUNK code to an asm header. - Add a patch to collect PV qspinlock statistics which also supersedes the PV lock hash debug patch. - Add PV qspinlock performance numbers. v14-v15: - Incorporate PeterZ's v15 qspinlock patch and improve upon the PV qspinlock code by dynamically allocating the hash table as well as some other performance optimization. - Simplified the Xen PV qspinlock code as suggested by David Vrabel david.vra...@citrix.com. - Add benchmarking data for 3.19 kernel to compare the performance of a spinlock heavy test with and without the qspinlock patch under different cpufreq drivers and scaling governors. v13-v14: - Patches 1 2: Add queue_spin_unlock_wait() to accommodate commit 78bff1c86 from Oleg Nesterov. - Fix the system hang problem when using PV qspinlock in an over-committed guest due to a racing condition in the pv_set_head_in_tail() function. - Increase the MAYHALT_THRESHOLD from 10 to 1024. - Change kick_cpu into a regular function pointer instead of a callee-saved function. - Change lock statistics code to use separate bits for different statistics. v12-v13: - Change patch 9 to generate separate versions of the queue_spin_lock_slowpath functions for bare metal and PV guest. This reduces the performance impact of the PV code on bare metal systems. v11-v12: - Based on PeterZ's version of the qspinlock patch (https://lkml.org/lkml/2014/6/15/63). - Incorporated many of the review comments from Konrad Wilk and Paolo Bonzini. - The pvqspinlock code is largely from my previous version with PeterZ's way of going from queue tail to head and his idea of using callee saved calls to KVM and XEN codes. v10-v11: - Use a simple test-and-set unfair lock to simplify the code, but performance may suffer a bit for large guest with many CPUs. 
- Take out Raghavendra KT's test results as the unfair lock changes may render some of his results invalid. - Add PV support without increasing the size of the core queue node structure. - Other minor changes to address some of the feedback comments. v9-v10: - Make some minor changes to qspinlock.c to accommodate review feedback. - Change author to PeterZ for 2 of the patches. - Include Raghavendra KT's test results in patch 18. v8-v9: - Integrate PeterZ's version of the queue spinlock patch with some modification: http://lkml.kernel.org/r/20140310154236.038181...@infradead.org - Break the more complex patches into smaller ones to ease review effort. - Fix a racing condition in the PV qspinlock code. v7-v8: - Remove one unneeded atomic operation from the slowpath, thus improving performance. - Simplify some of the codes and add more comments. - Test for X86_FEATURE_HYPERVISOR CPU feature bit to enable/disable unfair lock. - Reduce unfair lock slowpath lock stealing frequency depending on its distance from the queue head. - Add performance data for IvyBridge-EX CPU. v6-v7: - Remove an atomic operation from the 2-task contending code - Shorten the names of some macros - Make the queue waiter to attempt to steal lock when unfair lock is enabled. - Remove lock holder kick from the PV code and fix a race condition - Run the unfair lock PV code on overcommitted KVM guests to collect performance data. v5-v6: - Change the optimized 2-task contending code to make it fairer at the expense of a bit of performance. - Add a patch to support unfair queue spinlock for Xen. - Modify the PV qspinlock code to follow what was done in the PV ticketlock. - Add performance data for the unfair lock as well as the PV support code. v4-v5: - Move the optimized 2-task contending code to the generic file to enable more architectures to use it without code duplication. - Address some of the style-related comments by PeterZ. 
- Allow the use of unfair queue spinlock in a real para-virtualized execution environment. - Add para-virtualization support to the qspinlock code by ensuring that the lock holder and queue head stay alive as much as possible. v3-v4: - Remove debugging code and fix a configuration error - Simplify the qspinlock structure and streamline the code to make it perform a bit better - Add an x86 version of asm/qspinlock.h for holding x86 specific optimization. - Add an optimized x86 code path for 2 contending tasks to improve low contention performance. v2-v3: - Simplify the code by using numerous mode only without an unfair option. - Use the latest smp_load_acquire()/smp_store_release() barriers. - Move the queue spinlock code to kernel/locking. - Make the use of queue spinlock the default for x86-64 without user configuration. - Additional performance tuning. v1-v2: - Add some more comments to document what the code does. -
[Xen-devel] [PATCH v16 09/14] pvqspinlock, x86: Implement the paravirt qspinlock call patching
From: Peter Zijlstra (Intel) pet...@infradead.org We use the regular paravirt call patching to switch between: native_queue_spin_lock_slowpath() __pv_queue_spin_lock_slowpath() native_queue_spin_unlock()__pv_queue_spin_unlock() We use a callee saved call for the unlock function which reduces the i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions again. We further optimize the unlock path by patching the direct call with a movb $0,%arg1 if we are indeed using the native unlock code. This makes the unlock code almost as fast as the !PARAVIRT case. This significantly lowers the overhead of having CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code. Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/Kconfig |2 +- arch/x86/include/asm/paravirt.h | 29 - arch/x86/include/asm/paravirt_types.h | 10 ++ arch/x86/include/asm/qspinlock.h | 25 - arch/x86/include/asm/qspinlock_paravirt.h |6 ++ arch/x86/kernel/paravirt-spinlocks.c | 24 +++- arch/x86/kernel/paravirt_patch_32.c | 22 ++ arch/x86/kernel/paravirt_patch_64.c | 22 ++ 8 files changed, 128 insertions(+), 12 deletions(-) create mode 100644 arch/x86/include/asm/qspinlock_paravirt.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 49fecb1..a0946e7 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -661,7 +661,7 @@ config PARAVIRT_DEBUG config PARAVIRT_SPINLOCKS bool Paravirtualization layer for spinlocks depends on PARAVIRT SMP - select UNINLINE_SPIN_UNLOCK + select UNINLINE_SPIN_UNLOCK if !QUEUE_SPINLOCK ---help--- Paravirtualized spinlocks allow a pvops backend to replace the spinlock implementation with something virtualization-friendly diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 965c47d..f76199a 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -712,6 +712,31 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, #if 
defined(CONFIG_SMP) defined(CONFIG_PARAVIRT_SPINLOCKS) +#ifdef CONFIG_QUEUE_SPINLOCK + +static __always_inline void pv_queue_spin_lock_slowpath(struct qspinlock *lock, + u32 val) +{ + PVOP_VCALL2(pv_lock_ops.queue_spin_lock_slowpath, lock, val); +} + +static __always_inline void pv_queue_spin_unlock(struct qspinlock *lock) +{ + PVOP_VCALLEE1(pv_lock_ops.queue_spin_unlock, lock); +} + +static __always_inline void pv_wait(u8 *ptr, u8 val) +{ + PVOP_VCALL2(pv_lock_ops.wait, ptr, val); +} + +static __always_inline void pv_kick(int cpu) +{ + PVOP_VCALL1(pv_lock_ops.kick, cpu); +} + +#else /* !CONFIG_QUEUE_SPINLOCK */ + static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, __ticket_t ticket) { @@ -724,7 +749,9 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket); } -#endif +#endif /* CONFIG_QUEUE_SPINLOCK */ + +#endif /* SMP PARAVIRT_SPINLOCKS */ #ifdef CONFIG_X86_32 #define PV_SAVE_REGS pushl %ecx; pushl %edx; diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 7549b8b..f6acaea 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -333,9 +333,19 @@ struct arch_spinlock; typedef u16 __ticket_t; #endif +struct qspinlock; + struct pv_lock_ops { +#ifdef CONFIG_QUEUE_SPINLOCK + void (*queue_spin_lock_slowpath)(struct qspinlock *lock, u32 val); + struct paravirt_callee_save queue_spin_unlock; + + void (*wait)(u8 *ptr, u8 val); + void (*kick)(int cpu); +#else /* !CONFIG_QUEUE_SPINLOCK */ struct paravirt_callee_save lock_spinning; void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket); +#endif /* !CONFIG_QUEUE_SPINLOCK */ }; /* This contains all the paravirt structures: we get a convenient diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h index 64c925e..c8290db 100644 --- a/arch/x86/include/asm/qspinlock.h +++ b/arch/x86/include/asm/qspinlock.h @@ 
-3,6 +3,7 @@ #include asm/cpufeature.h #include asm-generic/qspinlock_types.h +#include asm/paravirt.h #definequeue_spin_unlock queue_spin_unlock /** @@ -11,11 +12,33 @@ * * A smp_store_release() on the least-significant byte. */ -static inline void queue_spin_unlock(struct qspinlock *lock) +static inline void native_queue_spin_unlock(struct qspinlock *lock) { smp_store_release((u8 *)lock, 0); } +#ifdef CONFIG_PARAVIRT_SPINLOCKS +extern void
[Xen-devel] [PATCH v16 10/14] pvqspinlock, x86: Enable PV qspinlock for KVM
This patch adds the necessary KVM specific code to allow KVM to support the CPU halting and kicking operations needed by the queue spinlock PV code. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/kernel/kvm.c | 43 +++ kernel/Kconfig.locks |2 +- 2 files changed, 44 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index e354cc6..4bb42c0 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -584,6 +584,39 @@ static void kvm_kick_cpu(int cpu) kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid); } + +#ifdef CONFIG_QUEUE_SPINLOCK + +#include asm/qspinlock.h + +static void kvm_wait(u8 *ptr, u8 val) +{ + unsigned long flags; + + if (in_nmi()) + return; + + local_irq_save(flags); + + if (READ_ONCE(*ptr) != val) + goto out; + + /* +* halt until it's our turn and kicked. Note that we do safe halt +* for irq enabled case to avoid hang when lock info is overwritten +* in irq spinlock slowpath and no spurious interrupt occur to save us. +*/ + if (arch_irqs_disabled_flags(flags)) + halt(); + else + safe_halt(); + +out: + local_irq_restore(flags); +} + +#else /* !CONFIG_QUEUE_SPINLOCK */ + enum kvm_contention_stat { TAKEN_SLOW, TAKEN_SLOW_PICKUP, @@ -817,6 +850,8 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket) } } +#endif /* !CONFIG_QUEUE_SPINLOCK */ + /* * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present. 
*/ @@ -828,8 +863,16 @@ void __init kvm_spinlock_init(void) if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) return; +#ifdef CONFIG_QUEUE_SPINLOCK + __pv_init_lock_hash(); + pv_lock_ops.queue_spin_lock_slowpath = __pv_queue_spin_lock_slowpath; + pv_lock_ops.queue_spin_unlock = PV_CALLEE_SAVE(__pv_queue_spin_unlock); + pv_lock_ops.wait = kvm_wait; + pv_lock_ops.kick = kvm_kick_cpu; +#else /* !CONFIG_QUEUE_SPINLOCK */ pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning); pv_lock_ops.unlock_kick = kvm_unlock_kick; +#endif } static __init int kvm_spinlock_init_jump(void) diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index c6a8f7c..537b13e 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -240,7 +240,7 @@ config ARCH_USE_QUEUE_SPINLOCK config QUEUE_SPINLOCK def_bool y if ARCH_USE_QUEUE_SPINLOCK - depends on SMP !PARAVIRT_SPINLOCKS + depends on SMP (!PARAVIRT_SPINLOCKS || !XEN) config ARCH_USE_QUEUE_RWLOCK bool -- 1.7.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v16 06/14] qspinlock: Use a simple write to grab the lock
Currently, atomic_cmpxchg() is used to get the lock. However, this is not really necessary if there is more than one task in the queue and the queue head don't need to reset the tail code. For that case, a simple write to set the lock bit is enough as the queue head will be the only one eligible to get the lock as long as it checks that both the lock and pending bits are not set. The current pending bit waiting code will ensure that the bit will not be set as soon as the tail code in the lock is set. With that change, the are some slight improvement in the performance of the queue spinlock in the 5M loop micro-benchmark run on a 4-socket Westere-EX machine as shown in the tables below. [Standalone/Embedded - same node] # of tasksBefore patchAfter patch %Change ----- -- --- 3 2324/2321 2248/2265-3%/-2% 4 2890/2896 2819/2831-2%/-2% 5 3611/3595 3522/3512-2%/-2% 6 4281/4276 4173/4160-3%/-3% 7 5018/5001 4875/4861-3%/-3% 8 5759/5750 5563/5568-3%/-3% [Standalone/Embedded - different nodes] # of tasksBefore patchAfter patch %Change ----- -- --- 312242/12237 12087/12093 -1%/-1% 410688/10696 10507/10521 -2%/-2% It was also found that this change produced a much bigger performance improvement in the newer IvyBridge-EX chip and was essentially to close the performance gap between the ticket spinlock and queue spinlock. The disk workload of the AIM7 benchmark was run on a 4-socket Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users on a 3.14 based kernel. The results of the test runs were: AIM7 XFS Disk Test kernel JPMReal Time Sys TimeUsr Time - ---- ticketlock56782333.17 96.61 5.81 qspinlock 57507993.13 94.83 5.97 AIM7 EXT4 Disk Test kernel JPMReal Time Sys TimeUsr Time - ---- ticketlock1114551 16.15 509.72 7.11 qspinlock 21844668.24 232.99 6.01 The ext4 filesystem run had a much higher spinlock contention than the xfs filesystem run. 
The ebizzy -m test was also run with the following results: kernel records/s Real Time Sys TimeUsr Time -- - ticketlock 2075 10.00 216.35 3.49 qspinlock 3023 10.00 198.20 4.80 Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org --- kernel/locking/qspinlock.c | 66 +-- 1 files changed, 50 insertions(+), 16 deletions(-) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index bcc99e6..99503ef 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -105,24 +105,37 @@ static inline struct mcs_spinlock *decode_tail(u32 tail) * By using the whole 2nd least significant byte for the pending bit, we * can allow better optimization of the lock acquisition for the pending * bit holder. + * + * This internal structure is also used by the set_locked function which + * is not restricted to _Q_PENDING_BITS == 8. */ -#if _Q_PENDING_BITS == 8 - struct __qspinlock { union { atomic_t val; - struct { #ifdef __LITTLE_ENDIAN + struct { + u8 locked; + u8 pending; + }; + struct { u16 locked_pending; u16 tail; + }; #else + struct { u16 tail; u16 locked_pending; -#endif }; + struct { + u8 reserved[2]; + u8 pending; + u8 locked; + }; +#endif }; }; +#if _Q_PENDING_BITS == 8 /** * clear_pending_set_locked - take ownership and clear the pending bit. * @lock: Pointer to queue spinlock structure @@ -195,6 +208,19 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) #endif /* _Q_PENDING_BITS == 8 */ /** + * set_locked - Set the lock bit and own the lock + * @lock: Pointer to queue spinlock structure + * + * *,*,0 - *,0,1 + */ +static __always_inline void set_locked(struct qspinlock *lock) +{ + struct __qspinlock *l = (void *)lock; + + WRITE_ONCE(l-locked, _Q_LOCKED_VAL); +} + +/** *