Re: [Xen-devel] [PATCH v11 6/9] xen: Add ring 3 vmware_port support

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 00:10, dsl...@verizon.com wrote:
 On 06/05/15 06:54, Ian Campbell wrote:
 It would be really useful to see a comprehensive list of exactly what
 guest ring3 access to the vmware port actually enables i.e. a list of
 specific features which require it.
 
 Ok, I have done some testing.  Here is what I know:
 
 Without ring3 support:
 
 1) VMware tools will not install on linux and windows.
 2) open-vm-tools (https://github.com/vmware/open-vm-tools) will not
 install (how ever it is not hard to change it to do so, you need to add
 a call to iopl(3) need to be added in a few places) on linux
 
 However if VMware tools did get installed on the window disk bits
 somehow, the VMware mouse support works.  Linux gets this because Xorg
 detects and uses the VMware mouse under IOPL(3).

Now that tells us that the tools may not work, but not what
implications that has on the usability of the VM once migrated
to Xen. Them not installing is a non-issue afaict, since after
having moved the VM to Xen there shouldn't be a need to
install them anymore - either they've been there, or I don't
see why they would be needed _after_ the move. And you
say that the mouse works in both cases if the tools happen
to be there.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3][PATCH 06/16] hvmloader: get guest memory map into memory_map[]

2015-06-12 Thread Chen, Tiejun

On 2015/6/11 17:38, Tian, Kevin wrote:

From: Chen, Tiejun
Sent: Thursday, June 11, 2015 9:15 AM

Now we get this map layout by call XENMEM_memory_map then
save them into one global variable memory_map[]. It should
include lowmem range, rdm range and highmem range. Note
rdm range and highmem range may not exist in some cases.

And here we need to check if any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
This range is used to allocate memory in hvmloder level, and
we would lead hvmloader failed in case of conflict since its
another rare possibility in real world.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  tools/firmware/hvmloader/e820.h  |  7 +++
  tools/firmware/hvmloader/hvmloader.c | 37

  tools/firmware/hvmloader/util.c  | 26 +
  tools/firmware/hvmloader/util.h  | 11 +++
  4 files changed, 81 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
  uint32_t type;
  } __attribute__((packed));

+#define E820MAX128
+
+struct e820map {
+unsigned int nr_map;
+struct e820entry map[E820MAX];
+};
+
  #endif /* __HVMLOADER_E820_H__ */

  /*
diff --git a/tools/firmware/hvmloader/hvmloader.c
b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..c9f170e 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -107,6 +107,8 @@ asm (
  .text   \n
  );

+struct e820map memory_map;
+
  unsigned long scratch_start = SCRATCH_PHYSICAL_ADDRESS;

  static void init_hypercalls(void)
@@ -199,6 +201,39 @@ static void apic_setup(void)
  ioapic_write(0x11, SET_APIC_ID(LAPIC_ID(0)));
  }

+void memory_map_setup(void)
+{
+unsigned int nr_entries = E820MAX, i;
+int rc;
+uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
+uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+rc = get_mem_mapping_layout(memory_map.map, nr_entries);
+
+if ( rc )
+{
+printf(Failed to get guest memory map.\n);
+BUG();
+}
+
+BUG_ON(!nr_entries);
+memory_map.nr_map = nr_entries;
+
+for ( i = 0; i  nr_entries; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED )
+{
+if ( check_overlap(alloc_addr, alloc_size,
+   memory_map.map[i].addr,
+   memory_map.map[i].size) )
+{
+printf(RDM conflicts Memory allocation.\n);


hvmloader has no concept of RDM here. It's just E820_RESERVED
type. Please make the error message clear, e.g. Fail to setup
memory map due to conflict on dynamic reserved memory range.



Okay,

+{
+printf(Fail to setup memory map due to conflict);
+printf( on dynamic reserved memory range.\n);
+BUG();
+}



Otherwise:

Reviewed-by: Kevin Tian kevin.t...@intel.com



Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4 3/7] libxl: add pvusb API

2015-06-12 Thread Chun Yan Liu


 On 6/12/2015 at 03:39 PM, in message
557afd2f0266000d4...@relay2.provo.novell.com, Chun Yan Liu
cy...@suse.com wrote: 

  
 On 6/12/2015 at 12:42 AM, in message 
 21881.47707.526863.158...@mariner.uk.xensource.com, Ian Jackson 
 ian.jack...@eu.citrix.com wrote:  
  Chunyan Liu writes ([Xen-devel] [PATCH V4 3/7] libxl: add pvusb API):  
   Add pvusb APIs, including:  
  ...  

  Thanks for your contribution.  I'm afraid I haven't had time to  
  completely finish my review this, but here's what I have:  

   --- /dev/null  
   +++ b/docs/misc/pvusb.txt  
   @@ -0,0 +1,243 @@  
   +1. Overview  

  It's good that you have provided documentation.  But I think this  
  document is a bit confused about its audience.  

  Information about design choices should be removed from this user- and  
  application-facing document, and put in comments in the code, or  
  commit messages, I think.  
  
 Thanks. Will update. 
  


   +int libxl_device_usbctrl_getinfo(libxl_ctx *ctx, uint32_t domid,  
   +libxl_device_usbctrl *usbctrl,  
   +libxl_usbctrlinfo *usbctrlinfo)  
   +LIBXL_EXTERNAL_CALLERS_ONLY;  

  Why is this function marked LIBXL_EXTERNAL_CALLERS_ONLY ?  
  
 Currently libxl itself won't call it. Exposed for toolstack usage. Will 
 remove that if it's not proper. 
  

   diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h  
   index ef7aa1d..89a9f07 100644  
   --- a/tools/libxl/libxl_internal.h  
   +++ b/tools/libxl/libxl_internal.h  
   @@ -2439,6 +2439,18 @@ _hidden void libxl__device_vtpm_add(libxl__egc 
   *egc,  
   
  uint32_t domid,  
  ...  
   +/* AO operation to connect a PVUSB controller.  
   + * Adding a PVUSB controller will add a 'vusb' device entry in xenstore, 

   + * and will wait for device connection.  

  In this context I think will wait for device connection is  
  misleading.  What you mean is that the vusb is available for adding  
  devices to, but won't have any yet.  Nothing in libxl is waiting.  
  
 Here I mean libxl_wait_for_device_connection. Since it adds a new device  
 entry 
 to xenstore, it needs to wait for a moment for frontend/backend connection. 
  

   diff --git a/tools/libxl/libxl_pvusb.c b/tools/libxl/libxl_pvusb.c  
   new file mode 100644  
   index 000..a6e1aa1  
   --- /dev/null  
   +++ b/tools/libxl/libxl_pvusb.c  
   @@ -0,0 +1,1249 @@  
  ...  
   +void libxl__device_usbctrl_add(libxl__egc *egc, uint32_t domid,  
   +   libxl_device_usbctrl *usbctrl,  
   +   libxl__ao_device *aodev)  
   +{  
   +STATE_AO_GC(aodev-ao);  
  ...  
   +libxl_domain_config_init(d_config);  
   +libxl_device_usbctrl_init(usbctrl_saved);  
   +libxl_device_usbctrl_copy(CTX, usbctrl_saved, usbctrl);  

  Wei will need to review the saved/live saved device info handling, and  
  the json, etc.  

   +static int  
   +libxl_device_usbctrl_remove_common(libxl_ctx *ctx, uint32_t domid,  
   +   libxl_device_usbctrl *usbctrl,  
   +   const libxl_asyncop_how *ao_how,  
   +   int force)  

  As discussed, you mustn't call this within libxl. 
 I don't quite understand why. I guess it's the same as usb_add problem, 
 something related to embedded ao? 
 As in usb_add: 
 libxl_device_usb_add() 
AO_CREATE(ctx, domid, ao_how) 
libxl__device_usb_add() 
  libxl__device_usb_setdefault() 
libxl_device_usbctrl_add_common() 
  AO_CREATE(ctx, domid, NULL) 
 if this is not allowed, what is the correct way? 

Saw the discussion thread and got it. Will update the codes.

  
  If you need to, you  
  need to break it out into an internal function  
  (libxl__initiate_device_usbctrl_remove, maybe) which makes a callback  
  when done. 

   +libxl_device_usbctrl *  
   +libxl_device_usbctrl_list(libxl_ctx *ctx, uint32_t domid, int *num)  
   +{  
   +GC_INIT(ctx);  
   +libxl_device_usbctrl *usbctrls = NULL;  
   +char *fe_path = NULL;  
   +char **dir = NULL;  
   +unsigned int ndirs = 0;  
   +  
   +*num = 0;  
   +  
   +fe_path = GCSPRINTF(%s/device/vusb,  
   +libxl__xs_get_dompath(gc, domid));  
   +dir = libxl__xs_directory(gc, XBT_NULL, fe_path, ndirs);  
   +  
   +if (dir  ndirs) {  
   +usbctrls = malloc(sizeof(*usbctrls) * ndirs);  

  Please use libxl__calloc with NOGC.  
  
 Thanks. Missing this one. 
  

   +libxl_device_usbctrl* usbctrl;  
   +libxl_device_usbctrl* end = usbctrls + ndirs;  
   +for (usbctrl = usbctrls; usbctrl  end; usbctrl++, dir++,  
 (*num)++)   
  {  
   +char *tmp;  
   +const char *be_path = libxl__xs_read(gc, XBT_NULL,  
   +   

Re: [Xen-devel] [v3][PATCH 08/16] hvmloader/e820: construct guest e820 table

2015-06-12 Thread Chen, Tiejun

On 2015/6/11 17:59, Tian, Kevin wrote:

From: Chen, Tiejun
Sent: Thursday, June 11, 2015 9:15 AM

Now we can use that memory map to build our final
e820 table but it may need to reorder all e820
entries.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  tools/firmware/hvmloader/e820.c | 62
+++--
  1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..c39b0aa 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -73,7 +73,8 @@ int build_e820_table(struct e820entry *e820,
   unsigned int lowmem_reserved_base,
   unsigned int bios_image_base)
  {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint64_t low_mem_pgend = hvm_info-low_mem_pgend  PAGE_SHIFT;


You may call it low_mem_end to differentiate from original
low_mem_pgend since one means actual address while the
other means pfn.


Okay.





  if ( !lowmem_reserved_base )
  lowmem_reserved_base = 0xA;
@@ -117,13 +118,6 @@ int build_e820_table(struct e820entry *e820,
  e820[nr].type = E820_RESERVED;
  nr++;

-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info-low_mem_pgend  PAGE_SHIFT)  (2u  20));
-e820[nr].addr = 0x10;
-e820[nr].size = (hvm_info-low_mem_pgend  PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
-nr++;
-
  /*
   * Explicitly reserve space for special pages.
   * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -159,16 +153,56 @@ int build_e820_table(struct e820entry *e820,
  nr++;
  }

-
-if ( hvm_info-high_mem_pgend )
+/*
+ * Construct the remaining according memory_map.


Construct E820 table according to recorded memory map


Fixed.




+ *
+ * Note memory_map includes,


The memory map created by toolstack may include:



Fixed.


+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. RDM region if it exists


Reserved regions if they exist


Fixed.




+ *
+ * #3. High memory region if it exists
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )
  {
-e820[nr].addr = ((uint64_t)1  32);
-e820[nr].size =
-((uint64_t)hvm_info-high_mem_pgend  PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
+e820[nr] = memory_map.map[i];
  nr++;
  }

+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_pgend  (2u  20));
+/*
+ * We may need to adjust real lowmem end since we may
+ * populate RAM to get enough MMIO previously.
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )


since you already translate memory map into e820 earlier, here
you should use 'nr' instead of memory_map.nr_map.



As we're saying in the code comment above, we're just handling the 
lowmem entry, so I think memory_map.nr_map is enough.



+{
+uint64_t end = e820[i].addr + e820[i].size;
+if ( e820[i].type == E820_RAM 
+ low_mem_pgend  e820[i].addr  low_mem_pgend  end )
+e820[i].size = low_mem_pgend - e820[i].addr;
+}


Sorry I may miss the code but could you elaborate where the
low_mem_pgend is changed after memory map is created? If
it happens within hvmloader, suppose the amount of reduced
memory from original E820_RAM entry should be added to
another E820_RAM entry for highmem, right?


You're right so I really should compensate this in highmem entry,

add_high_mem = end - low_mem_end;

/*
 * And then we also need to adjust highmem.
 */
if ( add_high_mem )
{
for ( i = 0; i  memory_map.nr_map; i++ )
{
if ( e820[i].type == E820_RAM 
 e820[i].addr  (1ull  32))
e820[i].size += add_high_mem;
}
}


Thanks
Tiejun




+
+/* Finally we need to reorder all e820 entries. */
+for ( j = 0; j  nr-1; j++ )
+{
+for ( i = j+1; i  nr; i++ )
+{
+if ( e820[j].addr  e820[i].addr )
+{
+struct e820entry tmp;
+tmp = e820[j];
+e820[j] = e820[i];
+e820[i] = tmp;
+}
+}
+}
+
  return nr;
  }

--
1.9.1






___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3][PATCH 00/16] Fix RMRR

2015-06-12 Thread Chen, Tiejun

On 2015/6/12 16:04, Jan Beulich wrote:

On 12.06.15 at 04:10, tiejun.c...@intel.com wrote:

On 2015/6/11 20:52, Tim Deegan wrote:

which would be better handeld explicitly:

 if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
 ...



So if I'm correct, we should do this check explicitly,

 if ( p2mt == p2m_invalid ||
  (p2mt == p2m_mmio_dm  !mfn_valid(mfn) )

Note this is equivalent to Jan's comment.


I think the !mfn_valid() part is really redundant - p2m_mmio_dm
should never be put on a page translating to a valid MFN.



I'm not sure immediately but I believe you're definitely right at this 
point, so lets goes there :)


Thanks
Tiejun


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v8 2/8] xen/arm: Add functions of mapping between vCPUID and virtual affinity

2015-06-12 Thread Chen Baozi
From: Chen Baozi baoz...@gmail.com

GICv3 restricts that the maximum number of CPUs in affinity 0 (one
cluster) is 16. (See the note of 'Bits[15:0]' in '5.7.29 ICC_SGI0R_EL1
ICC_SGI1R_EL1 and ICC_ASGI1R_EL1, GICv3 Architecture Specification')
That is to say the upper 4 bits of affinity 0 is unused. Current
implementation considers that AFF0 is equal to vCPUID, which makes all
vCPUs in one cluster, limiting its number to 16. If we would like to
support more than 16 number of vCPU in one guest, we need to make use
of AFF1. Considering the unused upper 4 bits, we need to create a pair
of functions mapping the vCPUID and virtual affinity.

Signed-off-by: Chen Baozi baoz...@gmail.com
Reviewed-by: Julien Grall julien.gr...@citrix.com
---
 xen/include/asm-arm/domain.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 75b17af..35b9a6d 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -266,6 +266,44 @@ static inline unsigned int domain_max_vcpus(const struct 
domain *d)
 return MAX_VIRT_CPUS;
 }
 
+/*
+ * Due to the restriction of GICv3, the number of vCPUs in AFF0 is
+ * limited to 16, thus only the first 4 bits of AFF0 are legal. We will
+ * use the first 2 affinity levels here, expanding the number of vCPU up
+ * to 4096 (16*256), which is more than the PEs that GIC-500 supports.
+ *
+ * Since we don't save information of vCPU's topology (affinity) in
+ * vMPIDR at the moment, we map the vcpuid to the vMPIDR linearly.
+ */
+static inline unsigned int vaffinity_to_vcpuid(register_t vaff)
+{
+unsigned int vcpuid;
+
+vaff = MPIDR_HWID_MASK;
+
+vcpuid = MPIDR_AFFINITY_LEVEL(vaff, 0);
+vcpuid |= MPIDR_AFFINITY_LEVEL(vaff, 1)  4;
+
+return vcpuid;
+}
+
+static inline register_t vcpuid_to_vaffinity(unsigned int vcpuid)
+{
+register_t vaff;
+
+/*
+ * Right now only AFF0 and AFF1 are supported in virtual affinity.
+ * Since only the first 4 bits in AFF0 are used in GICv3, the
+ * available bits are 12 (4+8).
+ */
+BUILD_BUG_ON(!(MAX_VIRT_CPUS  ((1  12;
+
+vaff = (vcpuid  0x0f)  MPIDR_LEVEL_SHIFT(0);
+vaff |= ((vcpuid  4)  MPIDR_LEVEL_MASK)  MPIDR_LEVEL_SHIFT(1);
+
+return vaff;
+}
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v8 5/8] tools/libxl: Set 'reg' of cpu node equal to MPIDR affinity for domU

2015-06-12 Thread Chen Baozi
From: Chen Baozi baoz...@gmail.com

According to ARM CPUs bindings, the reg field should match the MPIDR's
affinity bits. We will use AFF0 and AFF1 when constructing the reg value
of the guest at the moment, for it is enough for the current max vcpu
number.

Signed-off-by: Chen Baozi baoz...@gmail.com
Reviewed-by: Julien Grall julien.gr...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---
 tools/libxl/libxl_arm.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index c5088c4..5f3c434 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -272,6 +272,7 @@ static int make_cpus_node(libxl__gc *gc, void *fdt, int 
nr_cpus,
   const struct arch_info *ainfo)
 {
 int res, i;
+uint64_t mpidr_aff;
 
 res = fdt_begin_node(fdt, cpus);
 if (res) return res;
@@ -283,7 +284,16 @@ static int make_cpus_node(libxl__gc *gc, void *fdt, int 
nr_cpus,
 if (res) return res;
 
 for (i = 0; i  nr_cpus; i++) {
-const char *name = GCSPRINTF(cpu@%d, i);
+const char *name;
+
+/*
+ * According to ARM CPUs bindings, the reg field should match
+ * the MPIDR's affinity bits. We will use AFF0 and AFF1 when
+ * constructing the reg value of the guest at the moment, for it
+ * is enough for the current max vcpu number.
+ */
+mpidr_aff = (i  0x0f) | (((i  4)  0xff)  8);
+name = GCSPRINTF(cpu@%PRIx64, mpidr_aff);
 
 res = fdt_begin_node(fdt, name);
 if (res) return res;
@@ -297,7 +307,7 @@ static int make_cpus_node(libxl__gc *gc, void *fdt, int 
nr_cpus,
 res = fdt_property_string(fdt, enable-method, psci);
 if (res) return res;
 
-res = fdt_property_regs(gc, fdt, 1, 0, 1, (uint64_t)i);
+res = fdt_property_regs(gc, fdt, 1, 0, 1, mpidr_aff);
 if (res) return res;
 
 res = fdt_end_node(fdt);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v8 4/8] xen/arm: Use AFF1 when translating ICC_SGI1R_EL1 to cpumask

2015-06-12 Thread Chen Baozi
From: Chen Baozi baoz...@gmail.com

The old unsigned long type of vcpu_mask can only express 64 cpus at the
most, which might not be enough for the guest which used vGICv3. We
introduce a new struct sgi_target for the target cpu list of SGI, which
holds the affinity path information (only level 1 at the moment). For
GICv2 that has no affinity level, we can just set the corresponding
fields to be 0.

Signed-off-by: Chen Baozi baoz...@gmail.com
---
 xen/arch/arm/vgic-v2.c|  7 +++---
 xen/arch/arm/vgic-v3.c| 10 +
 xen/arch/arm/vgic.c   | 45 +--
 xen/include/asm-arm/gic_v3_defs.h |  3 +++
 xen/include/asm-arm/vgic.h|  7 +-
 5 files changed, 38 insertions(+), 34 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 3be1a51..5949cf1 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -201,16 +201,17 @@ static int vgic_v2_to_sgi(struct vcpu *v, register_t sgir)
 int virq;
 int irqmode;
 enum gic_sgi_mode sgi_mode;
-unsigned long vcpu_mask = 0;
+struct sgi_target target;
 
+memset(target, 0, sizeof(struct sgi_target));
 irqmode = (sgir  GICD_SGI_TARGET_LIST_MASK)  GICD_SGI_TARGET_LIST_SHIFT;
 virq = (sgir  GICD_SGI_INTID_MASK);
-vcpu_mask = (sgir  GICD_SGI_TARGET_MASK)  GICD_SGI_TARGET_SHIFT;
 
 /* Map GIC sgi value to enum value */
 switch ( irqmode )
 {
 case GICD_SGI_TARGET_LIST_VAL:
+target.list = (sgir  GICD_SGI_TARGET_MASK)  GICD_SGI_TARGET_SHIFT;
 sgi_mode = SGI_TARGET_LIST;
 break;
 case GICD_SGI_TARGET_OTHERS_VAL:
@@ -226,7 +227,7 @@ static int vgic_v2_to_sgi(struct vcpu *v, register_t sgir)
 return 0;
 }
 
-return vgic_to_sgi(v, sgir, sgi_mode, virq, vcpu_mask);
+return vgic_to_sgi(v, sgir, sgi_mode, virq, target);
 }
 
 static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index ef9a71a..93610d0 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -977,17 +977,19 @@ static int vgic_v3_to_sgi(struct vcpu *v, register_t sgir)
 int virq;
 int irqmode;
 enum gic_sgi_mode sgi_mode;
-unsigned long vcpu_mask = 0;
+struct sgi_target target;
 
+memset(target, 0, sizeof(struct sgi_target));
 irqmode = (sgir  ICH_SGI_IRQMODE_SHIFT)  ICH_SGI_IRQMODE_MASK;
 virq = (sgir  ICH_SGI_IRQ_SHIFT )  ICH_SGI_IRQ_MASK;
-/* SGI's are injected at Rdist level 0. ignoring affinity 1, 2, 3 */
-vcpu_mask = sgir  ICH_SGI_TARGETLIST_MASK;
 
 /* Map GIC sgi value to enum value */
 switch ( irqmode )
 {
 case ICH_SGI_TARGET_LIST:
+/* We assume that only AFF1 is used in ICC_SGI1R_EL1. */
+target.aff1 = (sgir  ICH_SGI_AFFINITY_LEVEL(1))  ICH_SGI_AFFx_MASK;
+target.list = sgir  ICH_SGI_TARGETLIST_MASK;
 sgi_mode = SGI_TARGET_LIST;
 break;
 case ICH_SGI_TARGET_OTHERS:
@@ -998,7 +1000,7 @@ static int vgic_v3_to_sgi(struct vcpu *v, register_t sgir)
 return 0;
 }
 
-return vgic_to_sgi(v, sgir, sgi_mode, virq, vcpu_mask);
+return vgic_to_sgi(v, sgir, sgi_mode, virq, target);
 }
 
 static int vgic_v3_emulate_sysreg(struct cpu_user_regs *regs, union hsr hsr)
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 7b387b7..59bd98a 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -318,15 +318,14 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
 }
 }
 
-/* TODO: unsigned long is used to fit vcpu_mask.*/
 int vgic_to_sgi(struct vcpu *v, register_t sgir, enum gic_sgi_mode irqmode, 
int virq,
-unsigned long vcpu_mask)
+const struct sgi_target *target)
 {
 struct domain *d = v-domain;
 int vcpuid;
 int i;
-
-ASSERT(d-max_vcpus  8*sizeof(vcpu_mask));
+unsigned int base;
+unsigned long int bitmap;
 
 ASSERT( virq  16 );
 
@@ -334,29 +333,33 @@ int vgic_to_sgi(struct vcpu *v, register_t sgir, enum 
gic_sgi_mode irqmode, int
 {
 case SGI_TARGET_LIST:
 perfc_incr(vgic_sgi_list);
+base = target-aff1  4;
+bitmap = target-list;
+for_each_set_bit( i, bitmap, sizeof(target-list) * 8 )
+{
+vcpuid = base + i;
+if ( d-vcpu[vcpuid] != NULL  !is_vcpu_online(d-vcpu[vcpuid]) )
+{
+gprintk(XENLOG_WARNING, VGIC: write r=%PRIregister \
+target-list=%hx, wrong CPUTargetList \n,
+sgir, target-list);
+continue;
+}
+vgic_vcpu_inject_irq(d-vcpu[vcpuid], virq);
+}
 break;
 case SGI_TARGET_OTHERS:
-/*
- * We expect vcpu_mask to be 0 for SGI_TARGET_OTHERS and
- * SGI_TARGET_SELF mode. So Force vcpu_mask to 0
- */
 perfc_incr(vgic_sgi_others);
-vcpu_mask = 0;
 for ( i = 0; i  d-max_vcpus; 

[Xen-devel] [PATCH v8 6/8] xen/arm: Set 'reg' of cpu node for dom0 to match MPIDR's affinity

2015-06-12 Thread Chen Baozi
From: Chen Baozi baoz...@gmail.com

According to ARM CPUs bindings, the reg field should match the MPIDR's
affinity bits. We will use AFF0 and AFF1 when constructing the reg value
of the guest at the moment, for it is enough for the current max vcpu
number.

Signed-off-by: Chen Baozi baoz...@gmail.com
Acked-by: Ian Campbell ian.campb...@citrix.com
Reviewed-by: Julien Grall julien.gr...@citrix.com
---
 xen/arch/arm/domain_build.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index a156de9..12b46ca 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -712,6 +712,7 @@ static int make_cpus_node(const struct domain *d, void *fdt,
 char buf[15];
 u32 clock_frequency;
 bool_t clock_valid;
+uint64_t mpidr_aff;
 
 DPRINT(Create cpus node\n);
 
@@ -761,9 +762,16 @@ static int make_cpus_node(const struct domain *d, void 
*fdt,
 
 for ( cpu = 0; cpu  d-max_vcpus; cpu++ )
 {
-DPRINT(Create cpu@%u node\n, cpu);
+/*
+ * According to ARM CPUs bindings, the reg field should match
+ * the MPIDR's affinity bits. We will use AFF0 and AFF1 when
+ * constructing the reg value of the guest at the moment, for it
+ * is enough for the current max vcpu number.
+ */
+mpidr_aff = vcpuid_to_vaffinity(cpu);
+DPRINT(Create cpu@%PRIx64 (logical CPUID: %d) node\n, mpidr_aff, 
cpu);
 
-snprintf(buf, sizeof(buf), cpu@%u, cpu);
+snprintf(buf, sizeof(buf), cpu@%lx, mpidr_aff);
 res = fdt_begin_node(fdt, buf);
 if ( res )
 return res;
@@ -776,7 +784,7 @@ static int make_cpus_node(const struct domain *d, void *fdt,
 if ( res )
 return res;
 
-res = fdt_property_cell(fdt, reg, cpu);
+res = fdt_property_cell(fdt, reg, mpidr_aff);
 if ( res )
 return res;
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v8 7/8] xen/arm: make domain_max_vcpus return value from vgic_ops

2015-06-12 Thread Chen Baozi
From: Chen Baozi baoz...@gmail.com

Each vGIC driver supports different maximum numbers of vCPU. For
example, GICv2 is limited to 8 vCPUs, while GICv3 can support up
to 4096 vCPUs if we use both AFF0 and AFF1. Thus, domain_max_vcpus
should depend on not only MAX_VIRT_CPUS but also the version
of vGIC that the guest uses.

Since evtchn_init would call domain_max_vcpus to allocate poll_mask
when the vgic_ops haven't been initialised yet, we make it return
MAX_VIRT_CPUS at that time. On ARM32, event channel doesn't need
to allocate the poll_mask because MAX_VIRT_CPUS  BITS_PER_LONG,
while allocating more memory (2 unsigned long rather than 1) only
for poll_mask on arm64 with GICv2 looks not so expensive.

We didn't keep it as the old static inline form because it will break
compilation when access the member of struct domain:

In file included from xen/include/xen/domain.h:6:0,
 from xen/include/xen/sched.h:10,
 from arm64/asm-offsets.c:10:
xen/include/asm/domain.h: In function ‘domain_max_vcpus’:
xen/include/asm/domain.h:266:10: error: dereferencing pointer to incomplete type
 if (d-arch.vgic.version == GIC_V2)
  ^

Signed-off-by: Chen Baozi baoz...@gmail.com
---
 xen/arch/arm/domain.c| 14 ++
 xen/arch/arm/vgic-v2.c   |  1 +
 xen/arch/arm/vgic-v3.c   |  5 +
 xen/include/asm-arm/domain.h |  5 +
 xen/include/asm-arm/vgic.h   |  2 ++
 5 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 0cf147c..01d8ca89 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -890,6 +890,20 @@ void vcpu_block_unless_event_pending(struct vcpu *v)
 vcpu_unblock(current);
 }
 
+unsigned int domain_max_vcpus(const struct domain *d)
+{
+/*
+ * Since evtchn_init would call domain_max_vcpus for poll_mask
+ * allocation when the vgic_ops haven't been initialised yet,
+ * we return MAX_VIRT_CPUS if d-arch.vgic.handler is null.
+ */
+if ( !d-arch.vgic.handler )
+return MAX_VIRT_CPUS;
+else
+return min_t(unsigned int, MAX_VIRT_CPUS,
+ d-arch.vgic.handler-max_vcpus);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 5949cf1..bbeb740 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -585,6 +585,7 @@ const struct vgic_ops vgic_v2_ops = {
 .domain_init = vgic_v2_domain_init,
 .get_irq_priority = vgic_v2_get_irq_priority,
 .get_target_vcpu = vgic_v2_get_target_vcpu,
+.max_vcpus = 8,
 };
 
 /*
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 93610d0..93af6c8 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1204,6 +1204,11 @@ const struct vgic_ops vgic_v3_ops = {
 .get_irq_priority = vgic_v3_get_irq_priority,
 .get_target_vcpu  = vgic_v3_get_target_vcpu,
 .emulate_sysreg  = vgic_v3_emulate_sysreg,
+/*
+ * We use both AFF1 and AFF0 in (v)MPIDR. Thus, the max number of CPU
+ * that can be supported is up to 4096(256*16) in theory.
+ */
+.max_vcpus = 4096,
 };
 
 /*
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 35b9a6d..23598dd 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -261,10 +261,7 @@ struct arch_vcpu
 void vcpu_show_execution_state(struct vcpu *);
 void vcpu_show_registers(const struct vcpu *);
 
-static inline unsigned int domain_max_vcpus(const struct domain *d)
-{
-return MAX_VIRT_CPUS;
-}
+unsigned int domain_max_vcpus(const struct domain *);
 
 /*
  * Due to the restriction of GICv3, the number of vCPUs in AFF0 is
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index d68539a..4477962 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -115,6 +115,8 @@ struct vgic_ops {
 struct vcpu *(*get_target_vcpu)(struct vcpu *v, unsigned int irq);
 /* vGIC sysreg emulation */
 int (*emulate_sysreg)(struct cpu_user_regs *regs, union hsr hsr);
+/* Maximum number of vCPU supported */
+const unsigned int max_vcpus;
 };
 
 /* Number of ranks of interrupt registers for a domain */
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v8 8/8] xen/arm64: increase MAX_VIRT_CPUS to 128 on arm64

2015-06-12 Thread Chen Baozi
From: Chen Baozi baoz...@gmail.com

After we have increased the size of GICR in address space for guest
and made use of both AFF0 and AFF1 in (v)MPIDR, we are now able to
support up to 4096 vCPUs in theory. However, it will cost 512M
address space for GICR region, which is unnecessary big at the
moment. Considering the max CPU number that GIC-500 can support and
the old value of MAX_VIRT_CPUS before commit aa25a61, we increase
its value to 128.

Signed-off-by: Chen Baozi baoz...@gmail.com
---
Since the domain_max_vcpus has been changed to depends on vgic_ops,
we could have done more work in order to drop the definition of
MAX_VIRT_CPUS. However, because it is still used for some conditional
compilation in common code, I think that would be better done in a
separate cleanup patch series.

 xen/arch/arm/vgic-v3.c   | 1 -
 xen/include/asm-arm/config.h | 4 
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 93af6c8..2548b62 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -889,7 +889,6 @@ static int vgic_v3_distr_mmio_write(struct vcpu *v, 
mmio_info_t *info)
 rank = vgic_rank_offset(v, 64, gicd_reg - GICD_IROUTER,
 DABT_DOUBLE_WORD);
 if ( rank == NULL ) goto write_ignore;
-BUG_ON(v-domain-max_vcpus  8);
 new_irouter = *r;
 vgic_lock_rank(v, rank, flags);
 
diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
index 3b23e05..817c216 100644
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -47,7 +47,11 @@
 #define NR_CPUS 128
 #endif
 
+#ifdef CONFIG_ARM_64
+#define MAX_VIRT_CPUS 128
+#else
 #define MAX_VIRT_CPUS 8
+#endif
 
 #define asmlinkage /* Nothing needed */
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] x86: Avoid tripping watchdog when constructing dom0

2015-06-12 Thread Ross Lagerwall
Constructing dom0 may take a few seconds, particularly if the slow VESA
graphics terminal is used. Process pending softirqs a few times to avoid
tripping a watchdog with a short timeout.

Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com
---

Changes in v2: Process pending softirqs inside ELF code.

 xen/arch/x86/domain_build.c| 4 
 xen/common/libelf/libelf-dominfo.c | 8 
 2 files changed, 12 insertions(+)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 03e4bfe..d76707f 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -930,6 +930,8 @@ int __init construct_dom0(
 BUG_ON(d-vcpu[0] == NULL);
 BUG_ON(v-is_initialised);
 
+process_pending_softirqs();
+
 printk(*** LOADING DOMAIN 0 ***\n);
 
 d-max_pages = ~0U;
@@ -1167,6 +1169,8 @@ int __init construct_dom0(
_p(v_start), _p(v_end));
 printk( ENTRY ADDRESS: %p\n, _p(parms.virt_entry));
 
+process_pending_softirqs();
+
 mpt_alloc = (vpt_start - v_start) + pfn_to_paddr(alloc_spfn);
 if ( vinitrd_start )
 mpt_alloc -= PAGE_ALIGN(initrd_len);
diff --git a/xen/common/libelf/libelf-dominfo.c 
b/xen/common/libelf/libelf-dominfo.c
index 6120dd4..4882c52 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -16,6 +16,10 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  
USA
  */
 
+#ifdef __XEN__
+#include xen/softirq.h
+#endif
+
 #include libelf-private.h
 
 /*  */
@@ -235,6 +239,10 @@ static unsigned elf_xen_parse_notes(struct elf_binary *elf,
   ELF_HANDLE_PTRVAL(note)  parms-elf_note_end;
   note = elf_note_next(elf, note) )
 {
+#ifdef __XEN__
+process_pending_softirqs();
+#endif
+
 if ( *total_note_count = ELF_MAX_TOTAL_NOTE_COUNT )
 {
 elf_mark_broken(elf, too many ELF notes);
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3][PATCH 16/16] xen/vtd: prevent from assign the device with shared rmrr

2015-06-12 Thread Chen, Tiejun

On 2015/6/11 18:25, Tian, Kevin wrote:

From: Chen, Tiejun
Sent: Thursday, June 11, 2015 9:15 AM

Currently we're intending to cover this kind of devices


we're - we're not?


I mean currently we want to handle this shared case *simply* so I think 
its still we're, right?





with shared RMRR simply since the case of shared RMRR is
a rare case according to our previous experiences. But
late we can group these devices which shared rmrr, and
then allow all devices within a group to be assigned to
same domain.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com


Acked-by: Kevin Tian kevin.t...@intel.com except one text
comment.


---
  xen/drivers/passthrough/vtd/iommu.c | 30
+++---
  1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c
b/xen/drivers/passthrough/vtd/iommu.c
index d3233b8..f220081 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2277,13 +2277,37 @@ static int intel_iommu_assign_device(
  if ( list_empty(acpi_drhd_units) )
  return -ENODEV;

+seg = pdev-seg;
+bus = pdev-bus;
+/*
+ * In rare cases one given rmrr is shared by multiple devices but
+ * obviously this would put the security of a system at risk. So
+ * we should prevent from this sort of device assignment.
+ *
+ * TODO: actually we can group these devices which shared rmrr, and
+ * then allow all devices within a group to be assigned to same domain.


TODO: in the future we can introduce group device assignment
interface to make sure devices sharing RMRR are assigned to the
same domain together.


Thank you to rephrase this.

Tiejun




+ */
+for_each_rmrr_device( rmrr, bdf, i )
+{
+if ( rmrr-segment == seg 
+ PCI_BUS(bdf) == bus 
+ PCI_DEVFN2(bdf) == devfn )
+{
+if ( rmrr-scope.devices_cnt  1 )
+{
+ret = -EPERM;
+printk(XENLOG_G_ERR VTDPREFIX
+cannot assign this device with shared RMRR for Dom%d 
(%d)\n,
+   d-domain_id, ret);
+return ret;
+}
+}
+}
+
  ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
  if ( ret )
  return ret;

-seg = pdev-seg;
-bus = pdev-bus;
-
  /* Setup rmrr identity mapping */
  for_each_rmrr_device( rmrr, bdf, i )
  {
--
1.9.1





___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] libxl backports for 4.4 and 4.5

2015-06-12 Thread Lars Kurth
Ian, Jan,

you seem to be disagreeing, or more likely Ian missed Jan's reply, so I wanted 
to double check.

 Right, but unless really urgent/important I think this is too late for
 4.5.1 now and needs to be put on the list of backports for 4.5.2.

Jan, just to clarify, are you saying back porting for 4.4.3 is OK, but 4.5.1 is 
too tight?  Looking at the 4 months cadence, 4.4.3 would be sometimes in July: 
so I can live with this as a compromise.

Even though having a released version of Xen which works out of the box with 
OpenStack is pretty important.

@Stefano, @Anthony: please keep on top of necessary back ports for OpenStack in 
future. I thought these have been requested and only found out when I talked to 
Anthony yesterday that this wasn't the case.

Regards
Lars
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-06-12 Thread Chen, Tiejun

On 2015/6/11 17:28, Tian, Kevin wrote:

From: Chen, Tiejun
Sent: Thursday, June 11, 2015 9:15 AM

This patch extends the existing hypercall to support rdm reservation policy.
We return error or just throw out a warning message depending on whether
the policy is strict or relaxed when reserving RDM regions in pfn space.
Note in some special cases, e.g. add a device to hwdomain, and remove a
device from user domain, 'relaxed' is fine enough since this is always safe
to hwdomain.


could you elaborate  add a device to hwdomain, and remove a device
from user domain ? move a device from user domain to hwdomain
or completely irrelevant?


Yes, they're not relevant. And I think we shouldn't care our policy,

#1. When add a device to hwdomain

I think RMRR is always reserved on e820 so either of flag is fine.

#2. remove a device from domain

remove action also can ignore that since the original mechanism is 
enough.






Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  xen/arch/x86/mm/p2m.c   |  8 +++-
  xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
  xen/drivers/passthrough/arm/smmu.c  |  2 +-
  xen/drivers/passthrough/device_tree.c   | 11 ++-
  xen/drivers/passthrough/pci.c   | 10 ++
  xen/drivers/passthrough/vtd/iommu.c | 20 
  xen/include/asm-x86/p2m.h   |  2 +-
  xen/include/public/domctl.h |  5 +
  xen/include/xen/iommu.h |  2 +-
  9 files changed, 45 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c7198a5..3fcdcac 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -899,7 +899,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn,
mfn_t mfn,
  }

  int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-   p2m_access_t p2ma)
+   p2m_access_t p2ma, u32 flag)
  {
  p2m_type_t p2mt;
  p2m_access_t a;
@@ -924,6 +924,12 @@ int set_identity_p2m_entry(struct domain *d, unsigned long 
gfn,
  printk(XENLOG_G_WARNING
 Cannot identity map d%d:%lx, already mapped to %lx.\n,
 d-domain_id, gfn, mfn_x(mfn));
+
+if ( flag == XEN_DOMCTL_DEV_RDM_RELAXED )
+{
+ret = 0;
+printk(XENLOG_G_WARNING Some devices may work failed.\n);


Do you need this extra printk? The warning message is already given
several lines above and here you just need to change return value
for relaxed policy.


Okay.




+}
  }

  gfn_unlock(p2m, gfn, 0);
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct 
domain
*target,
  }

  static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-   struct pci_dev *pdev)
+   struct pci_dev *pdev,
+   u32 flag)
  {
  struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev-seg);
  int bdf = PCI_BDF2(pdev-bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c
b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct
iommu_domain *domain)
  }

  static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-  struct device *dev)
+  struct device *dev, u32 flag)
  {
struct iommu_domain *domain;
struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c
b/xen/drivers/passthrough/device_tree.c
index 5d3842a..ea85645 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct
dt_device_node *dev)
  goto fail;
  }

-rc = hd-platform_ops-assign_device(d, 0, dt_to_dev(dev));
+rc = hd-platform_ops-assign_device(d, 0, dt_to_dev(dev),
+ XEN_DOMCTL_DEV_NO_RDM);

  if ( rc )
  goto fail;
@@ -148,6 +149,14 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct
domain *d,
  if ( domctl-u.assign_device.dev != XEN_DOMCTL_DEV_DT )
  break;

+if ( domctl-u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM )
+{
+printk(XENLOG_G_ERR XEN_DOMCTL_assign_device: assign \%s\
+to dom%u failed (%d) since we don't support RDM.\n,
+   dt_node_full_name(dev), d-domain_id, ret);
+break;
+}
+
  if ( 

Re: [Xen-devel] [RFC v2] misc/xenmicrocode: Upload /lib/firmware/some blob to the hypervisor

2015-06-12 Thread Andrew Cooper
On 12/06/15 03:30, Luis R. Rodriguez wrote:
 On Fri, Jan 30, 2015 at 1:57 PM, Luis R. Rodriguez mcg...@suse.com wrote:
 On Fri, Jan 30, 2015 at 08:37:33PM +, Andrew Cooper wrote:
 The right thing to do is to fix the hypercall implementation, at which
 point the utility below is fine and xc_microcode_update() can be a thing
 wrapper around the hypercall.

 The actions Xen needs to take are:

 - Copy the buffer into Xen.
 - Scan the buffer for the correct patch
 - Rendezvous all online cpus in an IPI to apply the patch, and keep the
 processors in until all have completed the patch.

 The IPI itself probably wants common rendezvous code, and a system
 specific call for application.  The system specific call will need to
 adhere to the requirements in the relevant manual.  Care will have to be
 taken to avoid deadlocking with the time calibration rendezvous, and
 facilities such as the watchdog might temporally need pausing.

 If you feel up to all that, then please go ahead.  If not, I will
 attempt to find some copious free time.
 You can have a crack at that. Let me know when the above is ready and I'll
 respin. I'd try it but it seems you would spend considerbly less time than
 me doing the above.
 Andrew, wanted to check to see if you have had time to work on the above.

Sadly no - I have not had time.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3][PATCH 03/16] xen/vtd: create RMRR mapping

2015-06-12 Thread Chen, Tiejun

On 2015/6/12 13:59, Tian, Kevin wrote:

From: Chen, Tiejun
Sent: Friday, June 12, 2015 1:58 PM

On 2015/6/12 10:43, Chen, Tiejun wrote:

On 2015/6/11 22:07, Tim Deegan wrote:

At 17:31 +0800 on 11 Jun (1434043916), Chen, Tiejun wrote:

while ( base_pfn  end_pfn )
{
-int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-
IOMMUF_readable|IOMMUF_writable);
+int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);

if ( err )
return err;


Tim has another comment to replace earlier unmap with


Yes, I knew this.


guest_physmap_remove_page() which will call iommu
unmap internally. Please include this change too.



But,

guest_physmap_remove_page()
   |
   + p2m_remove_page()
 |
 + iommu_unmap_page()
 |
 + p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), xxx)

I think this already remove these pages both on ept/vt-d sides, right?


Yes; this is about this code further up in the same function:

 while ( base_pfn  end_pfn )
 {
 if ( intel_iommu_unmap_page(d, base_pfn) )
 ret = -ENXIO;
 base_pfn++;
 }

which ought to be calling guest_physmap_remove_page() or similar, to
make sure that both iommu and EPT mappings get removed.



I still just think current implementation might be fine at this point.

We have two scenarios here, the case of shared ept and the case of
non-shared ept. But no matter what case we're tracking, shouldn't
guest_physmap_remove_page() always call p2m-set_entry() to clear *all*
*valid* mfn which is owned by a given VM? And p2m-set_entry() also
calls iommu_unmap_page() internally. So nothing special should further
consider.

If I'm wrong or misunderstanding, please correct me :)



Sorry for my misunderstanding to this. Right now Kevin help me
understand what you mean. Sounds like we should address something
specific to unmap rmrr here.

So I will do this as follows:

#1. Provide a clear helper

+int clear_identity_p2m_entry(struct domain *d, unsigned long gfn,
+ unsigned int page_order)
+{
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int ret;
+gfn_lock(p2m, gfn, page_order);
+ret = p2m_remove_page(p2m, gfn, gfn, page_order);
+gfn_unlock(p2m, gfn, page_order);
+return ret;
+}
+

#2. Call such a helper

@@ -1840,7 +1840,7 @@ static int rmrr_identity_mapping(struct domain *d,
bool_t map,

   while ( base_pfn  end_pfn )
   {
-if ( intel_iommu_unmap_page(d, base_pfn) )
+if ( clear_identity_p2m_entry(d, base_pfn, 0) )
   ret = -ENXIO;
   base_pfn++;
   }
Is this right?

Thanks
Tiejun


could you explain why existing guest_physmap_remove_page can't
serve the purpose so you need invent a new identity mapping
specific one? For unmapping suppose it should be common regardless
of whether it's identity-mapped or not. :-)


I have some concerns here:

#1. guest_physmap_remove_page() is a void function without a returning 
value, so you still need a little change.


#2. guest_physmap_remove_page() doesn't make readable in such a code 
context;


rmrr_identity_mapping()
{
...
guest_physmap_remove_page()
...
}

#3. a new helper is good to further extend if necessary.

Of course, I'd like to modify-to-use guest_physmap_remove_page() if you 
guys aren't in agreement with me :)


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4 5/7] xl: add pvusb commands

2015-06-12 Thread Chun Yan Liu


 On 6/12/2015 at 03:37 PM, in message 557a8c57.1040...@suse.com, Juergen 
 Gross
jgr...@suse.com wrote: 
 On 06/10/2015 05:20 AM, Chunyan Liu wrote: 
  Add pvusb commands: usb-ctrl-attach, usb-ctrl-detach, usb-list, 
  usb-attach and usb-detach. 
  
  To attach a usb device to guest through pvusb, one could follow 
  following example: 
  
#xl usb-ctrl-attach test_vm version=1 num_ports=8 
  
#xl usb-list test_vm 
will show the usb controllers and port usage under the domain. 
  
#xl usb-assignable-list 
will list assignable USB devices 
  
 xl usb-assignable-list is not part of this patch. Either merge this 
 patch and the following one, or describe the command in the next patch. 

Oh, yes, I forget to split.

  
  
#xl usb-attach test_vm 1.6 
will find the first usable controller:port, and attach usb 
device whose bus address is 1.6 (busnum is 1, devnum is 6) 
to it. One could also specify which controller and which port. 
  
#xl usb-detach test_vm 1.6 
  
#xl usb-ctrl-detach test_vm dev_id 
will destroy the controller with specified dev_id. Dev_id 
can be traced in usb-list info. 
  
  Signed-off-by: Chunyan Liu cy...@suse.com 
  Signed-off-by: Simon Cao caobosi...@gmail.com 
  --- 
docs/man/xl.pod.1 |  38 +++ 
tools/libxl/xl.h  |   5 + 
tools/libxl/xl_cmdimpl.c  | 251  
 ++ 
tools/libxl/xl_cmdtable.c |  25 + 
4 files changed, 319 insertions(+) 
  
  
 ... 
  
  diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c 
  index c858068..b29d0fc 100644 
  --- a/tools/libxl/xl_cmdimpl.c 
  +++ b/tools/libxl/xl_cmdimpl.c 
  @@ -3219,6 +3219,257 @@ int main_cd_insert(int argc, char **argv) 
return 0; 
} 
  
  +static void usbinfo_print(libxl_device_usb *usbs, int num) { 
  +int i; 
  
 Blank line missing. 
  
  +if (usbs == NULL) 
  + return; 
  +for (i = 0; i  num; i++) { 
  +libxl_usbinfo usbinfo; 
  
 Blank line missing. 
  
  +libxl_usbinfo_init(usbinfo); 
  + 
  +if (usbs[i].port) 
  +printf(  Port %d:, usbs[i].port); 
  +if (!libxl_device_usb_getinfo(ctx, usbs[i], usbinfo)) { 
  +printf( Bus %03x Device %03x: ID %04x:%04x %s %s\n, 
  +usbinfo.busnum, usbinfo.devnum, 
  +usbinfo.idVendor, usbinfo.idProduct, 
  +usbinfo.manuf ?: , usbinfo.prod ?: ); 
  
 Is it really possible for a device to be assigned but without a port 
 number? 

For assigned usb device, it's not possible. But this function will be called
in two places, one is to list assigned usb devices in 'xl usb-list'; another
is to list assignable usb devices in 'xl usb-assignable-list'. If
'usb-assignable-list' is not taken, this could be improved.

  
 I'd rather combine the two if's and printf statements. This would avoid 
 the case where   Port 1: Port 2: ... is printed due to a failing 
 libxl_device_usb_getinfo() for port 1. 
  
  +} 
  +libxl_usbinfo_dispose(usbinfo); 
  +} 
  +} 
  + 
  +int main_usbctrl_attach(int argc, char **argv) 
  +{ 
  +uint32_t domid; 
  +int opt; 
  +char *oparg; 
  +libxl_device_usbctrl usbctrl; 
  + 
  +SWITCH_FOREACH_OPT(opt, , NULL, usb-ctrl-attach, 1) { 
  +/* No options */ 
  +} 
  + 
  +domid = find_domain(argv[optind++]); 
  + 
  +libxl_device_usbctrl_init(usbctrl); 
  + 
  +while (argc  optind) { 
  +if (MATCH_OPTION(type, argv[optind], oparg)) { 
  +if (!strcmp(oparg, pv)) { 
  +   usbctrl.protocol = LIBXL_USB_PROTOCOL_PV; 
  +} else { 
  +   fprintf(stderr, unsupported type `%s'\n, oparg); 
  +   exit(-1); 
  +} 
  +} else if (MATCH_OPTION(version, argv[optind], oparg)) { 
  +usbctrl.version = atoi(oparg); 
  
 Shouldn't you check for valid versions? 

OK. Will check.

  
  +} else if (MATCH_OPTION(ports, argv[optind], oparg)) { 
  +usbctrl.ports = atoi(oparg); 
  
 Same here for number of ports. Otherwise you could blow up xenstore by 
 e.g. specifying 2 billion ports here. 

Will add check. What's the supported max ports? 

  
  +} else { 
  +fprintf(stderr, unrecognized argument `%s'\n, argv[optind]); 
  +exit(-1); 
  +} 
  +optind++; 
  +} 
  + 
  +if (usbctrl.protocol == LIBXL_USB_PROTOCOL_AUTO) 
  +usbctrl.protocol = LIBXL_USB_PROTOCOL_PV; 
  
 Is this really necessary? You do it in libxl, too. 
Yes, seems not necessary. Anyway (from config file or hotplug), it will
call libxl_device_usbctrl_setdefault and do that.

Thanks,
Chunyan
  
  
 Juergen 
  
 ___ 
 Xen-devel mailing list 
 Xen-devel@lists.xen.org 
 http://lists.xen.org/xen-devel 
  
  


___
Xen-devel mailing list

[Xen-devel] [PATCH] pvusb: don't rely on linux kernel macros for the interface

2015-06-12 Thread Juergen Gross
The interface description of pvUSB lacks some access macros as using
linux kernel macros is assumed to work well. This solution is rather
unfriendly for pvusb implementations being outside the linux kernel.
Additionally things will break quite unpleasent in case the linux
kernel implementation is changed.

To avoid these problems define own macros for accessing bitfields of
the interface and for values of several structure members.

While working on the file add some more comments, especially for the
xenstore interface.

Signed-off-by: Juergen Gross jgr...@suse.com
---
 xen/include/public/io/usbif.h | 140 --
 1 file changed, 121 insertions(+), 19 deletions(-)

diff --git a/xen/include/public/io/usbif.h b/xen/include/public/io/usbif.h
index 0af2a38..9ef0cdc 100644
--- a/xen/include/public/io/usbif.h
+++ b/xen/include/public/io/usbif.h
@@ -31,6 +31,76 @@
 #include ring.h
 #include ../grant_table.h
 
+/*
+ * Feature and Parameter Negotiation
+ * =
+ * The two halves of a Xen pvUSB driver utilize nodes within the XenStore to
+ * communicate capabilities and to negotiate operating parameters. This
+ * section enumerates these nodes which reside in the respective front and
+ * backend portions of the XenStore, following the XenBus convention.
+ *
+ * Any specified default value is in effect if the corresponding XenBus node
+ * is not present in the XenStore.
+ *
+ * XenStore nodes in sections marked PRIVATE are solely for use by the
+ * driver side whose XenBus tree contains them.
+ *
+ *
+ *Backend XenBus Nodes
+ *
+ *
+ *-- Backend Device Identification (PRIVATE) --
+ *
+ * num-ports
+ *  Values: unsigned [1...31]
+ *
+ *  Number of ports for this (virtual) USB host connector.
+ *
+ * usb-ver
+ *  Values: unsigned [1...2]
+ *
+ *  USB version of this host connector: 1 = USB 1.1, 2 = USB 2.0.
+ *
+ * port/[1...31]
+ *  Values: string
+ *
+ *  Physical USB device connected to the given port, e.g. 3-1.5.
+ *
+ *
+ *Frontend XenBus Nodes
+ *
+ *
+ *--- Request Transport Parameters ---
+ *
+ * event-channel
+ *  Values: unsigned
+ *
+ *  The identifier of the Xen event channel used to signal activity
+ *  in the ring buffer.
+ *
+ * urb-ring-ref
+ *  Values: unsigned
+ *
+ *  The Xen grant reference granting permission for the backend to map
+ *  the sole page in a single page sized ring buffer. This is the ring
+ *  buffer for urb requests.
+ *
+ * conn-ring-ref
+ *  Values: unsigned
+ *
+ *  The Xen grant reference granting permission for the backend to map
+ *  the sole page in a single page sized ring buffer. This is the ring
+ *  buffer for connection/disconnection requests.
+ *
+ * protocol
+ *  Values: string (XEN_IO_PROTO_ABI_*)
+ *  Default Value:  XEN_IO_PROTO_ABI_NATIVE
+ *
+ *  The machine ABI rules governing the format of all ring request and
+ *  response structures.
+ *
+ */
+
 enum usb_spec_version {
USB_VER_UNKNOWN = 0,
USB_VER_USB11,
@@ -41,37 +111,65 @@ enum usb_spec_version {
 /*
  *  USB pipe in usbif_request
  *
- *  bits 0-5 are specific bits for virtual USB driver.
- *  bits 7-31 are standard urb pipe.
- *
- *  - port number(NEW):bits 0-4
- * (USB_MAXCHILDREN is 31)
+ *  - port number: bits 0-4
+ * (USB_MAXCHILDREN is 31)
  *
- *  - operation flag(NEW): bit 5
- * (0 = submit urb,
- *  1 = unlink urb)
+ *  - operation flag:  bit 5
+ * (0 = submit urb,
+ *  1 = unlink urb)
  *
  *  - direction:   bit 7
- * (0 = Host-to-Device [Out]
- *   1 = Device-to-Host [In])
+ * (0 = Host-to-Device [Out]
+ *  1 = Device-to-Host [In])
  *
  *  - device address:  bits 8-14
  *
  *  - endpoint:bits 15-18
  *
- *  - pipe type:   bits 30-31
- * (00 = isochronous, 01 = interrupt,
- *   10 = control, 11 = bulk)
+ *  - pipe type:   bits 30-31
+ * (00 = isochronous, 01 = interrupt,
+ *  10 = control, 11 = bulk)
  */
-#define usbif_pipeportnum(pipe) ((pipe)  0x1f)
-#define usbif_setportnum_pipe(pipe, portnum) \
-   

Re: [Xen-devel] [OSSTEST Nested PATCH v11 6/7] Compose the main recipe of nested test job

2015-06-12 Thread Ian Campbell
On Fri, 2015-06-12 at 11:42 +0800, Robert Hu wrote:
 Hi Ian J., because nested Xen isn't that matured at present; it is
 currently 'tech preview' phase. We now just add some sanity test case to
 defend current work fruits. We can add more complicated test cases later
 as nested Xen development moves forward.

I don't think it needs doing as part of this series, but I do think it
would be worth adding the standard suite of tests steps to L2 guests
soon after.

The way osstest deals with test failures i.e. classifying them into has
always failed vs. regression means that it is fine to add tests for
features which aren't mature yet, since they will fall into the has
always failed set and not block pushes. In fact it is in some sense
good to do this since it gives a concrete set of (necessary but not
necessarily sufficient) things which need to be fixed for the feature to
reach maturity.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 08:31, tiejun.c...@intel.com wrote:
 On 2015/6/11 17:28, Tian, Kevin wrote:
 From: Chen, Tiejun
 Sent: Thursday, June 11, 2015 9:15 AM
 @@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, struct 
 pci_dev
 *pdev)
PCI_DEVFN2(bdf) != devfn )
   continue;

 -rmrr_identity_mapping(pdev-domain, 0, rmrr);
 +rmrr_identity_mapping(pdev-domain, 0, rmrr,
 +  XEN_DOMCTL_DEV_RDM_RELAXED);

 ditto
 
 It doesn't matter when we're trying to remove a device since we don't 
 care this flag.

In such a case it helps to add a brief comment saying that the precise
value passed is irrelevant. Or maybe this could be expressed by
folding this and the map parameters of the function (in which case it
might become self-documenting)?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] libxl backports for 4.4 and 4.5

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 11:32, lars.kurth@gmail.com wrote:
 Ian, Jan,
 
 you seem to be disagreeing, or more likely Ian missed Jan's reply, so I 
 wanted to double check.
 
 Right, but unless really urgent/important I think this is too late for
 4.5.1 now and needs to be put on the list of backports for 4.5.2.
 
 Jan, just to clarify, are you saying back porting for 4.4.3 is OK, but 4.5.1 
 is too tight?  Looking at the 4 months cadence, 4.4.3 would be sometimes in 
 July: so I can live with this as a compromise.
 
 Even though having a released version of Xen which works out of the box with 
 OpenStack is pretty important.

The release being stuck anyway, perhaps we can put them in with
the understanding that this time round we'll need at least another
RC. Ian - you know how risky the changes are, please use your
judgment.

(Btw., as to 4.4.3, I don't think I want to kick off that process until
4.5.1 gets out of its stuck state, i.e. as all stable trees are affected
by the underlying problem, I don't want both trees to be hold up half
way through the stable release process.)

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4 5/7] xl: add pvusb commands

2015-06-12 Thread Juergen Gross

On 06/10/2015 05:20 AM, Chunyan Liu wrote:

Add pvusb commands: usb-ctrl-attach, usb-ctrl-detach, usb-list,
usb-attach and usb-detach.

To attach a usb device to guest through pvusb, one could follow
following example:

  #xl usb-ctrl-attach test_vm version=1 num_ports=8

  #xl usb-list test_vm
  will show the usb controllers and port usage under the domain.

  #xl usb-assignable-list
  will list assignable USB devices


xl usb-assignable-list is not part of this patch. Either merge this
patch and the following one, or describe the command in the next patch.



  #xl usb-attach test_vm 1.6
  will find the first usable controller:port, and attach usb
  device whose bus address is 1.6 (busnum is 1, devnum is 6)
  to it. One could also specify which controller and which port.

  #xl usb-detach test_vm 1.6

  #xl usb-ctrl-detach test_vm dev_id
  will destroy the controller with specified dev_id. Dev_id
  can be traced in usb-list info.

Signed-off-by: Chunyan Liu cy...@suse.com
Signed-off-by: Simon Cao caobosi...@gmail.com
---
  docs/man/xl.pod.1 |  38 +++
  tools/libxl/xl.h  |   5 +
  tools/libxl/xl_cmdimpl.c  | 251 ++
  tools/libxl/xl_cmdtable.c |  25 +
  4 files changed, 319 insertions(+)



...


diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c858068..b29d0fc 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3219,6 +3219,257 @@ int main_cd_insert(int argc, char **argv)
  return 0;
  }

+static void usbinfo_print(libxl_device_usb *usbs, int num) {
+int i;


Blank line missing.


+if (usbs == NULL)
+ return;
+for (i = 0; i  num; i++) {
+libxl_usbinfo usbinfo;


Blank line missing.


+libxl_usbinfo_init(usbinfo);
+
+if (usbs[i].port)
+printf(  Port %d:, usbs[i].port);
+if (!libxl_device_usb_getinfo(ctx, usbs[i], usbinfo)) {
+printf( Bus %03x Device %03x: ID %04x:%04x %s %s\n,
+usbinfo.busnum, usbinfo.devnum,
+usbinfo.idVendor, usbinfo.idProduct,
+usbinfo.manuf ?: , usbinfo.prod ?: );


Is it really possible for a device to be assigned but without a port
number?

I'd rather combine the two if's and printf statements. This would avoid
the case where   Port 1: Port 2: ... is printed due to a failing
libxl_device_usb_getinfo() for port 1.


+}
+libxl_usbinfo_dispose(usbinfo);
+}
+}
+
+int main_usbctrl_attach(int argc, char **argv)
+{
+uint32_t domid;
+int opt;
+char *oparg;
+libxl_device_usbctrl usbctrl;
+
+SWITCH_FOREACH_OPT(opt, , NULL, usb-ctrl-attach, 1) {
+/* No options */
+}
+
+domid = find_domain(argv[optind++]);
+
+libxl_device_usbctrl_init(usbctrl);
+
+while (argc  optind) {
+if (MATCH_OPTION(type, argv[optind], oparg)) {
+if (!strcmp(oparg, pv)) {
+   usbctrl.protocol = LIBXL_USB_PROTOCOL_PV;
+} else {
+   fprintf(stderr, unsupported type `%s'\n, oparg);
+   exit(-1);
+}
+} else if (MATCH_OPTION(version, argv[optind], oparg)) {
+usbctrl.version = atoi(oparg);


Shouldn't you check for valid versions?


+} else if (MATCH_OPTION(ports, argv[optind], oparg)) {
+usbctrl.ports = atoi(oparg);


Same here for number of ports. Otherwise you could blow up xenstore by
e.g. specifying 2 billion ports here.


+} else {
+fprintf(stderr, unrecognized argument `%s'\n, argv[optind]);
+exit(-1);
+}
+optind++;
+}
+
+if (usbctrl.protocol == LIBXL_USB_PROTOCOL_AUTO)
+usbctrl.protocol = LIBXL_USB_PROTOCOL_PV;


Is this really necessary? You do it in libxl, too.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 10/11] x86/intel_pstate: support the use of intel_pstate in pmstat.c

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 05:03, wei.w.w...@intel.com wrote:
 On 11/06/2015 22:02, Julien Grall wrote:
 On 11/06/2015 04:31, Wei Wang wrote:
  +else
  +{
  +list_for_each(pos, cpufreq_governor_list)
gov_num++;
 
 The indentation looks wrong to me.
 
 It has four +$, should be correct.

The gov_num++ is supposedly the body of the list_for_each(), and
hence needs to be indented by four more spaces than the loop header
(i.e. a total of 12).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [OSSTEST Nested PATCH v11 6/7] Compose the main recipe of nested test job

2015-06-12 Thread Ian Campbell
On Fri, 2015-06-12 at 17:00 +0800, Robert Hu wrote:
 On Fri, 2015-06-12 at 09:44 +0100, Ian Campbell wrote:
  On Fri, 2015-06-12 at 11:42 +0800, Robert Hu wrote:
   Hi Ian J., because nested Xen isn't that matured at present; it is
   currently 'tech preview' phase. We now just add some sanity test case to
   defend current work fruits. We can add more complicated test cases later
   as nested Xen development moves forward.
  
  I don't think it needs doing as part of this series, but I do think it
  would be worth adding the standard suite of tests steps to L2 guests
  soon after.
 Agree. We can separately add those part in a later patch series.  
  
  The way osstest deals with test failures i.e. classifying them into has
  always failed vs. regression means that it is fine to add tests for
  features which aren't mature yet, since they will fall into the has
  always failed set and not block pushes. In fact it is in some sense
  good to do this since it gives a concrete set of (necessary but not
  necessarily sufficient) things which need to be fixed for the feature to
  reach maturity.
 We would like this test job's failure to be marked 'regression', which
 shall block related breaking code's commitment.

This is all done automatically, by comparison with previous flights.

In essence if it passed last time then it will be considered a
regression if it fails and for as long as it keeps failing (unless it is
manually overridden via a forced push).

So there is nothing to be marked.

 We can consider to contribute more code later, including proved/success
 test cases (which will fall into regression as this case), and fail test
 cases (which shall soon be fixed/developed).

You should just add the cases, await the first results and investigate
if it is not a pass as expected.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time

2015-06-12 Thread Paul Durrant
 -Original Message-
 From: Wen Congyang [mailto:we...@cn.fujitsu.com]
 Sent: 12 June 2015 04:22
 To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org
 Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
 yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson
 Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
 page only one time
 
 On 06/11/2015 09:25 PM, Paul Durrant wrote:
  -Original Message-
  From: Yang Hongyang [mailto:yan...@cn.fujitsu.com]
  Sent: 11 June 2015 13:59
  To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-
 de...@lists.xen.org
  Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
  yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson
  Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
 ioreq
  page only one time
 
 
 
  On 06/11/2015 06:20 PM, Paul Durrant wrote:
  -Original Message-
  From: Wen Congyang [mailto:we...@cn.fujitsu.com]
  Sent: 11 June 2015 09:48
  To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
  de...@lists.xen.org
  Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
  [...]
 
 
  In our implementation, we don't start a new emulator. The codes can
  work,
  but some bugs may be not triggered.
 
 
  How do you reconcile the incoming QEMU save record with the running
  emulator state?
 
  We introduce a qmp command xen-load-devices-
  state(libxl__qmp_restore) which
  can restore the emulator state. The step of resotre emulator state at a
  checkpoint is:
 
  1. libxl__qmp_stop- vm_stop() in qemu
  2. libxl__qmp_restore - load_vmstate() in qemu
  3. libxl__qmp_resume  - vm_start() in qemu
 
 
  Ok, that sounds like the ideal time to hook back into Xen by creating a new
 ioreq server.
 
 I have some questions about ioreq server:
 1. If we use old version xen and newest version qemu, is it OK? Is default
ioreq server created when the guest is created. xen_create_ioreq_server()
 does
nothing, and xen_get_ioreq_server_info() will get the default ioreq server
 information.
Is it right?

No. It's not compatible in that direction. A new Xen will work with an old QEMU 
but not the other way round.

 2. Why we create a default ioreq server when getting the hvm param if there
 is already a
not default ioreq server?

If something reads the 'legacy' HVM params then that is Xen's trigger to create 
the default server. Any 'new' emulator should be using the ioreq server 
hypercalls so the default server will not be needed.

 3. In the far end, we will clear the ioreq page, and this ioreq page is used 
 for
 default
ioreq server, is it right?

Yes, AFAIK it's only the 'magic' pages that get cleared at the far end - and 
that includes the default server pages. Other ioreq servers will have their 
pages cleared on re-insertion to the P2M at the source end when the server is 
disabled.

  Paul

 
 Thanks
 Wen Congyang
 
 
Paul
 
 
 Paul
 
  Thanks
  Wen Congyang
 
 
 Paul
 
 
  Thanks
  Wen Congyang
 
 
 Paul
 
  We will set to the guest to a new state, the old state should be
  dropped.
 
  Thanks
  Wen Congyang
 
 
 Paul
 
 
  Thanks
  Wen Congyang
 
 
 Paul
 
 
  Thanks
  Wen Congyang
 
 
  ~Andrew
  .
 
 
  .
 
 
  .
 
 
 
  ___
  Xen-devel mailing list
  Xen-devel@lists.xen.org
  http://lists.xen.org/xen-devel
 
  ___
  Xen-devel mailing list
  Xen-devel@lists.xen.org
  http://lists.xen.org/xen-devel
  .
 
 
  .
 
 
 
  ___
  Xen-devel mailing list
  Xen-devel@lists.xen.org
  http://lists.xen.org/xen-devel
  .
 
 
  --
  Thanks,
  Yang.
  .
 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4 3/7] libxl: add pvusb API

2015-06-12 Thread Chun Yan Liu


 On 6/12/2015 at 12:42 AM, in message
21881.47707.526863.158...@mariner.uk.xensource.com, Ian Jackson
ian.jack...@eu.citrix.com wrote: 
 Chunyan Liu writes ([Xen-devel] [PATCH V4 3/7] libxl: add pvusb API): 
  Add pvusb APIs, including: 
 ... 
  
 Thanks for your contribution.  I'm afraid I haven't had time to 
 completely finish my review this, but here's what I have: 
  
  --- /dev/null 
  +++ b/docs/misc/pvusb.txt 
  @@ -0,0 +1,243 @@ 
  +1. Overview 
  
 It's good that you have provided documentation.  But I think this 
 document is a bit confused about its audience. 
  
 Information about design choices should be removed from this user- and 
 application-facing document, and put in comments in the code, or 
 commit messages, I think. 

Thanks. Will update.

  
  
  +int libxl_device_usbctrl_getinfo(libxl_ctx *ctx, uint32_t domid, 
  +libxl_device_usbctrl *usbctrl, 
  +libxl_usbctrlinfo *usbctrlinfo) 
  +LIBXL_EXTERNAL_CALLERS_ONLY; 
  
 Why is this function marked LIBXL_EXTERNAL_CALLERS_ONLY ? 

Currently libxl itself won't call it. Exposed for toolstack usage. Will
remove that if it's not proper.

  
  diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h 
  index ef7aa1d..89a9f07 100644 
  --- a/tools/libxl/libxl_internal.h 
  +++ b/tools/libxl/libxl_internal.h 
  @@ -2439,6 +2439,18 @@ _hidden void libxl__device_vtpm_add(libxl__egc *egc, 
   
 uint32_t domid, 
 ... 
  +/* AO operation to connect a PVUSB controller. 
  + * Adding a PVUSB controller will add a 'vusb' device entry in xenstore, 
  + * and will wait for device connection. 
  
 In this context I think will wait for device connection is 
 misleading.  What you mean is that the vusb is available for adding 
 devices to, but won't have any yet.  Nothing in libxl is waiting. 

Here I mean libxl_wait_for_device_connection. Since it adds a new device entry
to xenstore, it needs to wait for a moment for frontend/backend connection.

  
  diff --git a/tools/libxl/libxl_pvusb.c b/tools/libxl/libxl_pvusb.c 
  new file mode 100644 
  index 000..a6e1aa1 
  --- /dev/null 
  +++ b/tools/libxl/libxl_pvusb.c 
  @@ -0,0 +1,1249 @@ 
 ... 
  +void libxl__device_usbctrl_add(libxl__egc *egc, uint32_t domid, 
  +   libxl_device_usbctrl *usbctrl, 
  +   libxl__ao_device *aodev) 
  +{ 
  +STATE_AO_GC(aodev-ao); 
 ... 
  +libxl_domain_config_init(d_config); 
  +libxl_device_usbctrl_init(usbctrl_saved); 
  +libxl_device_usbctrl_copy(CTX, usbctrl_saved, usbctrl); 
  
 Wei will need to review the saved/live saved device info handling, and 
 the json, etc. 
  
  +static int 
  +libxl_device_usbctrl_remove_common(libxl_ctx *ctx, uint32_t domid, 
  +   libxl_device_usbctrl *usbctrl, 
  +   const libxl_asyncop_how *ao_how, 
  +   int force) 
  
 As discussed, you mustn't call this within libxl.
I don't quite understand why. I guess it's the same as usb_add problem,
something related to embedded ao?
As in usb_add:
libxl_device_usb_add()
   AO_CREATE(ctx, domid, ao_how)
   libxl__device_usb_add()
 libxl__device_usb_setdefault()
   libxl_device_usbctrl_add_common()
 AO_CREATE(ctx, domid, NULL)
if this is not allowed, what is the correct way?

 If you need to, you 
 need to break it out into an internal function 
 (libxl__initiate_device_usbctrl_remove, maybe) which makes a callback 
 when done.
  
  +libxl_device_usbctrl * 
  +libxl_device_usbctrl_list(libxl_ctx *ctx, uint32_t domid, int *num) 
  +{ 
  +GC_INIT(ctx); 
  +libxl_device_usbctrl *usbctrls = NULL; 
  +char *fe_path = NULL; 
  +char **dir = NULL; 
  +unsigned int ndirs = 0; 
  + 
  +*num = 0; 
  + 
  +fe_path = GCSPRINTF(%s/device/vusb, 
  +libxl__xs_get_dompath(gc, domid)); 
  +dir = libxl__xs_directory(gc, XBT_NULL, fe_path, ndirs); 
  + 
  +if (dir  ndirs) { 
  +usbctrls = malloc(sizeof(*usbctrls) * ndirs); 
  
 Please use libxl__calloc with NOGC. 

Thanks. Missing this one.

  
  +libxl_device_usbctrl* usbctrl; 
  +libxl_device_usbctrl* end = usbctrls + ndirs; 
  +for (usbctrl = usbctrls; usbctrl  end; usbctrl++, dir++, 
  (*num)++)  
 { 
  +char *tmp; 
  +const char *be_path = libxl__xs_read(gc, XBT_NULL, 
  +GCSPRINTF(%s/%s/backend, fe_path,  
 *dir)); 
  + 
  +libxl_device_usbctrl_init(usbctrl); 
  +usbctrl-devid = atoi(*dir); 
  
 This function (and the corresponding writing code) is quite formulaic. 
 Perhaps a macro or something could be used. 
  
 I would make a similar criticism of libxl_device_usbctrl_getinfo. 
  
  +/* check if USB device is already assigned to a domain */ 
  +static bool is_usb_assigned(libxl__gc *gc, 

Re: [Xen-devel] [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges

2015-06-12 Thread Chen, Tiejun

On 2015/6/11 17:51, Tian, Kevin wrote:

From: Chen, Tiejun
Sent: Thursday, June 11, 2015 9:15 AM

When allocating mmio address for PCI bars, we need to make
sure they don't overlap with reserved regions.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  tools/firmware/hvmloader/pci.c | 36
++--
  1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..98af568 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -59,8 +59,8 @@ void pci_setup(void)
  uint32_t bar_reg;
  uint64_t bar_sz;
  } *bars = (struct bars *)scratch_start;
-unsigned int i, nr_bars = 0;
-uint64_t mmio_hole_size = 0;
+unsigned int i, j, nr_bars = 0;
+uint64_t mmio_hole_size = 0, reserved_end, max_bar_sz = 0;

  const char *s;
  /*
@@ -226,6 +226,8 @@ void pci_setup(void)
  bars[i].devfn   = devfn;
  bars[i].bar_reg = bar_reg;
  bars[i].bar_sz  = bar_sz;
+if ( bar_sz  max_bar_sz )
+max_bar_sz = bar_sz;

  if ( ((bar_data  PCI_BASE_ADDRESS_SPACE) ==
PCI_BASE_ADDRESS_SPACE_MEMORY) ||
@@ -301,6 +303,21 @@ void pci_setup(void)
  pci_mem_start = 1;
  }

+/* Relocate PCI memory that overlaps reserved space, like RDM. */
+for ( j = 0; j  memory_map.nr_map ; j++ )
+{
+if ( memory_map.map[j].type != E820_RAM )
+{
+reserved_end = memory_map.map[j].addr + memory_map.map[j].size;
+if ( check_overlap(pci_mem_start, pci_mem_end,
+   memory_map.map[j].addr,
+   memory_map.map[j].size) )
+pci_mem_start -= memory_map.map[j].size  PAGE_SHIFT;


what's the point of subtracting reserved size here? I think you want
to move pci_mem_start higher instead of lower to avoid conflict, right?


No.




+pci_mem_start = (pci_mem_start + max_bar_sz - 1) 
+~(uint64_t)(max_bar_sz - 1);


better have some comment to explain what exactly you're trying to
achieve here.


Actually I didn't have this code fragment. Here I add this chunk of 
codes to address one concern that Jan raised. Please see the below.





+}
+}
+
  if ( mmio_total  (pci_mem_end - pci_mem_start) )
  {
  printf(Low MMIO hole not large enough for all devices,
@@ -407,8 +424,23 @@ void pci_setup(void)
  }

  base = (resource-base  + bar_sz - 1)  ~(uint64_t)(bar_sz - 1);
+ reallocate_mmio:


In earlier comment you said:


+/* Relocate PCI memory that overlaps reserved space, like RDM. */


If pci_mem_start has been relocated to avoid overlapping, how will actual
allocation here will conflict again? Sorry I may miss the two relocations 
here...


  bar_data |= (uint32_t)base;
  bar_data_upper = (uint32_t)(base  32);
+for ( j = 0; j  memory_map.nr_map ; j++ )
+{
+if ( memory_map.map[j].type != E820_RAM )
+{
+reserved_end = memory_map.map[j].addr +
memory_map.map[j].size;
+if ( check_overlap(base, bar_sz,
+   memory_map.map[j].addr,
+   memory_map.map[j].size) )
+{
+base = (reserved_end  + bar_sz - 1)  ~(uint64_t)(bar_sz - 
1);
+goto reallocate_mmio;


That is because our previous implementation is just skipping that 
conflict region,


But you do nothing to make sure the MMIO regions all fit in the
available window (see the code ahead of this relocating RAM if
necessary). and ...it simply skips assigning resources. Your changes 
potentially growing the space needed to fit all MMIO BARs therefore also 
needs to adjust the up front calculation, such that if necessary more 
RAM can be relocated to make the hole large enough.


And then I replied as follows,

You're right.

Just think about we're always trying to check pci_mem_start to populate 
more RAM to obtain enough PCI mempry,


/* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
while ( (pci_mem_start  PAGE_SHIFT)  hvm_info-low_mem_pgend )
{
struct xen_add_to_physmap xatp;
unsigned int nr_pages = min_t(
unsigned int,
hvm_info-low_mem_pgend - (pci_mem_start  PAGE_SHIFT),
(1u  16) - 1);
if ( hvm_info-high_mem_pgend == 0 )
hvm_info-high_mem_pgend = 1ull  (32 - PAGE_SHIFT);
hvm_info-low_mem_pgend -= nr_pages;
printf(Relocating 0x%x pages from PRIllx to PRIllx\
for lowmem MMIO hole\n,
   nr_pages,
   PRIllx_arg(((uint64_t)hvm_info-low_mem_pgend)PAGE_SHIFT),

PRIllx_arg(((uint64_t)hvm_info-high_mem_pgend)PAGE_SHIFT));
xatp.domid = DOMID_SELF;
xatp.space = 

Re: [Xen-devel] [Draft F] Xen on ARM vITS Handling

2015-06-12 Thread Vijay Kilari
On Thu, Jun 11, 2015 at 3:10 PM, Ian Campbell ian.campb...@citrix.com wrote:
 Draft F follows. Also at:
 http://xenbits.xen.org/people/ianc/vits/draftF.{pdf,html}

 Here's a quick update based on feedback prior to meeting on #xenarm at
 12:00AM BST / 7:00AM EDT / 4:30PM IST (which is ~1:20 from now)

 Ian.

 % Xen on ARM vITS Handling
 % Ian Campbell ian.campb...@citrix.com
 % Draft F

 # Changelog

 ## Since Draft E

 * Discussion of `struct pending_irq`
 * Fix handling of enable/disable, requiring switching back to trapping
   the virtual cfg table again. get_vlpi_cfg is no longer needed.
 * Fix p2m_lookup to also use get_page_from_gfn.

 ## Since Draft D

 * Fixed assumptions about vLPI-pLPI mapping, which is not
   possible. This lead to changes to the model for enabling and
   disabling pLPI and vLPI and the handling of the virtual LPI
   configuration table, resolving _Unresolved Issue 1_.
 * Made the pLPI and vLPI interrupt priorities explicit.
 * Attempted to clarify the trust issues regarding in-guest data
   structures.
 * Mandate a particular cacheability for tables in guest memory.

 ## Since Draft C

 * _Major_ rework, in an attempt to simplify everything into something
   more likely to be achievable for 4.6.
 * Made some simplifying assumptions.
 * Reduced the scope of some support.
 * Command emulation is now mostly trivial.
 * Expanded detail on host setup, allowing other assumptions to be
   made during emulation.
 * Many other things lost in the noise of the above.

 ## Since Draft B

 * Details of command translation (thanks to Julien and Vijay)
 * Added background on LPI Translation and Pending tables
 * Added background on Collections
 * Settled on `N:N` scheme for vITS:pat's mapping.
 * Rejigged section nesting a bit.
 * Since we now thing translation should be cheap, settle on
   translation at scheduling time.
 * Lazy `INVALL` and `SYNC`

 ## Since Draft A

 * Added discussion of when/where command translation occurs.
 * Contention on scheduler lock, suggestion to use SOFTIRQ.
 * Handling of domain shutdown.
 * More detailed discussion of multiple vs single vits pros/cons.

 # Introduction

 ARM systems containing a GIC version 3 or later may contain one or
 more ITS logical blocks. An ITS is used to route Message Signalled
 interrupts from devices into an LPI injection on the processor.

 The following summarises the ITS hardware design and serves as a set
 of assumptions for the vITS software design. For full details of the
 ITS see the GIC Architecture Specification.

 ## Locality-specific Peripheral Interrupts (`LPI`)

 This is a new class of message signalled interrupts introduced in
 GICv3. They occupy the interrupt ID space from `8192..(2^32)-1`.

 The number of LPIs support by an ITS is exposed via
 `GITS_TYPER.IDbits` (as number of bits - 1), it may be up to
 2^32. _Note_: This field also contains the number of Event IDs
 supported by the ITS.

 ### LPI Configuration Table

 Each LPI has an associated configuration byte in the LPI Configuration
 Table (managed via the GIC Redistributor and placed at
 `GICR_PROPBASER` or `GICR_VPROPBASER`). This byte configures:

 * The LPI's priority;
 * Whether the LPI is enabled or disabled.

 Software updates the Configuration Table directly but must then issue
 an invalidate command (per-device `INV` ITS command, global `INVALL`
 ITS command or write `GICR_INVLPIR`) for the affect to be guaranteed
 to become visible (possibly requiring an ITS `SYNC` command to ensure
 completion of the `INV` or `INVALL`). Note that it is valid for an
 implementation to reread the configuration table at any time (IOW it
 is _not_ guaranteed that a change to the LPI Configuration Table won't
 be visible until an invalidate is issued).

 ### LPI Pending Table

 Each LPI also has an associated bit in the LPI Pending Table (managed
 by the GIC redistributor). This bit signals whether the LPI is pending
 or not.

 This region may contain out of date information and the mechanism to
 synchronise is `IMPLEMENTATION DEFINED`.

 ## Interrupt Translation Service (`ITS`)

 ### Device Identifiers

 Each device using the ITS is associated with a unique Device
 Identifier.

 The device IDs are properties of the implementation and are typically
 described via system firmware, e.g. the ACPI IORT table or via device
 tree.

 The number of device ids in a system depends on the implementation and
 can be discovered via `GITS_TYPER.Devbits`. This field allows an ITS
 to have up to 2^32 devices.

 ### Events

 Each device can generate Events (called `ID` in the spec) these
 correspond to possible interrupt sources in the device (e.g. MSI
 offset).

 The maximum number of interrupt sources is device specific. It is
 usually discovered either from firmware tables (e.g. DT or ACPI) or
 from bus specific mechanisms (e.g. PCI config space).

 The maximum number of events ids support by an ITS is exposed via
 `GITS_TYPER.IDbits` (as number of bits - 1), it 

Re: [Xen-devel] [PATCH] libxl: libxl_internal.h: Clarify ao rule against internal callers

2015-06-12 Thread Ian Campbell
On Thu, 2015-06-11 at 17:57 +0100, Ian Jackson wrote:
 Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com
 CC: Ian Campbell ian.campb...@citrix.com
 CC: Wei Liu wei.l...@citrix.com
 CC: Juergen Gross jgr...@suse.com

Acked-by: Ian Campbell ian.campb...@citrix.com

AO_INITIATOR_ENTRY was just a stray remainder of some wip name?

 ---
  tools/libxl/libxl_internal.h |5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
 index 465..bfc0729 100644
 --- a/tools/libxl/libxl_internal.h
 +++ b/tools/libxl/libxl_internal.h
 @@ -1911,8 +1911,9 @@ _hidden void libxl__egc_cleanup(libxl__egc *egc);
   * All slow functions (see below for the exact definition) need to
   * use the asynchronous operation (ao) machinery.  The function
   * should take a parameter const libxl_asyncop_how *ao_how and must
 - * start with a call to AO_INITIATOR_ENTRY.  These functions MAY NOT
 - * be called from inside libxl, because they can cause reentrancy
 + * start with a call to AO_CREATE or equivalent.  These functions MAY
 + * NOT be called from inside libxl (regardless of what is passed for
 + * ao_how), because they can cause reentrancy hazards due to
   * callbacks.
   *
   * For the same reason functions taking an ao_how may make themselves



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Modified RTDS scheduler to use an event-driven model instead of polling.

2015-06-12 Thread Dario Faggioli
On Wed, 2015-06-10 at 22:50 -0700, Meng Xu wrote:
 Hi Dario,
 
 First I think I got most of the points you raised/explained! They are
 very very clear and thanks for the detailed explanation of your idea!
 
Glad you got it and (sort of :-) ) agree.

  2015-06-09 5:53 GMT-07:00 Dario Faggioli dario.faggi...@citrix.com:
 
  2) Using cyclictest as Dario mentioned before to test the real-time
  performance at end user. Dagaen, I can provide you the commands to run
  it, which is actually quite simple to run.

  So, yes, seeing some results would be great, independently from the
  specific work done in this patch.
 
 Right! I have some ideas about this test but won't want to mess up the
 focus of this thread. :-)
 I will raise this test again when we come to it.
 
Ok. Looking forward to see this happening.

  Yeah, and what I can't figure out is why you decided to do so. The
  reason I don't like it is that things become a lot (more) complex to
  understand, maintain and modify.
 
 Now I get why you think it is harder to maintain. Right. reusing the
 timer will just make the rt_schedule complex and make the hot path
 longer.

Exactly.

runq_tickle(snext)...WAIT, WHAT?!?! :-O
 
  I mean, and I'm noticing this now, if the replenishments done during a
  particular call to rt_schedule() are not enough to change the situation
  on that particular pcpu, and hence the task which was running (and that
  you are deliberately disturbing with _a_full_execution_ of the
  scheduler's core function!) should continue to do so, you're calling
  runq_tickle() right on it. So, AFAICT, you're tickling scurr, not a
  newly replenished vcpu!
 
  Am I missing or misreading something? Let's assume not, and see what
  happens in this case...
 
 You are right! Although this issue (i.e., tickling on scurr instead of
 the next high priority VCPU) can be fixed (dirtily), it can be
 avoided with the design option a) you said.
 
Of course it can be fixed.. Pretty much everything can! Point is the
reason why it happened, and how to make these things not happen and/or
easier to figure out. That's (one of) the point(s) of keeping things
simple and self contained, even within a single component (like a
scheduler), instead of let's do everything in this 10k lines
function! :-P

Glad that you saw what I meat.

  Jokes apart, if the above analysis is accurate, I think this is a good
  enough example of what I meant when saying to Dagaen this is making
  things too complex.
 Yes. The flow chart you drew is very clear! Thanks!
 
 @Dagaen, what do you think? Please comment on Dario's reply with your
 opinion and raise any of your concerns.
 
Indeed.

  Here's how I envision things to go. Again, I'm talking about sticking
  with option a), so no per-vcpu timers, just 1 timer and a global queue,
  which now is a replenishment queue:
 
timer interrupt
  TIMER_SOFTIRQ raised
process softirqs
  replenishment_timer_handler()
[spin_lock]
  for_each_replenishment_event(repl_time  NOW()) {
 replenish(vcpu)
 runq_tickle(vcpu)
   }
[spin_lock]
 
  Then, on the tickled pcpus (if any):
 
process softirqs
  SCHEDULE_SOFTIRQ
rt_schedule()
  [spin_lock]
  snext = runq_pick(): snext == vcpu X
  [spin_unlock]
 
  And get rid of __repl_update(), which makes my eyes bleed every time I
  open sched_rt.c. :-)
 
  Oh, and note that we probably can use a new spin lock for the
  replenishment queue, different from the existing global scheduler
  spinlock, lowering contention and increasing scalabiliy even further!
 
 Here is the main question I have about your advice.
 If we are introducing a new spin lock for the replenishment queue,
 what is the relation between the new lock and the old lock of the
 runq?

That's hard to tell in advance, you'll know it completely only when
trying to implement it. But, yes, when more locks are involved, it's
impotant to figure out the relationships between each other, or we risk
introducing deadlock or, even if things are correct, fail to improve the
performance (or do even worse!!).

The idea is, since the two operations (scheduling/budget enforcement and
replenishments) are logically independent, and if they're implemented in
a way that stress this independence, then it make sens to try to use
different spin locks.

As soon as you have more than one spin lock, what you should pay the
most attention to is, if they need to 'nest' (one is acquired when the
other is already being held), that has to happen consistently, or
deadlock will occur! :-/

 Because the deadline decides the priority of VCPUs and thus decides
 the ordering of VCPUs in the runq, the replenish(vcpu) will operate
 on the runq as well. As shown in the workflow in your reply:
  replenish(vcpu)
  runq_tickle(vcpu)
 The runq_tickle(vcpu) will pick the desired CPU. On that CPU,
 

Re: [Xen-devel] [RFC v2 07/15] vt-d: Add API to update IRTE when VT-d PI is used

2015-06-12 Thread Wu, Feng


 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Tuesday, June 09, 2015 10:33 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
 Subject: Re: [RFC v2 07/15] vt-d: Add API to update IRTE when VT-d PI is used
 
  On 08.05.15 at 11:07, feng...@intel.com wrote:
  +bool_t pi_update_irte(struct vcpu *v, struct pirq *pirq, uint8_t gvec)
 
 Without seeing the caller right away it's hard to judge, but generally I'd
 prefer functions to return -E... values as error indicators, i.e.
 
  +{
  +struct irq_desc *desc;
  +struct msi_desc *msi_desc;
  +int remap_index;
  +bool_t rc = 0;
  +struct pci_dev *pci_dev;
  +struct acpi_drhd_unit *drhd;
  +struct iommu *iommu;
  +struct ir_ctrl *ir_ctrl;
  +struct iremap_entry *iremap_entries = NULL, *p = NULL;
  +struct iremap_entry new_ire;
  +struct pi_desc *pi_desc = v-arch.hvm_vmx.pi_desc;
  +unsigned long flags;
  +
  +desc = pirq_spin_lock_irq_desc(pirq, NULL);
  +if ( !desc )
  +return 0;
 
 -ENOMEM
 
  +msi_desc = desc-msi_desc;
  +if ( !msi_desc )
  +goto unlock_out;
 
 -EBADSLT
 
  +pci_dev = msi_desc-dev;
  +if ( !pci_dev )
  +goto unlock_out;
 
 -ENODEV
 
  +remap_index = msi_desc-remap_index;
  +drhd = acpi_find_matched_drhd_unit(pci_dev);
  +if ( !drhd )
  +{
  +dprintk(XENLOG_INFO VTDPREFIX,
  +%pv: failed to get drhd, pci device: 
  +%04x:%02x:%02x.%u, guest vector: %u\n,
  +v, pci_dev-seg, pci_dev-bus,
 PCI_SLOT(pci_dev-devfn),
  +PCI_FUNC(pci_dev-devfn), gvec);
  +goto unlock_out;
  +}
  +
  +iommu = drhd-iommu;
  +ir_ctrl = iommu_ir_ctrl(iommu);
  +if ( !ir_ctrl )
  +{
  +dprintk(XENLOG_INFO VTDPREFIX,
  +%pv: failed to get ir_ctrl, pci device: 
  +%04x:%02x:%02x.%u, guest vector: %u\n,
  +v, pci_dev-seg, pci_dev-bus,
 PCI_SLOT(pci_dev-devfn),
  +PCI_FUNC(pci_dev-devfn), gvec);
  +goto unlock_out;
  +}
 
 Do you think these log messages are useful beyond your bringup
 purposes?

So what kind of message do you think I need to show here?

Thanks,
Feng

 
  +spin_lock_irqsave(ir_ctrl-iremap_lock, flags);
  +
  +GET_IREMAP_ENTRY(ir_ctrl-iremap_maddr, remap_index,
 iremap_entries, p);
  +
  +memcpy(new_ire, p, sizeof(new_ire));
 
 Please use structure assignment (being type safe) in preference to
 memcpy() (not being type safe).
 
  --- a/xen/drivers/passthrough/vtd/iommu.h
  +++ b/xen/drivers/passthrough/vtd/iommu.h
  @@ -327,6 +327,10 @@ struct iremap_entry {
 };
   };
 
  +#define PDA_LOW_BIT26
  +#define PDA_HIGH_BIT   32
  +#define PDA_MASK(XX)   (~(-1UL  PDA_##XX##_BIT))
 
 To me it would look more natural if you used ~0UL.
 
 Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 14/15] Suppress posting interrupts when 'SN' is set

2015-06-12 Thread Wu, Feng


 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, June 10, 2015 2:49 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
 Subject: Re: [RFC v2 14/15] Suppress posting interrupts when 'SN' is set
 
  On 08.05.15 at 11:07, feng...@intel.com wrote:
  --- a/xen/arch/x86/hvm/vmx/vmx.c
  +++ b/xen/arch/x86/hvm/vmx/vmx.c
  @@ -1664,9 +1664,20 @@ static void
 __vmx_deliver_posted_interrupt(struct vcpu *v)
 
   static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
   {
  +int r, sn;
  +
   if ( pi_test_and_set_pir(vector, v-arch.hvm_vmx.pi_desc) )
   return;
 
  +/*
  + * Currently, we don't support urgent interrupt, all interrupts
  + * are recognized as non-urgent interrupt, so we cannot send
  + * posted-interrupt when 'SN' is set.
  + */
  +
  +sn = v-arch.hvm_vmx.pi_desc.sn;
  +r = pi_test_and_set_on(v-arch.hvm_vmx.pi_desc);
 
 I'm probably misunderstanding something here, but to me this looks
 like a change that would need to be done quite a bit earlier in the
 series (i.e. at this point it looks like it's fixing a bug/oversight of an
 earlier patch).

From hardware p.o.v, if 'SN' is set, the hardware doesn't send notification 
event.
vmx_deliver_posted_intr() is the software way to delivery posted-interrupts, so
we need to follow the HW's behavior. Hence we check 'SN' first, and not send
notification event if it is set.

 
 Apart from that I'm also not understanding the synchronization
 aspect here: What if SN gets set after having been latched above,
 but before the latched value gets looked at below?

Yes, that is a question. Here is the scenario your described above, right?

..

sn = v-arch.hvm_vmx.pi_desc.sn; /*sn gets 0 here*/

v-arch.hvm_vmx.pi_desc.sn gets set by others

else if ( !r  !sn ) /*Oops, sn cannot reflect the real 
v-arch.hvm_vmx.pi_desc.sn here*/

..

Maybe I need think about how to handle this.

Thanks,
Feng

 
 Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] VT-d async invalidation for Device-TLB.

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 04:40, quan...@intel.com wrote:
   On 10.06.15 at 16:05, jbeul...@suse.com wrote:
  On 03.06.15 at 09:49, quan...@intel.com wrote:
  For Context Invalidation and IOTLB invalidation without Device-TLB
  invalidation, Invalidation Queue flushes synchronous invalidation as
  before(This is a tradeoff and the cost of interrupt is overhead).
 
 DMAR_OPERATION_TIMEOUT being 1s, are you saying that you're not intending
 to replace the current spinning for the non-ATS case?
 
 Yes, we are not intending to replace the current spinning for the non-ATS 
 case.

I'm not really happy about that.

 Considering that expiring these loops results in panic()s, I would expect 
 these to
 become asynchronous _and_ contained to the affected VM alongside the ATS
 induced changed behavior. You talking of overhead - can you quantify that?

 
 I tested it by a Myri-10G Dual-Protocol NIC, which is an ATS device. 
 for an invalidation:
  By sync way, it takes about 1.4 ms.
  By async way, it takes about 4.3 ms.

What's the theory on why this is? After all, it shouldn't matter how
the completion of the invalidation gets signaled.

Apart from that measuring the ATS case (in which case we're set to
use async mode anyway) is kind of pointless here - we'd need to
know the overhead of non-ATS async compared to non-ATS sync.

  More details:
 
  1. invalidation table. We define iommu _invl structure in domain.
  Struct iommu _invl {
  volatile u64 iommu _invl _poll_slot :62;
  domid_t dom_id;
  u64 iommu _invl _status_data :32;
  }__attribute__ ((aligned (64)));
 
 iommu _invl _poll_slot: Set it equal to the status address of wait
  descriptor when the invalidation queue is with Device-TLB.
 dom_id: Keep the id of the domain.
 iommu _invl _status_data: Keep the count of in-flight queue with
  Device-TLB invalidation.
 
 Without further explanation above/below I don't think I really understand 
 the
 purpose of this structure, nor its organization: Is this something imposed 
 by the
 VT-d specification? If so, a reference to the respective section in the spec 
 would
 be useful. If not, I can't see why the structure is laid out the (odd) way 
 it is.
 
 
 Refer to the explanation above. If it is still not clear, I will continue to 
 explain in next email.

The explanation above helped for what I asked above, but didn't
make clear to me what the structure here is, how it relates to hw
defined structures, and hence (as said) why it is laid out the way it
is.

  4. New interrupt handler for invalidation completion:
  - when hardware completes the invalidations with Device IOTLB, it
  generates an interrupt to notify hypervisor.
  - In interrupt handler, we will schedule a soft-irq to handle the
  finished invalidations.
  - soft-irq to handle finished invalidation:
  Scan the pending flush list
 for each entry in list
  check the values of iommu _invl _poll_slot and iommu _invl
  _status_data in each domain's invalidation table.
  if yes, clear iommu_pending_flush and invalidation table,
  then wakeup the domain.
 
 Did you put some consideration into how long this list may get, and hence 
 how
 long it may take you to iterate through the entire list?
 
 
 Only the domain which has the ATS device assigned will be tracked in this 
 list. So the list length shouldn't be very long.

Okay, if this is a list of domains (or of devices), that would hopefully
be acceptable (albeit on a huge system this could still be dozens). If
this was a list of pending flush requests, it might be worse.

 Besides, the DEVICE-IOTLB 
 invalidation doesn't happened frequently so the cost should be acceptable.

That's not a valid consideration: At no time must any processing
inside the hypervisor take arbitrarily long. This requirement is
entirely independent of how frequently such cases may occur.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3][PATCH 00/16] Fix RMRR

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 04:10, tiejun.c...@intel.com wrote:
 On 2015/6/11 20:52, Tim Deegan wrote:
 which would be better handeld explicitly:

 if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
 ...

 
 So if I'm correct, we should do this check explicitly,
 
 if ( p2mt == p2m_invalid ||
  (p2mt == p2m_mmio_dm  !mfn_valid(mfn) )
 
 Note this is equivalent to Jan's comment.

I think the !mfn_valid() part is really redundant - p2m_mmio_dm
should never be put on a page translating to a valid MFN.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [OSSTEST Nested PATCH v11 6/7] Compose the main recipe of nested test job

2015-06-12 Thread Robert Hu
On Fri, 2015-06-12 at 09:44 +0100, Ian Campbell wrote:
 On Fri, 2015-06-12 at 11:42 +0800, Robert Hu wrote:
  Hi Ian J., because nested Xen isn't that matured at present; it is
  currently 'tech preview' phase. We now just add some sanity test case to
  defend current work fruits. We can add more complicated test cases later
  as nested Xen development moves forward.
 
 I don't think it needs doing as part of this series, but I do think it
 would be worth adding the standard suite of tests steps to L2 guests
 soon after.
Agree. We can separately add those part in a later patch series.  
 
 The way osstest deals with test failures i.e. classifying them into has
 always failed vs. regression means that it is fine to add tests for
 features which aren't mature yet, since they will fall into the has
 always failed set and not block pushes. In fact it is in some sense
 good to do this since it gives a concrete set of (necessary but not
 necessarily sufficient) things which need to be fixed for the feature to
 reach maturity.
We would like this test job's failure to be marked 'regression', which
shall block related breaking code's commitment.
We can consider to contribute more code later, including proved/success
test cases (which will fall into regression as this case), and fail test
cases (which shall soon be fixed/developed).
 
 Ian.
 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-06-12 Thread Chen, Tiejun

On 2015/6/12 16:45, Jan Beulich wrote:

On 12.06.15 at 08:31, tiejun.c...@intel.com wrote:

On 2015/6/11 17:28, Tian, Kevin wrote:

From: Chen, Tiejun
Sent: Thursday, June 11, 2015 9:15 AM
@@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, struct 
pci_dev
*pdev)
PCI_DEVFN2(bdf) != devfn )
   continue;

-rmrr_identity_mapping(pdev-domain, 0, rmrr);
+rmrr_identity_mapping(pdev-domain, 0, rmrr,
+  XEN_DOMCTL_DEV_RDM_RELAXED);


ditto


It doesn't matter when we're trying to remove a device since we don't
care this flag.


In such a case it helps to add a brief comment saying that the precise
value passed is irrelevant. Or maybe this could be expressed by


Okay.


folding this and the map parameters of the function (in which case it
might become self-documenting)?



Sorry, I don't know exactly how to implement this idea. Have we any 
similar example on Xen side?


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU

2015-06-12 Thread Wu, Feng


 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Tuesday, June 09, 2015 11:09 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
 Subject: Re: [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked
 vCPU
 
  On 08.05.15 at 11:07, feng...@intel.com wrote:
  This patch adds a new per-vCPU tasklet to wakeup the blocked
  vCPU. It can be used in the case vcpu_unblock cannot be called
  directly. This tasklet will be used in later patch in this
  series.
 
  Signed-off-by: Feng Wu feng...@intel.com
  ---
   xen/common/domain.c | 11 +++
   xen/include/xen/sched.h |  3 +++
   2 files changed, 14 insertions(+)
 
 What is the rationale for doing this in common code, modifying
 common structures?
 

Do you mean moving this tasklet to the architecture specific structure, such 
as, struct arch_vcpu ?

Thanks,
Feng

 Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 08/15] Update IRTE according to guest interrupt config changes

2015-06-12 Thread Wu, Feng


 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Tuesday, June 09, 2015 11:06 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
 Subject: Re: [RFC v2 08/15] Update IRTE according to guest interrupt config
 changes
 
  On 08.05.15 at 11:07, feng...@intel.com wrote:
  +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
  +uint8_t dest_mode, uint8_t
 delivery_mode,
  +uint8_t gvec, struct vcpu
 **dest_vcpu)
  +{
  +struct vcpu *v, **dest_vcpu_array;
  +unsigned int dest_vcpu_num = 0;
  +int ret;
 
 This, being given as operand to return, should match in type with
 the function's return type.
 
  +dest_vcpu_array = xzalloc_array(struct vcpu *, d-max_vcpus);
 
 You realize that this can be quite big an allocation? (You could at
 least halve it by storing vCPU IDs instead of pointers, but if I'm
 not mistaken this could even be a simple bitmap.)

If we use vCPU IDs or bitmap, we need to iterate all the vCPUs in the
domain to find the right vCPU from the vcpu_id, right?

 
  +if ( !dest_vcpu_array )
  +{
  +dprintk(XENLOG_G_INFO,
  +dom%d: failed to allocate memeory.\n, d-domain_id);
 
 Please fix the typo and remove the stop.
 
  +return 0;
  +}
  +
  +for_each_vcpu ( d, v )
  +{
  +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
  +dest_id, dest_mode) )
  +continue;
  +
  +dest_vcpu_array[dest_vcpu_num++] = v;
  +}
  +
  +if ( delivery_mode == dest_LowestPrio )
  +{
  +if (  dest_vcpu_num != 0 )
  +{
  +*dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num];
  +ret = 1;
  +}
  +else
  +ret = 0;
  +}
  +else if (  dest_vcpu_num == 1 )
  +{
  +*dest_vcpu = dest_vcpu_array[0];
  +ret = 1;
  +}
  +else
  +ret = 0;
  +
  +xfree(dest_vcpu_array);
  +return ret;
  +}
 
 Blank line before final return statement please.
 
  @@ -330,11 +398,40 @@ int pt_irq_create_bind(
   /* Calculate dest_vcpu_id for MSI-type pirq migration. */
   dest = pirq_dpci-gmsi.gflags  VMSI_DEST_ID_MASK;
   dest_mode = !!(pirq_dpci-gmsi.gflags  VMSI_DM_MASK);
  +delivery_mode = (pirq_dpci-gmsi.gflags 
 GFLAGS_SHIFT_DELIV_MODE) 
  +VMSI_DELIV_MASK;
   dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
   pirq_dpci-gmsi.dest_vcpu_id = dest_vcpu_id;
   spin_unlock(d-event_lock);
   if ( dest_vcpu_id = 0 )
   hvm_migrate_pirqs(d-vcpu[dest_vcpu_id]);
  +
  +/* Use interrupt posting if it is supported */
  +if ( iommu_intpost )
  +{
  +struct vcpu *vcpu = NULL;
  +
  +if ( !pi_find_dest_vcpu(d, dest, dest_mode, delivery_mode,
  +pirq_dpci-gmsi.gvec, vcpu) )
 
 Why not have the function return the vCPU instead of passing it
 via indirection?

Good suggestion!

Thanks,
Feng

 
  +{
  +dprintk(XENLOG_G_WARNING,
  +%pv: failed to find the dest vCPU for PI, guest
 
  +vector:%u use software way to deliver the 
  + interrupts.\n, vcpu, pirq_dpci-gmsi.gvec);
 
 You shouldn't be printing the (NULL) vCPU here. And please print
 vectors as hex values.
 
  +break;
  +}
  +
  +if ( pi_update_irte( vcpu, info, pirq_dpci-gmsi.gvec ) != 0 )
  +{
  +dprintk(XENLOG_G_WARNING,
  +%pv: failed to update PI IRTE, guest
 vector:%u 
  +use software way to deliver the
 interrupts.\n,
  +vcpu, pirq_dpci-gmsi.gvec);
  +
  +break;
  +}
  +}
  +
   break;
 
 By using if() / else if() you could drop _both_ break-s you add.
 
 Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 07/15] vt-d: Add API to update IRTE when VT-d PI is used

2015-06-12 Thread Wu, Feng


 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, June 10, 2015 2:18 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
 Subject: Re: [RFC v2 07/15] vt-d: Add API to update IRTE when VT-d PI is used
 
  On 08.05.15 at 11:07, feng...@intel.com wrote:
  +static inline void setup_posted_irte(
  +struct iremap_entry *new_ire, struct pi_desc *pi_desc, uint8_t gvec)
  +{
  +new_ire-post.urg = 0;
  +new_ire-post.vector = gvec;
  +new_ire-post.pda_l = (((u64)virt_to_maddr(pi_desc)) 
  +   (32 - PDA_LOW_BIT))  PDA_MASK(LOW);
  +new_ire-post.pda_h = (((u64)virt_to_maddr(pi_desc))  32) 
  +   PDA_MASK(HIGH);
 
 Looking at this another time I can't see what the and-ing with
 PAD_MASK() is supposed to be good for. 

I cannot understand this well. Do you mean we don't need and PDA_MASK() here?

Thanks,
Feng

Nor can I see the need
 for the u64 casts. The two literal 32-s aren't ideal either, but I
 guess tolerable.
 
 Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts

2015-06-12 Thread Wu, Feng


 -Original Message-
 From: xen-devel-boun...@lists.xen.org
 [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Jan Beulich
 Sent: Tuesday, June 09, 2015 11:20 PM
 To: Wu, Feng
 Cc: Tian, Kevin; k...@xen.org; george.dun...@eu.citrix.com;
 andrew.coop...@citrix.com; xen-devel@lists.xen.org; Zhang, Yang Z
 Subject: Re: [Xen-devel] [RFC v2 11/15] vmx: Add a global wake-up vector for
 VT-d Posted-Interrupts
 
  On 08.05.15 at 11:07, feng...@intel.com wrote:
  @@ -3262,6 +3271,28 @@ void vmx_vmenter_helper(const struct
 cpu_user_regs *regs)
   }
 
   /*
  + * Handle VT-d posted-interrupt when VCPU is blocked.
  + */
  +void pi_wakeup_interrupt(struct cpu_user_regs *regs)
 
 static (and perhaps move it up so you don't need to forward declare it).
 
  +{
  +struct vcpu *v;
  +unsigned int cpu = smp_processor_id();
  +
  +spin_lock(per_cpu(blocked_vcpu_lock, cpu));
  +list_for_each_entry(v, per_cpu(blocked_vcpu, cpu),
  +blocked_vcpu_list) {
 
 How long do you think such a list can grow? 

It depends on how many vCPU is current blocked on this specific pCPU.
I am not sure how long it can grow. 

The basic idea here is that, when notification wakeup event happens,
we need to find a way to find the right vCPU, and then wakeup it, do
you have any better idea for this?

Thanks,
Feng

 I'm afraid you might
 be adding quite a bit of latency here to the system.
 
  +struct pi_desc *pi_desc = v-arch.hvm_vmx.pi_desc;
  +
  +if ( pi_desc-on == 1 )
 
 Isn't this a single-bit (i.e. boolean) field? In which case - no reason
 to compare with 1.
 
 I also don't see the value of the local variable - it's being used only
 once, and without it the line wouldn't get overly long or unreadable.
 
  --- a/xen/include/xen/sched.h
  +++ b/xen/include/xen/sched.h
  @@ -148,6 +148,8 @@ struct vcpu
 
   struct vcpu *next_in_list;
 
  +struct list_head blocked_vcpu_list;
 
 Again - why here instead of in VT-d/VMX specific structures?

Do you think it is okay to put it in  struct arch_vmx_struct ?

Thanks,
Feng

 
 Jan
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [libvirt test] 58401: tolerable FAIL - PUSHED

2015-06-12 Thread osstest service user
flight 58401 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/58401/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-libvirt  11 guest-start  fail   like 58334
 test-amd64-amd64-libvirt 11 guest-start  fail   like 58334
 test-amd64-i386-libvirt-xsm  11 guest-start  fail   like 58334
 test-armhf-armhf-libvirt 11 guest-start  fail   like 58334

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 11 guest-start  fail   never pass

version targeted for testing:
 libvirt  c0ef99525d55aecbcf84dea73ba6753c243680ca
baseline version:
 libvirt  c178d38b8faabf93521b4013bc6772c0f5e6b9e0


People who touched revisions under test:
  Eric Blake ebl...@redhat.com
  Michal Privoznik mpriv...@redhat.com


jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-xsm pass
 test-armhf-armhf-libvirt-xsm fail
 test-amd64-i386-libvirt-xsm  fail
 test-amd64-amd64-libvirt fail
 test-armhf-armhf-libvirt fail
 test-amd64-i386-libvirt  fail



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=libvirt
+ revision=c0ef99525d55aecbcf84dea73ba6753c243680ca
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getconfig Repos
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{Repos} or die $!;
'
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push libvirt 
c0ef99525d55aecbcf84dea73ba6753c243680ca
+ branch=libvirt
+ revision=c0ef99525d55aecbcf84dea73ba6753c243680ca
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getconfig Repos
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{Repos} or die $!;
'
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . cri-common
++ . cri-getconfig
++ umask 002
+ select_xenbranch
+ case $branch in
+ tree=libvirt
+ xenbranch=xen-unstable
+ '[' xlibvirt = xlinux ']'
+ linuxbranch=
+ '[' x = x ']'
+ qemuubranch=qemu-upstream-unstable
+ : tested/2.6.39.x
+ . ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{OsstestUpstream} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/staging/qemu-xen-unstable.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://libvirt.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/rumpuser-xen.git
++ : git
++ : git://xenbits.xen.org/rumpuser-xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/rumpuser-xen.git
+++ besteffort_repo https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ cached_repo https://github.com/rumpkernel/rumpkernel-netbsd-src 
'[fetch=try]'
+++ local 

Re: [Xen-devel] [OSSTEST Nested PATCH v11 5/7] Add new script to customize nested test configuration

2015-06-12 Thread Ian Campbell
On Fri, 2015-06-12 at 03:46 +, Pang, LongtaoX wrote:
 
  -Original Message-
  From: Ian Jackson [mailto:ian.jack...@eu.citrix.com]
  Sent: Thursday, June 11, 2015 11:15 PM
  To: Pang, LongtaoX
  Cc: xen-devel@lists.xen.org; ian.campb...@citrix.com; wei.l...@citrix.com; 
  Hu,
  Robert
  Subject: RE: [OSSTEST Nested PATCH v11 5/7] Add new script to customize 
  nested
  test configuration
  
  Pang, LongtaoX writes (RE: [OSSTEST Nested PATCH v11 5/7] Add new script to
  customize nested test configuration):
   [Ian Jackson [mailto:ian.jack...@eu.citrix.com]]
  
We would avoid having to mention /dev/xvdb if we created the VG in the
host, before doing block-attach.  I'm not sure whether that's an
improvement.
  
   I am sorry, I cannot get your point. Could you please make it more
   clear? Thanks.
  
  As you explain in the comment, you have to mention /dev/xvdb here even
  though after rebooting into Xen this will be /dev/sdb.
  
  This wrinkle would become invisible if you did pvcreate and vgcreate
  in the L0 (before attach), rather than the L1 (after attach).
  Because, then none of your operations on L1 would not need to name the
  physical device.
  
 Thanks for your explanation.
 For nested job, we will install L2 guest inside L1 reuse 
 'ts-debian-hvm-install' script.
 But according to recent code design, inside L1 HVM guest, there is no vg 
 group, so that we need to
 create a vg group for L2 installation.
 So, inside L0, we create storage lv(lvcreate -L ${guest_storage_lv_size}M -n 
 $guest_storage_lv_name $vgname), 
 attach this lv to L1. Then inside L1, using the attached disk to create a 
 pv(pvcreate /dev/xvdb),
 then create a vg(vgcreate ${l1_gn}-disk /dev/xvdb) base on the pv. Then, 
 using this vg for installing L2.
 I think '/dev/xvdb' is necessary when create vg inside L1, using command 
 'vgcreate ${l1_gn}-disk /dev/xvdb' .
 Please correct me if I am wrong.

I think Ian J's suggestion is that instead of creating the ${l1_gn}-disk
VG via commands on /dev/xvdb in the L1 you can create it via commands on
${guest_storage_lvdev} in L0.

e.g. instead of
+target_cmd_root($l1, pvcreate /dev/xvdb  vgcreate ${l1_gn}-disk /dev/xvdb);
you can instead do:
+target_cmd_root($l0, pvcreate  ${guest_storage_lvdev}  vgcreate 
${l1_gn}-disk /dev/xvdb);
before you do the block-attach.

That way the VG is already present on the device when it appears in L1
and L1 will just find it.

This avoids any confusion about sdb vs xvdb naming.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [linux-linus test] 58399: regressions - FAIL

2015-06-12 Thread osstest service user
flight 58399 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/58399/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-rumpuserxen-amd64 15 
rumpuserxen-demo-xenstorels/xenstorels.repeat fail REGR. vs. 58128

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-libvirt-xsm  11 guest-start   fail REGR. vs. 58128
 test-amd64-i386-libvirt  11 guest-start  fail   like 58128
 test-amd64-amd64-libvirt 11 guest-start  fail   like 58128
 test-amd64-i386-freebsd10-amd64  9 freebsd-install fail like 58128
 test-amd64-i386-freebsd10-i386  9 freebsd-install  fail like 58128
 test-armhf-armhf-libvirt-xsm 11 guest-start  fail   like 58128
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 58128
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 58128
 test-armhf-armhf-libvirt 11 guest-start  fail   like 58128

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel 13 guest-saverestorefail  never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-sedf 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass
 test-armhf-armhf-xl-sedf-pin 12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass

version targeted for testing:
 linux2bfc60dd2869745f5bbed9583c517cc76e3db924
baseline version:
 linux37ef1647b7f73d4ff4c7993984599b6c4f26443a


People who touched revisions under test:
  Eric W. Biederman ebied...@xmission.com
  Tom Herbert t...@herbertland.com
  Aaro Koskinen aaro.koski...@nokia.com
  Alban Bedel al...@free.fr
  Alexei Starovoitov a...@plumgrid.com
  Andrew Morton a...@linux-foundation.org
  Anjali Singhai Jain anjali.sing...@intel.com
  Arnd Bergmann a...@arndb.de
  David S. Miller da...@davemloft.net
  David Woodhouse david.woodho...@intel.com
  Dmitry Torokhov dmitry.torok...@gmail.com
  Felipe Balbi ba...@ti.com
  Flavio Leitner f...@redhat.com
  Florian Fainelli f.faine...@gmail.com
  Gregory CLEMENT gregory.clem...@free-electrons.com
  Gu Zheng guz.f...@cn.fujitsu.com
  Guenter Roeck li...@roeck-us.net
  Hannes Frederic Sowa han...@stressinduktion.org
  Hauke Mehrtens ha...@hauke-m.de
  Heiko Carstens heiko.carst...@de.ibm.com
  Huacai Chen che...@lemote.com
  Imre Kaloz ka...@openwrt.org
  James Hogan james.ho...@imgtec.com
  Jeff Kirsher jeffrey.t.kirs...@intel.com
  Jeroen Hofstee jer...@myspectrum.nl
  Jesse Brandeburg jesse.brandeb...@intel.com
  Jiang Liu jiang@linux.intel.com
  Jim Young james.m.yo...@intel.com
  Jiri Benc jb...@redhat.com
  Joe Perches j...@perches.com
  Joerg Roedel jroe...@suse.de
  Johannes Weiner han...@cmpxchg.org
  Jonas Gorski j...@openwrt.org
  Josh Hunt joh...@akamai.com
  Joshua Kinard ku...@gentoo.org
  Kevin Hilman khil...@linaro.org
  Krzysztof Kozlowski k.kozlow...@samsung.com
  Lendacky, Thomas thomas.lenda...@amd.com
  Lennox Wu lennox...@gmail.com
  Linus Torvalds torva...@linux-foundation.org
  Maciej W. Rozycki ma...@linux-mips.org
  Marc Zyngier marc.zyng...@arm.com
  Markos Chandras markos.chand...@imgtec.com
  Matthias Brugger matthias@gmail.com
  Matthijs van Duin matthijsvand...@gmail.com
  Mel Gorman mgor...@suse.de
  Michael Holzheu holz...@linux.vnet.ibm.com
  Michael S. Tsirkin m...@redhat.com
  Michal Hocko mho...@suse.cz
  Minchan Kim minc...@kernel.org
  Nadav Haklai nad...@marvell.com
  Nicholas Mc Guire hof...@osadl.org
  Nicolas Schichan nschic...@freebox.fr
  Nikolay Aleksandrov niko...@cumulusnetworks.com
  Pavel Machek pa...@ucw.cz
  Peter Hutterer peter.hutte...@who-t.net
  Pravin B Shelar pshe...@nicira.com
  Ralf Baechle r...@linux-mips.org
  Robert Nelson robertcnel...@gmail.com
  Robert Shearman rshea...@brocade.com
  Sascha Hauer s.ha...@pengutronix.de
  Sergey Senozhatsky sergey.senozhat...@gmail.com
  Shawn Bohrer sboh...@rgmadvisors.com
  Sriharsha Basavapatna sriharsha.basavapa...@avagotech.com
  Tejun Heo t...@kernel.org
  Tero Kristo t-kri...@ti.com
  Thomas Petazzoni 

Re: [Xen-devel] [v3][PATCH 03/16] xen/vtd: create RMRR mapping

2015-06-12 Thread Tian, Kevin
 From: Chen, Tiejun
 Sent: Friday, June 12, 2015 1:58 PM
 
 On 2015/6/12 10:43, Chen, Tiejun wrote:
  On 2015/6/11 22:07, Tim Deegan wrote:
  At 17:31 +0800 on 11 Jun (1434043916), Chen, Tiejun wrote:
 while ( base_pfn  end_pfn )
 {
  -int err = intel_iommu_map_page(d, base_pfn, base_pfn,
  -
  IOMMUF_readable|IOMMUF_writable);
  +int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
 
 if ( err )
 return err;
 
  Tim has another comment to replace earlier unmap with
 
  Yes, I knew this.
 
  guest_physmap_remove_page() which will call iommu
  unmap internally. Please include this change too.
 
 
  But,
 
  guest_physmap_remove_page()
|
+ p2m_remove_page()
  |
  + iommu_unmap_page()
  |
  + p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), xxx)
 
  I think this already remove these pages both on ept/vt-d sides, right?
 
  Yes; this is about this code further up in the same function:
 
  while ( base_pfn  end_pfn )
  {
  if ( intel_iommu_unmap_page(d, base_pfn) )
  ret = -ENXIO;
  base_pfn++;
  }
 
  which ought to be calling guest_physmap_remove_page() or similar, to
  make sure that both iommu and EPT mappings get removed.
 
 
  I still just think current implementation might be fine at this point.
 
  We have two scenarios here, the case of shared ept and the case of
  non-shared ept. But no matter what case we're tracking, shouldn't
  guest_physmap_remove_page() always call p2m-set_entry() to clear *all*
  *valid* mfn which is owned by a given VM? And p2m-set_entry() also
  calls iommu_unmap_page() internally. So nothing special should further
  consider.
 
  If I'm wrong or misunderstanding, please correct me :)
 
 
 Sorry for my misunderstanding to this. Right now Kevin help me
 understand what you mean. Sounds like we should address something
 specific to unmap rmrr here.
 
 So I will do this as follows:
 
 #1. Provide a clear helper
 
 +int clear_identity_p2m_entry(struct domain *d, unsigned long gfn,
 + unsigned int page_order)
 +{
 +struct p2m_domain *p2m = p2m_get_hostp2m(d);
 +int ret;
 +gfn_lock(p2m, gfn, page_order);
 +ret = p2m_remove_page(p2m, gfn, gfn, page_order);
 +gfn_unlock(p2m, gfn, page_order);
 +return ret;
 +}
 +
 
 #2. Call such a helper
 
 @@ -1840,7 +1840,7 @@ static int rmrr_identity_mapping(struct domain *d,
 bool_t map,
 
   while ( base_pfn  end_pfn )
   {
 -if ( intel_iommu_unmap_page(d, base_pfn) )
 +if ( clear_identity_p2m_entry(d, base_pfn, 0) )
   ret = -ENXIO;
   base_pfn++;
   }
 Is this right?
 
 Thanks
 Tiejun

could you explain why existing guest_physmap_remove_page can't
serve the purpose so you need invent a new identity mapping
specific one? For unmapping suppose it should be common regardless
of whether it's identity-mapped or not. :-)

Thanks
Kevin


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [ovmf test] 58398: tolerable FAIL - PUSHED

2015-06-12 Thread osstest service user
flight 58398 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/58398/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail REGR. vs. 56492
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail REGR. vs. 56492

version targeted for testing:
 ovmf 13af4ab06516eefb40fb985467141e09efe9c58b
baseline version:
 ovmf f1f0f0deb6e64df6b7c04ead7330afecf5537e46


People who touched revisions under test:
  Ma, Maurice maurice...@intel.com
  Mudusuru, Giri P giri.p.mudus...@intel.com
  Yao, Jiewen jiewen@intel.com
  Zachary Bobroff zacha...@ami.com
  Ard Biesheuvel ard.biesheu...@linaro.org
  Chao Zhang chao.b.zh...@intel.com
  Dandan Bi dandan...@intel.com
  David Wei  david@intel.com
  David Wei david@intel.com
  Eric Dong eric.d...@intel.com
  Eric Jin eric@intel.com
  Feng Tian feng.t...@intel.com
  Guo Dong guo.d...@intel.com
  Hao Wu hao.a...@intel.com
  Heyi Guo heyi@linaro.org
  Jaben Carsey jaben.car...@intel.com
  Jeff Fan jeff@intel.com
  jiaxinwu jiaxin...@intel.com
  Jordan Justen jordan.l.jus...@intel.com
  Laszlo Ersek ler...@redhat.com
  lhauch larry.ha...@intel.com
  Liming Gao liming@intel.com
  Long Qin qin.l...@intel.com
  Ludovic Rousseau ludovic.rouss...@gmail.com
  Ma, Maurice maurice...@intel.com
  Maurice Ma maurice...@intel.com
  Mudusuru, Giri P giri.p.mudus...@intel.com
  Olivier Martin olivier.mar...@arm.com
  Paulo Alcantara pca...@zytor.com
  Qiu Shumin shumin@intel.com
  Ruiyu Ni ruiyu...@intel.com
  Shifei Lu shifeix.a...@intel.com
  Star Zeng star.z...@intel.com
  Tapan Shah tapands...@hp.com
  Tian Feng feng.t...@intel.com
  Yao, Jiewen jiewen@intel.com
  Yingke Liu yingke.d@intel.com
  Zachary Bobroff zacha...@ami.com


jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 pass
 test-amd64-amd64-xl-qemuu-winxpsp3   pass
 test-amd64-i386-xl-qemuu-winxpsp3pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=ovmf
+ revision=13af4ab06516eefb40fb985467141e09efe9c58b
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getconfig Repos
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{Repos} or die $!;
'
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push ovmf 
13af4ab06516eefb40fb985467141e09efe9c58b
+ branch=ovmf
+ revision=13af4ab06516eefb40fb985467141e09efe9c58b
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getconfig Repos
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{Repos} or die $!;
'
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . cri-common
++ . cri-getconfig
++ umask 002
+ select_xenbranch
+ case $branch in
+ tree=ovmf
+ xenbranch=xen-unstable
+ '[' xovmf = xlinux ']'
+ 

Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 01:23, toshi.k...@hp.com wrote:
 There are two usages on MTRRs:
  1) MTRR entries set by firmware
  2) MTRR entries set by OS drivers
 
 We can obsolete 2), but we have no control over 1).  As UEFI firmwares
 also set this up, this usage will continue to stay.  So, we should not
 get rid of the MTRR code that looks up the MTRR entries, while we have
 no need to modify them.
 
 Such MTRR entries provide safe guard to /dev/mem, which allows
 privileged user to access a range that may require UC mapping while
 the /dev/mem driver blindly maps it with WB.  MTRRs converts WB to UC in
 such a case.

But it wouldn't be impossible to simply read the MTRRs upon boot,
store the information, disable MTRRs, and correctly use PAT to
achieve the same effect (i.e. the blindly maps part of course
would need fixing).

 UEFI memory table has memory attribute, which describes cache types
 supported in physical memory ranges.  However, this information gets
 lost when it it is converted to e820 table.

I'm afraid you rather don't want to trust that information, as
firmware vendors frequently screw it up.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy

2015-06-12 Thread Chen, Tiejun

On 2015/6/11 18:02, Tian, Kevin wrote:

From: Chen, Tiejun
Sent: Thursday, June 11, 2015 9:15 AM

This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  tools/libxc/include/xenctrl.h   |  3 ++-
  tools/libxc/xc_domain.c |  6 +-
  tools/libxl/libxl_pci.c |  3 ++-
  tools/ocaml/libs/xc/xenctrl_stubs.c | 18 ++
  tools/python/xen/lowlevel/xc/xc.c   | 29 +++--
  5 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 6c01362..7fd60d5 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2078,7 +2078,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
  /* HVM guest pass-through */
  int xc_assign_device(xc_interface *xch,
   uint32_t domid,
- uint32_t machine_sbdf);
+ uint32_t machine_sbdf,
+ uint32_t flag);

  int xc_get_device_group(xc_interface *xch,
   uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 4f96e1b..19127ec 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
  int xc_assign_device(
  xc_interface *xch,
  uint32_t domid,
-uint32_t machine_sbdf)
+uint32_t machine_sbdf,
+uint32_t flag)
  {
  DECLARE_DOMCTL;

@@ -1705,6 +1706,7 @@ int xc_assign_device(
  domctl.domain = domid;
  domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
  domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.flag = flag;

  return do_domctl(xch, domctl);
  }
@@ -1792,6 +1794,8 @@ int xc_assign_dt_device(

  domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
  domctl.u.assign_device.u.dt.size = size;
+/* DT doesn't own any RDM. */
+domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM;


still not clear about this NO_RDM flag. If a device-tree device doesn't
own RDM, the hypervisor will know it. Why do we require toolstack
to tell hypervisor not use it?


I think an explicit flag can make this sort of case identified, right? 
And other flags brings easily some confusions, or even a potential risk 
in the future.


Thanks
Tiejun




  set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);

  rc = do_domctl(xch, domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid,
libxl_device_pci *pcidev, i
  FILE *f;
  unsigned long long start, end, flags, size;
  int irq, i, rc, hvm = 0;
+uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;

  if (type == LIBXL_DOMAIN_TYPE_INVALID)
  return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid,
libxl_device_pci *pcidev, i

  out:
  if (!libxl_is_stubdom(ctx, domid, NULL)) {
-rc = xc_assign_device(ctx-xch, domid, pcidev_encode_bdf(pcidev));
+rc = xc_assign_device(ctx-xch, domid, pcidev_encode_bdf(pcidev), 
flag);
  if (rc  0  (hvm || errno != ENOSYS)) {
  LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, xc_assign_device 
failed);
  return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c 
b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..317bf75 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,19 @@ CAMLprim value stub_xc_domain_test_assign_device(value
xch, value domid, value d
CAMLreturn(Val_bool(ret == 0));
  }

-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+XEN_DOMCTL_DEV_NO_RDM,
+XEN_DOMCTL_DEV_RDM_RELAXED,
+XEN_DOMCTL_DEV_RDM_STRICT,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+value rflag)
  {
-   CAMLparam3(xch, domid, desc);
+   CAMLparam4(xch, domid, desc, rflag);
int ret;
int domain, bus, dev, func;
-   uint32_t sbdf;
+   uint32_t sbdf, flag;

domain = Int_val(Field(desc, 0));
bus = Int_val(Field(desc, 1));
@@ -1185,7 +1192,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch,
value domid, value desc)
func = Int_val(Field(desc, 3));
sbdf = encode_sbdf(domain, bus, dev, func);

-   ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+   ret = Int_val(Field(rflag, 0));
+   flag = domain_assign_device_rdm_flag_table[ret];
+
+   ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);

if (ret  0)
failwith_xc(_H(xch));
diff --git 

Re: [Xen-devel] [v3][PATCH 15/16] xen/vtd: enable USB device assignment

2015-06-12 Thread Chen, Tiejun

On 2015/6/11 18:22, Tian, Kevin wrote:

From: Chen, Tiejun
Sent: Thursday, June 11, 2015 9:15 AM

Before we refine RMRR mechanism, USB RMRR may conflict with guest bios
region so we always ignore USB RMRR.


If USB RMRR conflicts with guest bios, the conflict is always there
before and after your refinement. :-)


Yeah :)




Now this can be gone when we enable
pci_force to check/reserve RMRR.


So what about this?

USB RMRR may conflict with guest bios region so we always ignore
USB RMRR. But now this can be checked to handle after we introduce
our policy mechanism.



Signed-off-by: Tiejun Chen tiejun.c...@intel.com


Acked-by: Kevin Tian kevin.t...@intel.com except one small comment below


---
  xen/drivers/passthrough/vtd/dmar.h  |  1 -
  xen/drivers/passthrough/vtd/iommu.c | 11 ++-
  xen/drivers/passthrough/vtd/utils.c |  7 ---
  3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h 
b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {\

  int vtd_hw_check(void);
  void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
  int is_igd_drhd(struct acpi_drhd_unit *drhd);

  #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c
b/xen/drivers/passthrough/vtd/iommu.c
index d7c9e1c..d3233b8 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2229,11 +2229,9 @@ static int reassign_device_ownership(
  /*
   * If the device belongs to the hardware domain, and it has RMRR, don't
   * remove it from the hardware domain, because BIOS may use RMRR at
- * booting time. Also account for the special casing of USB below (in
- * intel_iommu_assign_device()).
+ * booting time.


this code is run-time right?


According to one associated commit,

commit 8b99f4400b695535153dcd5d949b3f63602ca8bf
Author: Jan Beulich jbeul...@suse.com
Date:   Fri Oct 10 10:54:21 2014 +0200

VT-d: fix RMRR related error handling

- reassign_device_ownership() now tears down RMRR mappings (for other
  than Dom0)
- to facilitate that, rmrr_identity_mapping() now deals with both
  establishing and tearing down of these mappings (the open coded
  equivalent in intel_iommu_remove_device() is being replaced at once)
- intel_iommu_assign_device() now unrolls the assignment upon RMRR
  mapping errors
- intel_iommu_add_device() now returns consistent values upon RMRR
  mapping failures (was: failure when last iteration ran into a
  problem, success otherwise)
- intel_iommu_remove_device() no longer special cases Dom0 (it only
  ever gets called for devices removed from the _system_, not a domain)
- rmrr_identity_mapping() now returns a proper error indicator instead
  of -1 when intel_iommu_map_page() failed

Signed-off-by: Jan Beulich jbeul...@suse.com
Acked-by: Kevin Tian kevin.t...@intel.com

This chunk of codes resides inside intel_iommu_remove_device() so I 
think this shouldn't be for a running domain.


Thanks
Tiejun




   */
-if ( !is_hardware_domain(source) 
- !is_usb_device(pdev-seg, pdev-bus, pdev-devfn) )
+if ( !is_hardware_domain(source) )
  {
  const struct acpi_rmrr_unit *rmrr;
  u16 bdf;
@@ -2283,13 +2281,8 @@ static int intel_iommu_assign_device(
  if ( ret )
  return ret;

-/* FIXME: Because USB RMRR conflicts with guest bios region,
- * ignore USB RMRR temporarily.
- */
  seg = pdev-seg;
  bus = pdev-bus;
-if ( is_usb_device(seg, bus, pdev-devfn) )
-return 0;

  /* Setup rmrr identity mapping */
  for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c 
b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
  #include extern.h
  #include asm/io_apic.h

-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-PCI_CLASS_DEVICE);
-return (class == 0xc03);
-}
-
  /* Disable vt-d protected memory registers. */
  void disable_pmr(struct iommu *iommu)
  {
--
1.9.1





___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4 2/7] libxl_read_file_contents: add new entry to read sysfs file

2015-06-12 Thread Chun Yan Liu


 On 6/12/2015 at 12:16 AM, in message
21881.46148.466883.923...@mariner.uk.xensource.com, Ian Jackson
ian.jack...@eu.citrix.com wrote: 
 Chunyan Liu writes ([Xen-devel] [PATCH V4 2/7] libxl_read_file_contents: add 
  
 new entry to read sysfs file): 
  Sysfs file has size=4096 but actual file content is less than that. 
  Current libxl_read_file_contents will treat it as error when file size 
  and actual file content differs, so reading sysfs file content with 
  this function always fails. 
   
  Add a new entry libxl_read_sysfs_file_contents to handle sysfs file 
  specially. It would be used in later pvusb work. 
  
 I think this patch is roughly right, but: 
  
  -int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, 
  - void **data_r, int *datalen_r) { 
  +static int libxl_read_file_contents_core(libxl_ctx *ctx, const char  
 *filename, 
  + void **data_r, int *datalen_r, 
  + bool is_sysfs_file) 
  
 I would prefer a functional rather than contextual name for the 
 variabvle is_sysfs_file.  How about `tolerate_shrinking_file' ? 

OK.

  
  @@ -360,15 +362,16 @@ int libxl_read_file_contents(libxl_ctx *ctx, const  
 char *filename, 

   if (stab.st_size  data_r) { 
   data = malloc(datalen); 
  +memset(data, 0, datalen); 
  
 What is this for ? 

I found sometimes reading sysfs file contents, at the end, there is
some random character. With this line, there is no problem then.

  
   if (!data) goto xe; 

   rs = fread(data, 1, datalen, f); 
  -if (rs != datalen) { 
  +if (rs != datalen  !(feof(f)  is_sysfs_file)) { 
   if (ferror(f)) 
   LOGE(ERROR, failed to read %s, filename); 
   else if (feof(f)) 
  
 I think it would be better to handle the special case here, with 
 something like 
if (!tolerate_shrinking_file) 
error stuff 
else { 
assert(datalen_r); 
 and to move the `goto xe' into the individual branches. 

OK. Will update.

  
  
 Is there any risk that the file is actually bigger than advertised, 
 rather than smaller ? 

For sysfs file, couldn't be bigger.

  
  diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h 
  index 1c1761d..7639662 100644 
  --- a/tools/libxl/libxl_utils.h 
  +++ b/tools/libxl/libxl_utils.h 
 ... 
  +int libxl_read_sysfs_file_contents(libxl_ctx *ctx, const char *filename, 
  +   void **data_r, int *datalen_r); 
  
 I think this is a function with sufficiently odd semantics, and a 
 sufficiently internal purpose, that it should probably not be exposed 
 in the API. 

So move to libxl_internal.h? Or not hacking here but just adding an internal
function in libxl_pvusb.c with repeated codes?

  
 Ian
  
 ___ 
 Xen-devel mailing list 
 Xen-devel@lists.xen.org 
 http://lists.xen.org/xen-devel 
  
  



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists

2015-06-12 Thread Jan Beulich
 On 11.06.15 at 17:28, t...@xen.org wrote:
 At 15:51 +0100 on 05 Jun (1433519478), Jan Beulich wrote:
 Iirc before both of these changes, and the v10 ones imo should
 have invalidated it. Tim, I'm particularly trying to understand
 whether you're okay with the original's (potentially even heavier)
 resource use and/or this version's (risking to run out of maptrack
 entries _much_ earlier than currently).
 
 The concern with the earlier version being that the maximum maptrack
 limit gets a lot higher with many vcpus?  I was OK with that.  There
 are a lot of things that scale with #vcpus, and xenheap pages are not
 particularly scarce any more.  So let's say I don't find one 128-vcpu
 guest much different from 128 1-vcpu guests for this purpose.

I certainly can see that resource consumption may scale with the
number of vCPU-s a guest has. But whether that ought to apply to
everything, i.e. including all per-domain (rather than per-vCPU)
resources (which the maptrack table is only an example of) I'm not
sure about. In the end, if we were to talk about v9 and the default
of 256 frames, a 1024-vCPU DomU could consume 1Gb of memory
for its maptrack even if (quite likely) it doesn't serve any other
guest. IOW to me it would seem that a necessary prereq to this
change would be to make the limit a per-domain setting, with only
Dom0 getting its limit bumped by default (and even then the
at-least-one-page-per-vCPU requirement undermines that, i.e. a
slow path to avoid going over the set limit would still seem
necessary).

Additionally, with the maptrack pages no longer shared across
vCPU-s, running into an allocation failure namely on ballooned
setups (where Xen typically doesn't have much memory available,
since the ballooning ordinarily only accounts for the immediate
domain needs) is going to become quite a bit more likely. Yes, I
already hear David saying I don't care about the ballooning case,
but no, I don't think we can simply ignore this case (unless we
want to declare ballooning a deprecated, no longer supported
feature).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4 5/7] xl: add pvusb commands

2015-06-12 Thread Juergen Gross

On 06/12/2015 10:03 AM, Chun Yan Liu wrote:




On 6/12/2015 at 03:37 PM, in message 557a8c57.1040...@suse.com, Juergen Gross

jgr...@suse.com wrote:

On 06/10/2015 05:20 AM, Chunyan Liu wrote:


...


+} else if (MATCH_OPTION(ports, argv[optind], oparg)) {
+usbctrl.ports = atoi(oparg);


Same here for number of ports. Otherwise you could blow up xenstore by
e.g. specifying 2 billion ports here.


Will add check. What's the supported max ports?


31. I'm about to send a patch updating the pvusb interface description
which will then contain a define for that (USBIF_MAX_PORTNR).


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM

2015-06-12 Thread Chen, Tiejun

On 2015/6/11 18:19, Tian, Kevin wrote:

From: Chen, Tiejun
Sent: Thursday, June 11, 2015 9:15 AM

While building a VM, HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields to construct guest
e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
check them to fix any conflict with RAM.

RMRR can reside in address space beyond 4G theoretically, but we never
see this in real world. So in order to avoid breaking highmem layout
we don't solve highmem conflict. Note this means highmem rmrr could still
be supported if no conflict.

But in the case of lowmem, RMRR probably scatter the whole RAM space.
Especially multiple RMRR entries would worsen this to lead a complicated
memory layout. And then its hard to extend hvm_info_table{} to work
hvmloader out. So here we're trying to figure out a simple solution to
avoid breaking existing layout. So when a conflict occurs,

 #1. Above a predefined boundary (default 2G)
 - move lowmem_end below reserved region to solve conflict;

 #2. Below a predefined boundary (default 2G)
 - Check strict/relaxed policy.
 strict policy leads to fail libxl. Note when both policies
 are specified on a given region, 'strict' is always preferred.
 relaxed policy issue a warning message and also mask this entry 
INVALID
 to indicate we shouldn't expose this entry to hvmloader.

Note this predefined boundary can be changes with the parameter
rdm_mem_boundary in .cfg file.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com


Reviewed-by: Kevin Tian kevint.t...@intel.com

One comment though. could you be consistent to use RDM in the code?
RMRR  is just an example of RDM...


Sure.

Thanks
Tiejun





---
  docs/man/xl.cfg.pod.5  |  21 
  tools/libxc/xc_hvm_build_x86.c |   5 +-
  tools/libxl/libxl.h|   6 +
  tools/libxl/libxl_create.c |   6 +-
  tools/libxl/libxl_dm.c | 255
+
  tools/libxl/libxl_dom.c|  11 +-
  tools/libxl/libxl_internal.h   |  11 +-
  tools/libxl/libxl_types.idl|   8 ++
  tools/libxl/xl_cmdimpl.c   |   3 +
  9 files changed, 322 insertions(+), 4 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 638b350..6fd2370 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -767,6 +767,27 @@ to a given device, and strict is default here.

  Note this would override global Brdm option.

+=item Brdm_mem_boundary=MBYTES
+
+Number of megabytes to set a boundary for checking rdm conflict.
+
+When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
+Especially multiple RMRR entries would worsen this to lead a complicated
+memory layout. So here we're trying to figure out a simple solution to
+avoid breaking existing layout. So when a conflict occurs,
+
+#1. Above a predefined boundary
+- move lowmem_end below reserved region to solve conflict;
+
+#2. Below a predefined boundary
+- Check strict/relaxed policy.
+strict policy leads to fail libxl. Note when both policies
+are specified on a given region, 'strict' is always preferred.
+relaxed policy issue a warning message and also mask this entry 
INVALID
+to indicate we shouldn't expose this entry to hvmloader.
+
+Here the default is 2G.
+
  =back

  =back
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index 0e98c84..5142578 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -21,6 +21,7 @@
  #include stdlib.h
  #include unistd.h
  #include zlib.h
+#include assert.h

  #include xg_private.h
  #include xc_private.h
@@ -270,7 +271,7 @@ static int setup_guest(xc_interface *xch,

  elf_parse_binary(elf);
  v_start = 0;
-v_end = args-mem_size;
+v_end = args-lowmem_end;

  if ( nr_pages  target_pages )
  memflags |= XENMEMF_populate_on_demand;
@@ -754,6 +755,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
  args.mem_size = (uint64_t)memsize  20;
  args.mem_target = (uint64_t)target  20;
  args.image_file_name = image_name;
+if ( args.mmio_size == 0 )
+args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;

  return xc_hvm_build(xch, domid, args);
  }
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 0a7913b..a6212fb 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -858,6 +858,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
  #define LIBXL_TIMER_MODE_DEFAULT -1
  #define LIBXL_MEMKB_DEFAULT ~0ULL

+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
  #define LIBXL_MS_VM_GENID_LEN 16
  typedef struct {
  uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 6c8ec63..0438731 100644
--- 

[Xen-devel] [PATCH v8 0/8] Support more than 8 vcpus on arm64 with GICv3

2015-06-12 Thread Chen Baozi
From: Chen Baozi baoz...@gmail.com

Currently the number of vcpus on arm64 with GICv3 is limited up to 8 due
to the fixed size of redistributor mmio region. Increasing the size
makes the number expand to 16 because of AFF0 restriction on GICv3.
To create a guest up to 128 vCPUs, which is the maxium number that GIC-500
can support, this patchset uses the AFF1 information to create a mapping
relation between vCPUID and vMPIDR and deals with the related issues.

These patches are written based upon Julien's GICv2 on GICv3 series
and the IROUTER emulation cleanup patch.

Changes from V7:
* Updates some commit logs
* Drop aff2 and aff3 in struct sgi_target for it is unused.
* Minor changes according to previous reviews.

Changes from V6:
* Use the new 'struct sgi_target' instead of cpumask_t in vgic_to_sgi.
* Make the domain_max_vcpus to return MAX_VIRT_CPUS and avoid to split
  arch_domain_init.

Changes form V5:
* Rework gicv3_sgir_to_cpumask in #5
* Rework #8 to split arch_domain_create into two parts:
  - arch_domain_preinit to initialise vgic_ops before evtchn_init is
called
  - the rest of logic remains in arch_domain_create
* Use a field value in struct vgic_ops instead of the function point
  for max_vcpus.
* Minor changes according to previous reviews.

Changes from V4:
* Split the patch 4/8 of V3 into two part:
  - Use cpumask_t type for vcpu_mask in vgic_to_sgi.
  - Use AFF1 when translating ICC_SGI1R_EL1 to cpumask.
* Use a more efficient algorithm when calculate cpumask.
* Add a patch to call arch_domain_create before evtchn_init, because
  evtchn_init needs vgic info which is initialised during
  acrh_domain_create.
* Get the max vcpu info from vgic_ops.
* Minor changes according to previous reviews.

Changes from V3:
* Drop the wrong patch that altering domain_max_vcpus to a macro.
* Change the domain_max_vcpus to return value accodring to the version
  of the vGIC in used.

Changes from V2:
* Reorder the patch which increases MAX_VIRT_CPUS to the last to make
  this series bisectable.
* Drop the dynamic re-distributor region allocation patch in tools.
* Use cpumask_t type instead of unsigned long in vgic_to_sgi and do the
  translation from GICD_SGIR to vcpu_mask in both vGICv2 and vGICv3.
* Make domain_max_vcpus be alias of max_vcpus in struct domain

Changes from V1:
* Use the way that expanding the GICR address space to support up to 128
  redistributor in guest memory layout rather than use the dynamic
  allocation.
* Add support to include AFF1 information in vMPIDR/logical CPUID.

Chen Baozi (8):
  xen/arm: gic-v3: Increase the size of GICR in address space for guest
  xen/arm: Add functions of mapping between vCPUID and virtual affinity
  xen/arm: Use the new functions for vCPUID/vaffinity transformation
  xen/arm: Use AFF1 when translating ICC_SGI1R_EL1 to cpumask
  tools/libxl: Set 'reg' of cpu node equal to MPIDR affinity for domU
  xen/arm: Set 'reg' of cpu node for dom0 to match MPIDR's affinity
  xen/arm: make domain_max_vcpus return value from vgic_ops
  xen/arm64: increase MAX_VIRT_CPUS to 128 on arm64

 tools/libxl/libxl_arm.c   | 14 ++--
 xen/arch/arm/domain.c | 20 -
 xen/arch/arm/domain_build.c   | 14 +---
 xen/arch/arm/vgic-v2.c|  8 ---
 xen/arch/arm/vgic-v3.c| 18 ++--
 xen/arch/arm/vgic.c   | 45 +--
 xen/arch/arm/vpsci.c  |  5 +
 xen/include/asm-arm/config.h  |  4 
 xen/include/asm-arm/domain.h  | 39 +++--
 xen/include/asm-arm/gic_v3_defs.h |  3 +++
 xen/include/asm-arm/vgic.h|  9 +++-
 xen/include/public/arch-arm.h |  4 ++--
 12 files changed, 129 insertions(+), 54 deletions(-)

-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v8 1/8] xen/arm: gic-v3: Increase the size of GICR in address space for guest

2015-06-12 Thread Chen Baozi
From: Chen Baozi baoz...@gmail.com

Currently it only supports up to 8 vCPUs. Increase the region to hold
up to 128 vCPUs, which is the maximum number that GIC-500 supports.

Signed-off-by: Chen Baozi baoz...@gmail.com
Reviewed-by: Julien Grall julien.gr...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---
 xen/include/public/arch-arm.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index c029e0f..ec0c261 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -388,8 +388,8 @@ struct xen_arch_domainconfig {
 #define GUEST_GICV3_RDIST_STRIDE   0x2ULL
 #define GUEST_GICV3_RDIST_REGIONS  1
 
-#define GUEST_GICV3_GICR0_BASE 0x0302ULL/* vCPU0 - vCPU7 */
-#define GUEST_GICV3_GICR0_SIZE 0x0010ULL
+#define GUEST_GICV3_GICR0_BASE 0x0302ULL/* vCPU0 - vCPU127 */
+#define GUEST_GICV3_GICR0_SIZE 0x0100ULL
 
 /*
  * 16MB == 4096 pages reserved for guest to use as a region to map its
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] PCI Passthrough ARM Design : Draft1

2015-06-12 Thread Ian Campbell
On Thu, 2015-06-11 at 14:38 -0700, Manish Jaggi wrote:
 
 On Wednesday 10 June 2015 12:21 PM, Julien Grall wrote:
  Hi,
 
  On 10/06/2015 08:45, Ian Campbell wrote:
  4. DomU access / assignment PCI device
  --
  When a device is attached to a domU, provision has to be made such that
  it can
  access the MMIO space of the device and xen is able to identify the 
  mapping
  between guest bdf and system bdf. Two hypercalls are introduced
 
  I don't think we want/need new hypercalls here, the same existing
  hypercalls which are used on x86 should be suitable.
 I think both the hypercalls are necessary
 a) the mapping of guest bdf to actual sbdf is required as domU accesses 
 for GIC are trapped and not handled by pciback.
 A device say 1:0:0.3 is assigned in domU at 0:0:0.3. This is the bestway 
 I could find that works.
 
 b) map_mmio call is issued just after the device is added on the pcu bus 
 (in case of domU)
 The function register_xen_pci_notifier (drivers/xen/pci.c) is modified 
 such that notification is received in domU and dom0.

In which please please add to the document a discussion of the current
interfaces and why they are not suitable.

  Beware that the 1:1 mapping doesn't fit with the current guest memory 
  layout which is pre-defined at Xen build time. So you would also have 
  to make it dynamically or decide to use the same memory layout as the 
  host.
 If same layout as host used, would there be any issue?

I'm not sure that a 1:1 mapping is any different to the host layout. But
in any case, the host layout also doesn't match the guest layout, so it
has the same issues.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v8 3/8] xen/arm: Use the new functions for vCPUID/vaffinity transformation

2015-06-12 Thread Chen Baozi
From: Chen Baozi baoz...@gmail.com

There are 3 places to change:

* Initialise vMPIDR value in vcpu_initialise()
* Find the vCPU from vMPIDR affinity information when accessing GICD
  registers in vGIC
* Find the vCPU from vMPIDR affinity information when booting with vPSCI
  in vGIC
  - Both PSCI 0.1 and PSCI 0.2 are modified to respect the MPIDR like.

Signed-off-by: Chen Baozi baoz...@gmail.com
Reviewed-by: Julien Grall julien.gr...@citrix.com
---
PSCI 0.1 Section 6.3 (ARM DEN 0022A):

Ideally platform discovery mechanism such as firmware tables would be
used by secure firmware to describe the set of valid CPUIDs to the
hypervisor or Rich OS, if the former is not present. The hypervisor in
turn can create and supply virtual discovery mechanisms to its guests.

Thus, we consider CPUID is equal to the 'reg' register in DT (which
is an MPIDR-like value).

 xen/arch/arm/domain.c  | 6 +-
 xen/arch/arm/vgic-v3.c | 2 +-
 xen/arch/arm/vpsci.c   | 5 +
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 2bde26e..0cf147c 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -501,11 +501,7 @@ int vcpu_initialise(struct vcpu *v)
 
 v-arch.sctlr = SCTLR_GUEST_INIT;
 
-/*
- * By default exposes an SMP system with AFF0 set to the VCPU ID
- * TODO: Handle multi-threading processor and cluster
- */
-v-arch.vmpidr = MPIDR_SMP | (v-vcpu_id  MPIDR_AFF0_SHIFT);
+v-arch.vmpidr = MPIDR_SMP | vcpuid_to_vaffinity(v-vcpu_id);
 
 v-arch.actlr = READ_SYSREG32(ACTLR_EL1);
 
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 540f85f..ef9a71a 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -61,7 +61,7 @@ static struct vcpu *vgic_v3_irouter_to_vcpu(struct domain *d, 
uint64_t irouter)
 if ( irouter  GICD_IROUTER_SPI_MODE_ANY )
 return d-vcpu[0];
 
-vcpu_id = irouter  MPIDR_AFF0_MASK;
+vcpu_id = vaffinity_to_vcpuid(irouter);
 if ( vcpu_id = d-max_vcpus )
 return NULL;
 
diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
index 5d899be..aebe1e2 100644
--- a/xen/arch/arm/vpsci.c
+++ b/xen/arch/arm/vpsci.c
@@ -32,10 +32,7 @@ static int do_common_cpu_on(register_t target_cpu, 
register_t entry_point,
 int is_thumb = entry_point  1;
 register_t vcpuid;
 
-if( ver == XEN_PSCI_V_0_2 )
-vcpuid = (target_cpu  MPIDR_HWID_MASK);
-else
-vcpuid = target_cpu;
+vcpuid = vaffinity_to_vcpuid(target_cpu);
 
 if ( vcpuid = d-max_vcpus || (v = d-vcpu[vcpuid]) == NULL )
 return PSCI_INVALID_PARAMETERS;
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Draft F] Xen on ARM vITS Handling

2015-06-12 Thread Ian Campbell
On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote:

Please could you trim your quotes to only include the bit you are
referring to. Otherwise there is a high chance I will miss a one line
comment in the middle of the thousand lines of quoted matter.

  The `GITS_BASER0` will be setup to request sufficient memory for a
  device table consisting of entries of:
 
  struct vdevice_table {
  uint64_t vitt_ipa;
  uint32_t vitt_size;
  uint32_t padding;
  };
 
   How about adding valid bit to know if the entry is valid or not?

I suggest to use vitt_ipa == INVALID_PADDR to signal this rather than
using another bit.

  ## Handling of unrouted/spurious LPIs
 
  Since there is no 1:1 link between a `vLPI` and `pLPI` enabling and
  disabling of phyiscal LPIs cannot be driven from the state of an
  associated vLPI.
 
  Each `pLPI` is routed and enabled during device assignment, therefore
  it is possible to receive a physical LPI which has yet to be routed
  (via a `vITS`) to a `vLPI`.
 
 Why do we need to enable LPIs during device assignment?
 Can't we do it only on LPI configuration update, which is trapped in
 Xen as mentioned in 7.8? ( ## Enabling and disabling LPIs)

Quoting the first sentence/paragraph of this section:
Since there is no 1:1 link between a `vLPI` and `pLPI` enabling
and disabling of phyiscal LPIs cannot be driven from the state
of an associated vLPI.

To expand on that: The vITT can map multiple (vDevice,vEvent) pairs to
the same LPI, and each of those (vDevice,vEvent) pairs is related to a
different (pDevice,pEvent) which in turn has a unique pLPI associated
with it. Thus a vLPI can be associated with more than one pLPI.

Enumerating all pLPIs associated with a given vLPI would be expensive (a
complete walk of the vITT).

In addition if it were possible to do so we would also need to manage
enabling/disabling the pLPI in several other places that in vPLI cfg
traps, specifically MAPC and MAPD at least.

So pLPIs must be routed at device assignment time because in the vLPI
configuration table trap there is no mapping back to a single pLPI.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 11:20, tiejun.c...@intel.com wrote:
 On 2015/6/12 16:45, Jan Beulich wrote:
 On 12.06.15 at 08:31, tiejun.c...@intel.com wrote:
 On 2015/6/11 17:28, Tian, Kevin wrote:
 From: Chen, Tiejun
 Sent: Thursday, June 11, 2015 9:15 AM
 @@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, 
 struct 
 pci_dev
 *pdev)
 PCI_DEVFN2(bdf) != devfn )
continue;

 -rmrr_identity_mapping(pdev-domain, 0, rmrr);
 +rmrr_identity_mapping(pdev-domain, 0, rmrr,
 +  XEN_DOMCTL_DEV_RDM_RELAXED);

 ditto

 It doesn't matter when we're trying to remove a device since we don't
 care this flag.

 In such a case it helps to add a brief comment saying that the precise
 value passed is irrelevant. Or maybe this could be expressed by
 
 Okay.
 
 folding this and the map parameters of the function (in which case it
 might become self-documenting)?

 
 Sorry, I don't know exactly how to implement this idea. Have we any 
 similar example on Xen side?

No idea what you're after. What we have with your change are
tuples like
(map, relaxed)
(map, strict)
(unmap, ignored)
Clearly these can be represented with three distinct numbers. I.e.
along with using XEN_DOMCTL_DEV_RDM_* without the map
boolean, (unmap, ignored) could e.g. be expressed by passing
-1.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time

2015-06-12 Thread Wen Congyang
On 06/12/2015 03:41 PM, Paul Durrant wrote:
 -Original Message-
 From: Wen Congyang [mailto:we...@cn.fujitsu.com]
 Sent: 12 June 2015 04:22
 To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org
 Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
 yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson
 Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
 page only one time

 On 06/11/2015 09:25 PM, Paul Durrant wrote:
 -Original Message-
 From: Yang Hongyang [mailto:yan...@cn.fujitsu.com]
 Sent: 11 June 2015 13:59
 To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-
 de...@lists.xen.org
 Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
 yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson
 Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
 ioreq
 page only one time



 On 06/11/2015 06:20 PM, Paul Durrant wrote:
 -Original Message-
 From: Wen Congyang [mailto:we...@cn.fujitsu.com]
 Sent: 11 June 2015 09:48
 To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
 de...@lists.xen.org
 Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
 [...]


 In our implementation, we don't start a new emulator. The codes can
 work,
 but some bugs may be not triggered.


 How do you reconcile the incoming QEMU save record with the running
 emulator state?

 We introduce a qmp command xen-load-devices-
 state(libxl__qmp_restore) which
 can restore the emulator state. The step of resotre emulator state at a
 checkpoint is:

 1. libxl__qmp_stop- vm_stop() in qemu
 2. libxl__qmp_restore - load_vmstate() in qemu
 3. libxl__qmp_resume  - vm_start() in qemu


 Ok, that sounds like the ideal time to hook back into Xen by creating a new
 ioreq server.

 I have some questions about ioreq server:
 1. If we use old version xen and newest version qemu, is it OK? Is default
ioreq server created when the guest is created. xen_create_ioreq_server()
 does
nothing, and xen_get_ioreq_server_info() will get the default ioreq server
 information.
Is it right?
 
 No. It's not compatible in that direction. A new Xen will work with an old 
 QEMU but not the other way round.
 
 2. Why we create a default ioreq server when getting the hvm param if there
 is already a
not default ioreq server?
 
 If something reads the 'legacy' HVM params then that is Xen's trigger to 
 create the default server. Any 'new' emulator should be using the ioreq 
 server hypercalls so the default server will not be needed.

If there are two ioreq servers: default ioreq server, and a ioreq server 
created by emulator. The guest can work it correctly in
this case? Is there any application(not emulator) that uses the libxenctrl 
directly?

Thanks
Wen Congyang

 
 3. In the far end, we will clear the ioreq page, and this ioreq page is used 
 for
 default
ioreq server, is it right?
 
 Yes, AFAIK it's only the 'magic' pages that get cleared at the far end - and 
 that includes the default server pages. Other ioreq servers will have their 
 pages cleared on re-insertion to the P2M at the source end when the server is 
 disabled.
 
   Paul
 

 Thanks
 Wen Congyang


   Paul


Paul

 Thanks
 Wen Congyang


Paul


 Thanks
 Wen Congyang


Paul

 We will set to the guest to a new state, the old state should be
 dropped.

 Thanks
 Wen Congyang


Paul


 Thanks
 Wen Congyang


Paul


 Thanks
 Wen Congyang


 ~Andrew
 .


 .


 .



 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel
 .


 .



 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel
 .


 --
 Thanks,
 Yang.
 .

 
 .
 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH OSSTEST] Arrange to upgrade microcode on x86 test hosts.

2015-06-12 Thread Ian Jackson
Ian Campbell writes (Re: [PATCH OSSTEST] Arrange to upgrade microcode on x86 
test hosts.):
 On Thu, 2015-06-11 at 16:39 +0100, Ian Jackson wrote:
  If $c{prop} is undef, the logm will produce a warning.  Maybe you want
  //'' somewhere.
 
 I think that was a stray wip debug thing, I could fix, remove or move it
 below the check of $c{$prop}. 

Any of those would be fine by me.

  This use of caps lock for unexported shell variables is rather
  unidiomatic, don't you think ?
 
 I'm not really in the habit of caring for short scripts, but I'll down
 case them.

I find it confusing, because I want to know what other process is
going to be using them.  Thanks.

  mg-debian-installer-update is idempotent.  Perhaps this script should
  be too ?  After all you can only overwrite something generated today.
 
 True. I'll arrange that.
 
 I'll see if I can also make it deterministic like with the installer
 stuff, which will make spotting differences easier.

Great.

   +for BIN in $AMD_BINS ; do
   +curl -s $LINUX_FW/plain/$BIN  $BIN
   +done
  
  We're really fetching these via the gitweb/cgit/... ?  Isn't that a
  bit fragile ?
 
 A bit, yes. The alternative is to clone the whole of linux-firmware.git.
 I suppose with --depth=1 that might not be so bad. Would you prefer
 that?

Yes.

Will go on to read your v2 just now...

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause()

2015-06-12 Thread Ian Jackson
Ian Campbell writes (Re: [Xen-devel] [PATCH v2 COLOPre 06/13] tools/libxl: 
Introduce a new internal API libxl__domain_unpause()):
 On Thu, 2015-06-11 at 17:09 +0800, Wen Congyang wrote:
  If the public API creates a new AO, it is safe to call it directly?
 
 A public function which takes an ao_how is, I believe, an exception to
 this rule and should be annotated with LIBXL_EXTERNAL_CALLERS_ONLY to
 prevent accidents.

Yes.

 I don't think libxl_domain_unpause is such a function though.

Indeed.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] libxl backports for 4.4 and 4.5

2015-06-12 Thread Lars Kurth

 On 12 Jun 2015, at 13:01, Ian Jackson ian.jack...@eu.citrix.com wrote:
 
 Ian Jackson writes (Re: [Xen-devel] libxl backports for 4.4 and 4.5):
 I propose to push all of these today, unless you object.  I will
 double-check with you on irc.
 
 Now done.

Thanks
Lars


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Draft F] Xen on ARM vITS Handling

2015-06-12 Thread Julien Grall



On 12/06/2015 08:55, Ian Campbell wrote:

On Thu, 2015-06-11 at 10:40 +0100, Ian Campbell wrote:

## Command Queue Virtualisation

The command translation/emulation in this design has been arranged to
be as cheap as possible (e.g. in many cases the actions are NOPs),
avoiding previous concerns about the length of time which an emulated
write to a `CWRITER` register may block the vcpu.

The vits will simply track its reader and writer pointers. On write
to `CWRITER` it will immediately and synchronously process all
commands in the queue and update its state accordingly.

It might be possible to implement a rudimentary form of preemption by
periodically (as determined by `hypercall_preempt_check()`) returning
to the guest without incrementing PC but with updated internal
`CREADR` state, meaning it will reexecute the write to `CWRITER` and
we can pickup where we left off for another iteration. This at least
lets us schedule other vcpus etc and prevents a monopoly.


In the presence of multiple VCPUs writing to GITS_CWRITER preemption
actually gets pretty subtle. I suggest leaving it out for now.


Would it be possible to do it with re-doing the write to the GITS_CWRITER?

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 11:40, feng...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Tuesday, June 09, 2015 11:09 PM
  On 08.05.15 at 11:07, feng...@intel.com wrote:
  This patch adds a new per-vCPU tasklet to wakeup the blocked
  vCPU. It can be used in the case vcpu_unblock cannot be called
  directly. This tasklet will be used in later patch in this
  series.
 
  Signed-off-by: Feng Wu feng...@intel.com
  ---
   xen/common/domain.c | 11 +++
   xen/include/xen/sched.h |  3 +++
   2 files changed, 14 insertions(+)
 
 What is the rationale for doing this in common code, modifying
 common structures?
 
 
 Do you mean moving this tasklet to the architecture specific structure, such 
 as, struct arch_vcpu ?

Exactly.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 06/11] x86/intel_pstate: the main boby of the intel_pstate driver

2015-06-12 Thread Julien Grall



On 11/06/2015 21:41, Wang, Wei W wrote:

On 11/06/2015 22:02, Julien Grall wrote:

On 11/06/2015 04:27, Wei Wang wrote:

diff --git a/xen/include/acpi/cpufreq/cpufreq.h

b/xen/include/acpi/cpufreq/cpufreq.h

index d10e4c7..71bb45c 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -34,6 +34,12 @@ struct acpi_cpufreq_data {

   extern struct acpi_cpufreq_data *cpufreq_drv_data[NR_CPUS];

+/*
+ * Maximum transition latency is in nanoseconds - if it's unknown,
+ * CPUFREQ_ETERNAL shall be used.
+ */
+#define CPUFREQ_ETERNAL(-1)
+
   struct cpufreq_cpuinfo {
   unsigned intmax_freq;
   unsigned intsecond_max_freq;/* P1 if Turbo Mode is on */
@@ -77,6 +83,8 @@ struct cpufreq_policy {
   };
   DECLARE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_policy);

+extern int intel_pstate_init(void);
+


As said on a previous version [1], intel_pstate_init is x86 specific.
Although xen/include/acpi contains common headers.


Please see our latest discussion here (the bottom of the link):  
http://lists.xen.org/archives/html/xen-devel/2015-06/msg00047.html


Well we are planning to move cpufreq.h out of acpi in order to use for 
device tree based platform. Most of these declaration is common.


Although any x86 specific function would have to go out in a separate 
header.


Please avoid to add new one when it's possible. I don't see why a new 
asm-x86/cpufreq.h can't be added...



   extern int __cpufreq_set_policy(struct cpufreq_policy *data,
   struct cpufreq_policy *policy);

@@ -101,6 +109,12 @@ struct cpufreq_freqs {
*  CPUFREQ GOVERNORS*


**
***/


+/* The four internal governors used in intel_pstate */
+#define CPUFREQ_POLICY_POWERSAVE(1)
+#define CPUFREQ_POLICY_PERFORMANCE  (2)
+#define CPUFREQ_POLICY_USERSPACE(3)
+#define CPUFREQ_POLICY_ONDEMAND (4)
+


  From the comment, this looks like x86 specific. Maybe even intel_pstate?



Yes. It's currently only used by the intel_pstate driver.


After looking to this series, this statement looks wrong to me... You 
are using all these defines in the common cpufreq code (parameters, 
pmstat,...).


The cpufreq framework should be agnostic to any cpufreq driver 
implementation.


So it looks like to me that we want CPUFREQ_* to be exposed for anyone. 
And specifying the behavior when policy = 0 would be great too rather 
than relying on a future developer to not define 0.


Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] libxl backports for 4.4 and 4.5

2015-06-12 Thread Ian Jackson
Jan Beulich writes (Re: [Xen-devel] libxl backports for 4.4 and 4.5):
 On 12.06.15 at 11:32, lars.kurth@gmail.com wrote:
  you seem to be disagreeing, or more likely Ian missed Jan's reply, so I 
  wanted to double check.

Thanks for pointing this out, Lars.  I had just spotted it myself.

 The release being stuck anyway, perhaps we can put them in with
 the understanding that this time round we'll need at least another
 RC. Ian - you know how risky the changes are, please use your
 judgment.

These changes are old, well-tested, and they are important bugfixes.
I think they should go in.

 (Btw., as to 4.4.3, I don't think I want to kick off that process until
 4.5.1 gets out of its stuck state, i.e. as all stable trees are affected
 by the underlying problem, I don't want both trees to be hold up half
 way through the stable release process.)

Right.  So I take you to mean that I should apply backports to 4.4
freely.


Jan, things that I have queued for 4.5 are therefore:

 d83bf9d2 libxl: In libxl_set_vcpuonline check for
 maximum number of VCPUs against the cpumap

 12e817e2 Fix build of 32bit toolstacks on CentOS 5.x following XSA-125

 9369988  libxl: event handling: Break out ao_work_outstanding

 f1335f0  libxl: event handling: ao_inprogress does waits while reports
 outstanding

Of these, all but 12e817e2 have been in master for a long time.
12e817e2 was pushed this morning.

I propose to push all of these today, unless you object.  I will
double-check with you on irc.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix

2015-06-12 Thread Paul Durrant
 -Original Message-
 From: Fabio Fantoni [mailto:fabio.fant...@m2r.biz]
 Sent: 12 June 2015 11:44
 To: Paul Durrant; xen-de...@lists.xenproject.org
 Subject: Re: [Xen-devel] [PATCH v2 00/17] x86/hvm: I/O emulation cleanup
 and fix
 
 Il 11/06/2015 17:42, Paul Durrant ha scritto:
  This patch series re-works much of the code involved in emulation of port
  and memory mapped I/O for HVM guests.
 
  The code has become very convoluted and, at least by inspection, certain
  emulations will apparently malfunction.
 
  The series is broken down into 17 patches (which are also available in
  my xenbits repo:
 http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
  on the emulation22 branch) as follows:
 
  0001-x86-hvm-simplify-hvmemul_do_io.patch
  0002-x86-hvm-re-name-struct-hvm_mmio_handler-to-hvm_mmio_.patch
  0003-x86-hvm-unify-internal-portio-and-mmio-intercepts.patch
  0004-x86-hvm-unify-dpci-portio-intercept-with-standard-po.patch
  0005-x86-hvm-unify-stdvga-mmio-intercept-with-standard-mm.patch
  0006-x86-hvm-revert-82ed8716b-fix-direct-PCI-port-I-O-emu.patch
  0007-x86-hvm-only-call-hvm_io_assist-from-hvm_wait_for_io.patch
  0008-x86-hvm-split-I-O-completion-handling-from-state-mod.patch
  0009-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch
  0010-x86-hvm-remove-HVMIO_dispatched-I-O-state.patch
  0011-x86-hvm-remove-hvm_io_state-enumeration.patch
  0012-x86-hvm-use-ioreq_t-to-track-in-flight-state.patch
  0013-x86-hvm-only-acquire-RAM-pages-for-emulation-when-we.patch
  0014-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch
  0015-x86-hvm-make-sure-translated-MMIO-reads-or-writes-fa.patch
  0016-x86-hvm-remove-multiple-open-coded-chunking-loops.patch
  0017-x86-hvm-track-large-memory-mapped-accesses-by-buffer.patch
 
  v2:
- Removed bogus assertion from patch 16
- Re-worked patch 17 after basic testing of back-port onto XenServer
 
 Thanks, I confirm that this new version solves the critical regression
 that causes dom0 insta-reboot.
 
 I tested them on windows and linux hvm domUs, I had strange results...
 On windows 7 64 bit sp1 domU (with new winpv drivers) seems there are a
 small performance regression (or probably only latency increased) but
 I not found warning/error or something userful in logs, I'm not sure is
 real regression of your patches, if needed I'll retry without.
 On linux domU (fedora 21) qxl that had sse2 instructions problem still
 not working, same of latest 2 years not always qemu crash with a gdb
 showing sse2 instruction problem but xorg crash with EQ overflowing
 and/or domU remain with 100% cpu used by xorg qemu process at 100% cpu.
 I posted the backtrace similar time ago and a qemu/spice developer told
 that was a conseguence of other problem if I remember good.
 How can I be sure that sse2 or similar istructions (with videoram) are
 now working correctly?

Once I have debugged through XenServer automated testing I plan to add a 
targeted SSE test somewhere, possibly as a smoke test in hvmloader.

  Paul

 
 If you need more informations/tests tell me and I'll post them.
 
 Thanks for any reply and sorry for my bad english.
 
 
  ___
  Xen-devel mailing list
  Xen-devel@lists.xen.org
  http://lists.xen.org/xen-devel


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Draft F] Xen on ARM vITS Handling

2015-06-12 Thread Ian Campbell
On Fri, 2015-06-12 at 09:09 -0400, Julien Grall wrote:
 Hi Ian,
 
 On 12/06/2015 04:52, Ian Campbell wrote:
  On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote:
  So pLPIs must be routed at device assignment time because in the vLPI
  configuration table trap there is no mapping back to a single pLPI.
 
 I just remembered the exact reason that made use to differ SPI enabling.

I can't parse this sentence, differ how?

 When the device is assigned, the domain VCPUs are still down (even VCPU0).
 
 If we receive an interrupt before the VCPU0 is unpaused, the interrupt 
 will be lost. Same if the interrupt is not yet configured (i.e before 
 the vITS setup correctly the table) with your proposal.

Is this any different to booting with the ITT not setup?

(SPIs are a slightly different case because they don't need h/w routing)

 This could happen when the device is not quiescent. We had this issue on 
 the vexpress at boot time when the network card was trying to send an 
 interrupt before DOM0 is setup.

I don't fully understand the issue you are trying to describe, but do
you want to propose a change to the spec?

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 07/15] vt-d: Add API to update IRTE when VT-d PI is used

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 11:40, feng...@intel.com wrote:
 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
  On 08.05.15 at 11:07, feng...@intel.com wrote:
  +static inline void setup_posted_irte(
  +struct iremap_entry *new_ire, struct pi_desc *pi_desc, uint8_t gvec)
  +{
  +new_ire-post.urg = 0;
  +new_ire-post.vector = gvec;
  +new_ire-post.pda_l = (((u64)virt_to_maddr(pi_desc)) 
  +   (32 - PDA_LOW_BIT))  PDA_MASK(LOW);
  +new_ire-post.pda_h = (((u64)virt_to_maddr(pi_desc))  32) 
  +   PDA_MASK(HIGH);
 
 Looking at this another time I can't see what the and-ing with
 PAD_MASK() is supposed to be good for. 
 
 I cannot understand this well. Do you mean we don't need and PDA_MASK() 
 here?

Correct - the bitfield width (where the data gets stored into) already
enforces the intended truncation afaict.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Relocatable Xen early boot code

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 13:14, daniel.ki...@oracle.com wrote:
 Here I do not want to discuss GRUB2 and multiboot2 protocol support details
 for relocatable images. It is not needed. It is sufficient to know that it
 is able to put loaded image anywhere in available memory below 4 GiB. Loaded
 image is informed about its base address according to multiboot2 protocol 
 via
 special tag. This is new feature not available in upstream GRUB2. I work on
 upstreaming it in parallel. Relevant patches will be posted together with
 Xen patches.

Before going into any detail on what you write later on - if this isn't in
upstream grub2, why can't you do what you want to do first without
another change needed in the boot loader? In which case, if I'm reading
this correctly, you wouldn't need our boot code to become relocatable
either.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] PCI Passthrough ARM Design : Draft1

2015-06-12 Thread Ian Campbell
On Fri, 2015-06-12 at 07:41 -0400, Julien Grall wrote:

 I was suggesting to expose the host layout to the guest layout (similar 
 to e820).

We do this on x86 only as a workaround for broken hardware (essentially
magic system devices which bypass the IOMMU).

We shouldn't do this on ARM by default, but only as and when a similar
requirement arises.

 Although, this is not my preferred way and a non 1:1 mapping would be 
 the best.

Yes, a non-1:1 mapping is the right answer.

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 5/6] gnttab: fix/adjust gnttab_transfer()

2015-06-12 Thread Jan Beulich
- don't update shared entry's frame number for translated domains (as
  MFNs shouldn't be exposed to such guests)
- for v1 grant table format, force copying of the page also when the
  intended MFN doesn't fit in 32 bits (and the domain isn't translated)
- fix an apparent off-by-one error (it's unclear to me why commit
  5cc77f9098 (32-on-64: Fix domain address-size clamping, implement)
  uses BITS_PER_LONG-1 here, while using BITS_PER_LONG in the two other
  invocations of domain_clamp_alloc_bitsize())
- adjust comments accompanying the shared entry's frame field

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1639,7 +1639,8 @@ gnttab_transfer(
 }
 
 max_bitsize = domain_clamp_alloc_bitsize(
-e, BITS_PER_LONG+PAGE_SHIFT-1);
+e, e-grant_table-gt_version  1 || paging_mode_translate(e)
+   ? BITS_PER_LONG + PAGE_SHIFT : 32 + PAGE_SHIFT);
 if ( (1UL  (max_bitsize - PAGE_SHIFT)) = mfn )
 {
 struct page_info *new_page;
@@ -1736,14 +1737,18 @@ gnttab_transfer(
 if ( e-grant_table-gt_version == 1 )
 {
 grant_entry_v1_t *sha = shared_entry_v1(e-grant_table, gop.ref);
+
 guest_physmap_add_page(e, sha-frame, mfn, 0);
-sha-frame = mfn;
+if ( !paging_mode_translate(e) )
+sha-frame = mfn;
 }
 else
 {
 grant_entry_v2_t *sha = shared_entry_v2(e-grant_table, gop.ref);
+
 guest_physmap_add_page(e, sha-full_page.frame, mfn, 0);
-sha-full_page.frame = mfn;
+if ( !paging_mode_translate(e) )
+sha-full_page.frame = mfn;
 }
 smp_wmb();
 shared_entry_header(e-grant_table, gop.ref)-flags |=
--- a/xen/include/public/grant_table.h
+++ b/xen/include/public/grant_table.h
@@ -134,8 +134,10 @@ struct grant_entry_v1 {
 /* The domain being granted foreign privileges. [GST] */
 domid_t  domid;
 /*
- * GTF_permit_access: Frame that @domid is allowed to map and access. [GST]
- * GTF_accept_transfer: Frame whose ownership transferred by @domid. [XEN]
+ * GTF_permit_access: GFN that @domid is allowed to map and access. [GST]
+ * GTF_accept_transfer: GFN that @domid is allowed to transfer into. [GST]
+ * GTF_transfer_completed: MFN whose ownership transferred by @domid
+ * (non-translated guests only). [XEN]
  */
 uint32_t frame;
 };



gnttab: fix/adjust gnttab_transfer()

- don't update shared entry's frame number for translated domains (as
  MFNs shouldn't be exposed to such guests)
- for v1 grant table format, force copying of the page also when the
  intended MFN doesn't fit in 32 bits (and the domain isn't translated)
- fix an apparent off-by-one error (it's unclear to me why commit
  5cc77f9098 (32-on-64: Fix domain address-size clamping, implement)
  uses BITS_PER_LONG-1 here, while using BITS_PER_LONG in the two other
  invocations of domain_clamp_alloc_bitsize())
- adjust comments accompanying the shared entry's frame field

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1639,7 +1639,8 @@ gnttab_transfer(
 }
 
 max_bitsize = domain_clamp_alloc_bitsize(
-e, BITS_PER_LONG+PAGE_SHIFT-1);
+e, e-grant_table-gt_version  1 || paging_mode_translate(e)
+   ? BITS_PER_LONG + PAGE_SHIFT : 32 + PAGE_SHIFT);
 if ( (1UL  (max_bitsize - PAGE_SHIFT)) = mfn )
 {
 struct page_info *new_page;
@@ -1736,14 +1737,18 @@ gnttab_transfer(
 if ( e-grant_table-gt_version == 1 )
 {
 grant_entry_v1_t *sha = shared_entry_v1(e-grant_table, gop.ref);
+
 guest_physmap_add_page(e, sha-frame, mfn, 0);
-sha-frame = mfn;
+if ( !paging_mode_translate(e) )
+sha-frame = mfn;
 }
 else
 {
 grant_entry_v2_t *sha = shared_entry_v2(e-grant_table, gop.ref);
+
 guest_physmap_add_page(e, sha-full_page.frame, mfn, 0);
-sha-full_page.frame = mfn;
+if ( !paging_mode_translate(e) )
+sha-full_page.frame = mfn;
 }
 smp_wmb();
 shared_entry_header(e-grant_table, gop.ref)-flags |=
--- a/xen/include/public/grant_table.h
+++ b/xen/include/public/grant_table.h
@@ -134,8 +134,10 @@ struct grant_entry_v1 {
 /* The domain being granted foreign privileges. [GST] */
 domid_t  domid;
 /*
- * GTF_permit_access: Frame that @domid is allowed to map and access. [GST]
- * GTF_accept_transfer: Frame whose ownership transferred by @domid. [XEN]
+ * GTF_permit_access: GFN that @domid is allowed to map and access. [GST]
+ * GTF_accept_transfer: GFN that @domid is allowed to transfer into. [GST]
+ * GTF_transfer_completed: MFN whose ownership 

[Xen-devel] [PATCH 6/6] gnttab: make struct grant_mapping private

2015-06-12 Thread Jan Beulich
This documents that no entity outside of gnttab.c actually accesses
objects of that type.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -113,6 +113,16 @@ struct gnttab_unmap_common {
 goto _lbl;  \
 } while ( 0 )
 
+/*
+ * Tracks a mapping of another domain's grant reference. Each domain has a
+ * table of these, indexes into which are returned as a 'mapping handle'.
+ */
+struct grant_mapping {
+u32  ref;   /* grant ref */
+u16  flags; /* 0-4: GNTMAP_* ; 5-15: unused */
+domid_t  domid; /* granting domain */
+};
+
 #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping))
 #define maptrack_entry(t, e) \
 ((t)-maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -52,16 +52,6 @@
 /* The maximum size of a grant table. */
 extern unsigned int max_grant_frames;
 
-/*
- * Tracks a mapping of another domain's grant reference. Each domain has a
- * table of these, indexes into which are returned as a 'mapping handle'.
- */
-struct grant_mapping {
-u32  ref;   /* grant ref */
-u16  flags; /* 0-4: GNTMAP_* ; 5-15: unused */
-domid_t  domid; /* granting domain */
-};
-
 /* Per-domain grant information. */
 struct grant_table {
 /* Table size. Number of frames shared with guest */



gnttab: make struct grant_mapping private

This documents that no entity outside of gnttab.c actually accesses
objects of that type.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -113,6 +113,16 @@ struct gnttab_unmap_common {
 goto _lbl;  \
 } while ( 0 )
 
+/*
+ * Tracks a mapping of another domain's grant reference. Each domain has a
+ * table of these, indexes into which are returned as a 'mapping handle'.
+ */
+struct grant_mapping {
+u32  ref;   /* grant ref */
+u16  flags; /* 0-4: GNTMAP_* ; 5-15: unused */
+domid_t  domid; /* granting domain */
+};
+
 #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping))
 #define maptrack_entry(t, e) \
 ((t)-maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -52,16 +52,6 @@
 /* The maximum size of a grant table. */
 extern unsigned int max_grant_frames;
 
-/*
- * Tracks a mapping of another domain's grant reference. Each domain has a
- * table of these, indexes into which are returned as a 'mapping handle'.
- */
-struct grant_mapping {
-u32  ref;   /* grant ref */
-u16  flags; /* 0-4: GNTMAP_* ; 5-15: unused */
-domid_t  domid; /* granting domain */
-};
-
 /* Per-domain grant information. */
 struct grant_table {
 /* Table size. Number of frames shared with guest */
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix

2015-06-12 Thread Fabio Fantoni

Il 11/06/2015 17:42, Paul Durrant ha scritto:

This patch series re-works much of the code involved in emulation of port
and memory mapped I/O for HVM guests.

The code has become very convoluted and, at least by inspection, certain
emulations will apparently malfunction.

The series is broken down into 17 patches (which are also available in
my xenbits repo: http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
on the emulation22 branch) as follows:

0001-x86-hvm-simplify-hvmemul_do_io.patch
0002-x86-hvm-re-name-struct-hvm_mmio_handler-to-hvm_mmio_.patch
0003-x86-hvm-unify-internal-portio-and-mmio-intercepts.patch
0004-x86-hvm-unify-dpci-portio-intercept-with-standard-po.patch
0005-x86-hvm-unify-stdvga-mmio-intercept-with-standard-mm.patch
0006-x86-hvm-revert-82ed8716b-fix-direct-PCI-port-I-O-emu.patch
0007-x86-hvm-only-call-hvm_io_assist-from-hvm_wait_for_io.patch
0008-x86-hvm-split-I-O-completion-handling-from-state-mod.patch
0009-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch
0010-x86-hvm-remove-HVMIO_dispatched-I-O-state.patch
0011-x86-hvm-remove-hvm_io_state-enumeration.patch
0012-x86-hvm-use-ioreq_t-to-track-in-flight-state.patch
0013-x86-hvm-only-acquire-RAM-pages-for-emulation-when-we.patch
0014-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch
0015-x86-hvm-make-sure-translated-MMIO-reads-or-writes-fa.patch
0016-x86-hvm-remove-multiple-open-coded-chunking-loops.patch
0017-x86-hvm-track-large-memory-mapped-accesses-by-buffer.patch

v2:
  - Removed bogus assertion from patch 16
  - Re-worked patch 17 after basic testing of back-port onto XenServer


Thanks, I confirm that this new version solves the critical regression 
that causes dom0 insta-reboot.


I tested them on windows and linux hvm domUs, I had strange results...
On windows 7 64 bit sp1 domU (with new winpv drivers) seems there are a 
small performance regression (or probably only latency increased) but 
I not found warning/error or something userful in logs, I'm not sure is 
real regression of your patches, if needed I'll retry without.
On linux domU (fedora 21) qxl that had sse2 instructions problem still 
not working, same of latest 2 years not always qemu crash with a gdb 
showing sse2 instruction problem but xorg crash with EQ overflowing 
and/or domU remain with 100% cpu used by xorg qemu process at 100% cpu.
I posted the backtrace similar time ago and a qemu/spice developer told 
that was a conseguence of other problem if I remember good.
How can I be sure that sse2 or similar istructions (with videoram) are 
now working correctly?


If you need more informations/tests tell me and I'll post them.

Thanks for any reply and sorry for my bad english.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 11:40, feng...@intel.com wrote:
 From: xen-devel-boun...@lists.xen.org 
 [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Jan Beulich
 Sent: Tuesday, June 09, 2015 11:20 PM
  On 08.05.15 at 11:07, feng...@intel.com wrote:
  +{
  +struct vcpu *v;
  +unsigned int cpu = smp_processor_id();
  +
  +spin_lock(per_cpu(blocked_vcpu_lock, cpu));
  +list_for_each_entry(v, per_cpu(blocked_vcpu, cpu),
  +blocked_vcpu_list) {
 
 How long do you think such a list can grow? 
 
 It depends on how many vCPU is current blocked on this specific pCPU.
 I am not sure how long it can grow. 
 
 The basic idea here is that, when notification wakeup event happens,
 we need to find a way to find the right vCPU, and then wakeup it, do
 you have any better idea for this?

Not immediately, or else I would have spelled it out. But the list
perhaps growing to hundreds entries is something that needs to
be taken into consideration.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time

2015-06-12 Thread Paul Durrant
 -Original Message-
 From: Wen Congyang [mailto:we...@cn.fujitsu.com]
 Sent: 12 June 2015 11:26
 To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org
 Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
 yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson
 Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
 page only one time
 
 On 06/12/2015 03:41 PM, Paul Durrant wrote:
  -Original Message-
  From: Wen Congyang [mailto:we...@cn.fujitsu.com]
  Sent: 12 June 2015 04:22
  To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
 de...@lists.xen.org
  Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
  yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson
  Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
 ioreq
  page only one time
 
  On 06/11/2015 09:25 PM, Paul Durrant wrote:
  -Original Message-
  From: Yang Hongyang [mailto:yan...@cn.fujitsu.com]
  Sent: 11 June 2015 13:59
  To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-
  de...@lists.xen.org
  Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
  yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson
  Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
  ioreq
  page only one time
 
 
 
  On 06/11/2015 06:20 PM, Paul Durrant wrote:
  -Original Message-
  From: Wen Congyang [mailto:we...@cn.fujitsu.com]
  Sent: 11 June 2015 09:48
  To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
  de...@lists.xen.org
  Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
  [...]
 
 
  In our implementation, we don't start a new emulator. The codes
 can
  work,
  but some bugs may be not triggered.
 
 
  How do you reconcile the incoming QEMU save record with the
 running
  emulator state?
 
  We introduce a qmp command xen-load-devices-
  state(libxl__qmp_restore) which
  can restore the emulator state. The step of resotre emulator state at a
  checkpoint is:
 
  1. libxl__qmp_stop- vm_stop() in qemu
  2. libxl__qmp_restore - load_vmstate() in qemu
  3. libxl__qmp_resume  - vm_start() in qemu
 
 
  Ok, that sounds like the ideal time to hook back into Xen by creating a
 new
  ioreq server.
 
  I have some questions about ioreq server:
  1. If we use old version xen and newest version qemu, is it OK? Is default
 ioreq server created when the guest is created.
 xen_create_ioreq_server()
  does
 nothing, and xen_get_ioreq_server_info() will get the default ioreq
 server
  information.
 Is it right?
 
  No. It's not compatible in that direction. A new Xen will work with an old
 QEMU but not the other way round.
 
  2. Why we create a default ioreq server when getting the hvm param if
 there
  is already a
 not default ioreq server?
 
  If something reads the 'legacy' HVM params then that is Xen's trigger to
 create the default server. Any 'new' emulator should be using the ioreq
 server hypercalls so the default server will not be needed.
 
 If there are two ioreq servers: default ioreq server, and a ioreq server
 created by emulator. The guest can work it correctly in
 this case?

You mean a secondary emulator? Yes, that's why there is the notion of default 
ioreq server... to allow a secondary emulator to be used even when an old QEMU 
is in use.

 Is there any application(not emulator) that uses the libxenctrl
 directly?
 

What do you mean by application? Toolstacks may use libxenctrl.

  Paul

 Thanks
 Wen Congyang
 
 
  3. In the far end, we will clear the ioreq page, and this ioreq page is 
  used
 for
  default
 ioreq server, is it right?
 
  Yes, AFAIK it's only the 'magic' pages that get cleared at the far end - and
 that includes the default server pages. Other ioreq servers will have their
 pages cleared on re-insertion to the P2M at the source end when the server
 is disabled.
 
Paul
 
 
  Thanks
  Wen Congyang
 
 
Paul
 
 
 Paul
 
  Thanks
  Wen Congyang
 
 
 Paul
 
 
  Thanks
  Wen Congyang
 
 
 Paul
 
  We will set to the guest to a new state, the old state should be
  dropped.
 
  Thanks
  Wen Congyang
 
 
 Paul
 
 
  Thanks
  Wen Congyang
 
 
 Paul
 
 
  Thanks
  Wen Congyang
 
 
  ~Andrew
  .
 
 
  .
 
 
  .
 
 
 
  ___
  Xen-devel mailing list
  Xen-devel@lists.xen.org
  http://lists.xen.org/xen-devel
 
  ___
  Xen-devel mailing list
  Xen-devel@lists.xen.org
  http://lists.xen.org/xen-devel
  .
 
 
  .
 
 
 
  ___
  Xen-devel mailing list
  Xen-devel@lists.xen.org
  http://lists.xen.org/xen-devel
  .
 
 
  --
  Thanks,
  Yang.
  .
 
 
  .
 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [rumpuserxen test] 58422: regressions - FAIL

2015-06-12 Thread osstest service user
flight 58422 rumpuserxen real [real]
http://logs.test-lab.xenproject.org/osstest/logs/58422/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-rumpuserxen   5 rumpuserxen-build fail REGR. vs. 33866
 build-i386-rumpuserxen5 rumpuserxen-build fail REGR. vs. 33866

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a

version targeted for testing:
 rumpuserxen  3b91e44996ea6ae1276bce1cc44f38701c53ee6f
baseline version:
 rumpuserxen  30d72f3fc5e35cd53afd82c8179cc0e0b11146ad


People who touched revisions under test:
  Antti Kantee po...@iki.fi
  Ian Jackson ian.jack...@eu.citrix.com
  Martin Lucina mar...@lucina.net
  Wei Liu l...@liuw.name


jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-rumpuserxen-amd64   blocked 
 test-amd64-i386-rumpuserxen-i386 blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 535 lines long.)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 10/11] x86/intel_pstate: support the use of intel_pstate in pmstat.c

2015-06-12 Thread Julien Grall

Hi,

On 11/06/2015 23:03, Wang, Wei W wrote:

On 11/06/2015 22:02, Julien Grall wrote:

On 11/06/2015 04:31, Wei Wang wrote:

-list_for_each(pos, cpufreq_governor_list)
+if (policy-policy)


What if another cpufreq decides to use policy-policy?


What is another cpufreq? The policy is per-CPU struct.


I mean another cpufreq driver. Correct me if I'm wrong but from the name 
policy is not intel pstate specific. That means that a new cpufreq 
driver can decide to use the field his own purpose..



+gov_num = INTEL_PSTATE_INTERNAL_GOV_NUM;


Why not using cpufreq_governor_list?


That's used by the old driver. We are not going through that old governor layer.


This is common code, it's used by different cpufreq driver for both x86 
and ARM (not yet supported). We should not relying on any other cpufreq 
driver won't use the field policy.


Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Relocatable Xen early boot code

2015-06-12 Thread Daniel Kiper
Hey,

During work on multiboot2 protocol support for Xen on EFI platform
I discovered that we need relocatable Xen early boot code (which is
mostly 32-bit code). More you can find here:

http://lists.xen.org/archives/html/xen-devel/2015-02/msg01257.html

I would like to focus on solution #1 described above. I have working PoC
which implements it. We also discussed various solutions for this issue
during Xen Hackhaton in Shanghai. As Andrew and Jan asked I tried to
implement solution based on segment registers. However, after revisiting
this issue and further investigation I still have some doubts. You can
read about my findings below.

Here I do not want to discuss GRUB2 and multiboot2 protocol support details
for relocatable images. It is not needed. It is sufficient to know that it
is able to put loaded image anywhere in available memory below 4 GiB. Loaded
image is informed about its base address according to multiboot2 protocol via
special tag. This is new feature not available in upstream GRUB2. I work on
upstreaming it in parallel. Relevant patches will be posted together with
Xen patches.

1) My PoC uses %ebp (last unused register available globally in Xen early
   boot code; there is assumption that %ds == %es == %ss) as a storage for
   Xen image base address. If boot protocol do not support relocatable images
   it is filled with static value calculated during build. Otherwise it is
   taken from special multiboot2 tag.

   We need Xen image base address to access memory in Xen image area in at
   least three different cases:
 a) direct memory access, e.g.: mov %eax,sym_offset(boot_tsc_stamp)(%ebp)
 b) memory address calculation, e.g.: lea 
sym_offset(__page_tables_start)(%ebp),%edx,
 c) update static addresses calculated during build, e.g. prebuild page 
tables.

   If we have Xen image base address in a register then all mentioned operations
   are quite simple.

   This idea is based on x86_64 addressing mode which uses %rip as a reference.

2) Andrew and Jan suggested that we can use segment descriptor to store Xen
   image base address. This way all references to variables in Xen image can
   be easily calculated during build and they will be static. However, relevant
   segment descriptor must be updated during Xen start.

   At first sight, it looks that %cs, %ds, %es, %ss should be initialized as is
   (start: 0, size: 4 GiB, ...). This way we will not break references to 
trampoline
   and all other variables living outside of Xen image. Additionally, we are 
sure
   that all references to variables from C code (xen/arch/x86/boot/reloc.c) are
   pointing always to the same place regardless of x86 instruction generated by
   compiler and segment register used by this instruction for addressing. So, it
   looks that potentially we can use %fs or %gs as segment register to access
   variables living in Xen image. Looks good... Case a and b from solution #1 
seams
   easy to resolve by prefixing by chosen segment register. Case c requires an 
extra
   register to store temporarily Xen image base address. Looks quite easy... 
However,
   looking at xen/arch/x86/boot/cmdline.S (at first sight this is the worst 
thing)
   I am not so happy with segment register solution. Unfortunately all functions
   get as argument just displacement in segment. This means that we must rework
   all stuff there and pass segment(s) as additional argument(s) because most of
   code checks memory in and out of Xen image in parallel. Of course we can use
   solution similar to case c described above but I think that then whole stuff
   move closer to idea #1. This way we will have something which uses things
   from #1 and #2. I do not think that this is solution which we want.

3) There is a third solution which is a mixture of #1 and #2. We can use e.g.
   %fs:0 (e.g. located somewhere in Xen image) to store Xen image base address.
   If this value is needed then we can access it directly, e.g. add %fs:0,%esi,
   or copy to temporary register and use as required.

So, I am still in favor of #1. It is clean and easy. It does not require a lot 
of work.
#2 has a potential but requires a lot of changes in fragile cmdline.S (maybe 
others
difficult places). Is it worth? I think that #3 is a backup solution in case we 
choose
#1 and later it will appear that we need a globally unused register.

Daniel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2] xSplice design

2015-06-12 Thread Martin Pohlack
On 15.05.2015 21:44, Konrad Rzeszutek Wilk wrote:
[...]
 ## Hypercalls
 
 We will employ the sub operations of the system management hypercall (sysctl).
 There are to be four sub-operations:
 
  * upload the payloads.
  * listing of payloads summary uploaded and their state.
  * getting an particular payload summary and its state.
  * command to apply, delete, or revert the payload.
 
 The patching is asynchronous therefore the caller is responsible
 to verify that it has been applied properly by retrieving the summary of it
 and verifying that there are no error codes associated with the payload.
 
 We **MUST** make it asynchronous due to the nature of patching: it requires
 every physical CPU to be lock-step with each other. The patching mechanism
 while an implementation detail, is not an short operation and as such
 the design **MUST** assume it will be an long-running operation.

I am not convinced yet, that you need an asynchronous approach here.

The experience from our prototype suggests that hotpatching itself is
not an expensive operation.  It can usually be completed well below 1ms
with the most expensive part being getting the hypervisor to a quiet state.

If we go for a barrier at hypervisor exit, combined with forcing all
other CPUs through the hypervisor with IPIs, the typical case is very quick.

The only reason why that would take some time is, if another CPU is
executing a lengthy operation in the hypervisor already.  In that case,
you probably don't want to block the whole machine waiting for the
joining of that single CPU anyway and instead re-try later, for example,
using a timeout on the barrier.  That could be signaled to the user-land
process (EAGAIN) so that he could re-attempt hotpatching after some seconds.

Martin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 07/11] x86/intel_pstate: changes in cpufreq_del_cpu for CPU offline

2015-06-12 Thread Julien Grall



On 11/06/2015 22:01, Wang, Wei W wrote:

On 11/06/2015 22:06, Julien Grall wrote:

On 11/06/2015 04:28, Wei Wang wrote:

cpufreq_cpu_policy is used in intel_pstate_set_pstate(), so we change
to NULL it after the call of cpufreq_driver-exit. Otherwise, a
calltrace will show up on your screen due to the reference of a NULL
pointer when you power down the system.

Signed-off-by: Wei Wang wei.w.w...@intel.com
---
   xen/drivers/cpufreq/cpufreq.c | 6 +++---
   1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/cpufreq/cpufreq.c
b/xen/drivers/cpufreq/cpufreq.c index 6003a8c..a8772e8 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -335,12 +335,11 @@ int cpufreq_del_cpu(unsigned int cpu)

   /* for HW_ALL, stop gov for each core of the _PSD domain */
   /* for SW_ALL  SW_ANY, stop gov for the 1st core of the _PSD domain

*/

-if (hw_all || (cpumask_weight(cpufreq_dom-map) ==
-   perf-domain_info.num_processors))
+if (!policy-policy  (hw_all || (cpumask_weight(cpufreq_dom-map)

==

+   perf-domain_info.num_processors)))


Based on your patch #6, the field policy contains value which is defined per-
cpufreq driver (because you defined internal value). How can you be sure
that a driver will never use 0 as a valid value?


Hi Julien, what do you mean by per-cpufreq driver?

We currently have two P-state drivers. This filed is currently only used by the 
intel_pstate driver, and the four usable values are:
#define CPUFREQ_POLICY_POWERSAVE   (1)
#define CPUFREQ_POLICY_PERFORMANCE  (2)
#define CPUFREQ_POLICY_USERSPACE  (3)
#define CPUFREQ_POLICY_ONDEMAND(4)

The intel_pstate won't use 0 as a valid value, and the default value is 
CPUFREQ_POLICY_ONDEMAND.  If it's 0, it basically means the old acpi-cpufreq 
driver is being used.


You seem to rely on nobody else with use the cpufreq framework... which 
is wrong. intel_pstate won't be the only possible cpufreq driver. Some 
ARM developper are working on adding a cpufreq for ARM power management.


You said that CPUFREQ_POLICY_* is specific to the intel driver. But use 
them in the common code. If any cpufreq driver can use the value, then 
make it common. Otherwise please move this code outside of the framework.


Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time

2015-06-12 Thread Paul Durrant
 -Original Message-
 From: Wen Congyang [mailto:we...@cn.fujitsu.com]
 Sent: 12 June 2015 12:10
 To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org
 Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
 yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson
 Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
 page only one time
 
 On 06/12/2015 06:54 PM, Paul Durrant wrote:
  -Original Message-
  From: Wen Congyang [mailto:we...@cn.fujitsu.com]
  Sent: 12 June 2015 11:26
  To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
 de...@lists.xen.org
  Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
  yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson
  Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
 ioreq
  page only one time
 
  On 06/12/2015 03:41 PM, Paul Durrant wrote:
  -Original Message-
  From: Wen Congyang [mailto:we...@cn.fujitsu.com]
  Sent: 12 June 2015 04:22
  To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
  de...@lists.xen.org
  Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
  yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson
  Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
  ioreq
  page only one time
 
  On 06/11/2015 09:25 PM, Paul Durrant wrote:
  -Original Message-
  From: Yang Hongyang [mailto:yan...@cn.fujitsu.com]
  Sent: 11 June 2015 13:59
  To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-
  de...@lists.xen.org
  Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
  yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian
 Jackson
  Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
 zero
  ioreq
  page only one time
 
 
 
  On 06/11/2015 06:20 PM, Paul Durrant wrote:
  -Original Message-
  From: Wen Congyang [mailto:we...@cn.fujitsu.com]
  Sent: 11 June 2015 09:48
  To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
  de...@lists.xen.org
  Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com;
  [...]
 
 
  In our implementation, we don't start a new emulator. The codes
  can
  work,
  but some bugs may be not triggered.
 
 
  How do you reconcile the incoming QEMU save record with the
  running
  emulator state?
 
  We introduce a qmp command xen-load-devices-
  state(libxl__qmp_restore) which
  can restore the emulator state. The step of resotre emulator state at
 a
  checkpoint is:
 
  1. libxl__qmp_stop- vm_stop() in qemu
  2. libxl__qmp_restore - load_vmstate() in qemu
  3. libxl__qmp_resume  - vm_start() in qemu
 
 
  Ok, that sounds like the ideal time to hook back into Xen by creating a
  new
  ioreq server.
 
  I have some questions about ioreq server:
  1. If we use old version xen and newest version qemu, is it OK? Is
 default
 ioreq server created when the guest is created.
  xen_create_ioreq_server()
  does
 nothing, and xen_get_ioreq_server_info() will get the default ioreq
  server
  information.
 Is it right?
 
  No. It's not compatible in that direction. A new Xen will work with an old
  QEMU but not the other way round.
 
 If the xen is newest, and qemu is old, how is the default ioreq server created
 for the emulator? Will the old qemu call xc_get_hvm_param(), and the
 hypervisor
 then creates a default ioreq server?
 get is a 'readonly' operation, and creating the default ioreq server in it is
 very strange.
 

Might be strange, but that's the way it's done.

 
  2. Why we create a default ioreq server when getting the hvm param if
  there
  is already a
 not default ioreq server?
 
  If something reads the 'legacy' HVM params then that is Xen's trigger to
  create the default server. Any 'new' emulator should be using the ioreq
  server hypercalls so the default server will not be needed.
 
  If there are two ioreq servers: default ioreq server, and a ioreq server
  created by emulator. The guest can work it correctly in
  this case?
 
  You mean a secondary emulator? Yes, that's why there is the notion of
 default ioreq server... to allow a secondary emulator to be used even when
 an old QEMU is in use.
 
 No, only one emulator. Can we run more than one emulator for one hvm
 guest? How
 to do it?
 

Yes, more than one emulator can run. There's nothing in libxl to do it, but we 
do it in XenServer using XAPI.

 
  Is there any application(not emulator) that uses the libxenctrl
  directly?
 
 
  What do you mean by application? Toolstacks may use libxenctrl.
 
 For example: libvirt. I know it uses libxl now. Is there any similar 
 application
 which uses libxenctrl.

Well, XAPI uses it for one.

  Paul

 
 Thanks
 Wen Congyang
 
 
Paul
 
  Thanks
  Wen Congyang
 
 
  3. In the far end, we will clear the ioreq page, and this ioreq page is
 used
  for
  default
 ioreq server, is it right?
 
  Yes, AFAIK it's only the 'magic' pages that get cleared at the far end - 
  and
  that includes the default server pages. 

Re: [Xen-devel] [PATCH v2 COLOPre 07/13] tools/libxl: Update libxl__domain_unpause() to support qemu-xen

2015-06-12 Thread Wei Liu
On Mon, Jun 08, 2015 at 11:43:11AM +0800, Yang Hongyang wrote:
 Currently, libxl__domain_unpause() only supports
 qemu-xen-traditional. Update it to support qemu-xen.
 
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com

This looks very similar to an existing function called
libxl__domain_resume_device_model. Maybe you don't need to invent a new
function.

 ---
  tools/libxl/libxl.c | 42 +-
  1 file changed, 33 insertions(+), 9 deletions(-)
 
 diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
 index d5691dc..5c843c2 100644
 --- a/tools/libxl/libxl.c
 +++ b/tools/libxl/libxl.c
 @@ -933,10 +933,37 @@ out:
  return AO_INPROGRESS;
  }
  
 -int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
 +static int libxl__domain_unpause_device_model(libxl__gc *gc, uint32_t domid)
  {
  char *path;
  char *state;
 +
 +switch (libxl__device_model_version_running(gc, domid)) {
 +case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
 +uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
 +
 +path = libxl__device_model_xs_path(gc, dm_domid, domid, /state);
 +state = libxl__xs_read(gc, XBT_NULL, path);
 +if (state != NULL  !strcmp(state, paused)) {

The only difference between your function and
libxl__domain_unpause_device_model is the check for state node. I
think you can just add the check to libxl__domain_resume_device_model
and use that function.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v11 6/9] xen: Add ring 3 vmware_port support

2015-06-12 Thread Don Slutz
On 06/12/15 02:25, Jan Beulich wrote:
 On 12.06.15 at 00:10, dsl...@verizon.com wrote:
 On 06/05/15 06:54, Ian Campbell wrote:
 It would be really useful to see a comprehensive list of exactly what
 guest ring3 access to the vmware port actually enables i.e. a list of
 specific features which require it.

 Ok, I have done some testing.  Here is what I know:

 Without ring3 support:

 1) VMware tools will not install on linux and windows.
 2) open-vm-tools (https://github.com/vmware/open-vm-tools) will not
 install (how ever it is not hard to change it to do so, you need to add
 a call to iopl(3) need to be added in a few places) on linux

 However if VMware tools did get installed on the window disk bits
 somehow, the VMware mouse support works.  Linux gets this because Xorg
 detects and uses the VMware mouse under IOPL(3).
 
 Now that tells us that the tools may not work, but not what
 implications that has on the usability of the VM once migrated
 to Xen. Them not installing is a non-issue afaict, since after
 having moved the VM to Xen there shouldn't be a need to
 install them anymore - either they've been there, or I don't
 see why they would be needed _after_ the move. And you
 say that the mouse works in both cases if the tools happen
 to be there.
 

The VMware tools service will start but does not work.  It adds to the
event log:

[warning][vmusr:vmtoolsd] The vmuser service needs to run inside a
virtual machine.

And the available features of VMware tools is disabled:

   1) The ability to perform virtual machine power operations gracefully
is missing. (code to access QEMU's from Xen to do this is  missing).
I.E. get windows to shutdown when requested!

   2) Execution of VMware provided or user configured scripts in guests
during various power operations.

   3) Clock synchronization between guests and hosts or client desktops.

   4) Access to VMware guest info variables (code to access QEMU's from
Xen to do this is missing).  This can be used to customize guest
operating systems immediately after powering on virtual machines. It can
also be used to monitor the health of a guest.


The reason to install them after is to get the VMware mouse driver on
Windows.  This mouse driver works much better on Window when there is
higher network latency.

   -Don Slutz

 Jan
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH OSSTEST v2] Arrange to upgrade microcode on x86 test hosts.

2015-06-12 Thread Ian Jackson
Ian Campbell writes ([PATCH OSSTEST v2] Arrange to upgrade microcode on x86 
test hosts.):
 Both Xen and Linux support extracting a microcode update from an
 initramfs early during boot. This requires prepending a suitable
 uncompressed cpio archive containing the necessary files to the
 initrd.
...
 Upon commit please run mg-cpu-microcode-update and set
 MicrocodeUpdateAmd64 and MicrocodeUpdateI386 to the resulting file in
 both production-config and production-config-cambridge (I will copy
 the binary to the Cambridge location)

Acked-by: Ian Jackson ian.jack...@eu.citrix.com

Please, as discussed on irc, feel free to run this new
mg-cpu-microcode-update on the production instance.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 4/6] gnttab: simplify page copying/clearing

2015-06-12 Thread Jan Beulich
... by making {copy,clear}_domain_page() available also on other than
x86.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -230,24 +230,6 @@ void unmap_domain_page(const void *ptr)
 local_irq_restore(flags);
 }
 
-void clear_domain_page(unsigned long mfn)
-{
-void *ptr = map_domain_page(mfn);
-
-clear_page(ptr);
-unmap_domain_page(ptr);
-}
-
-void copy_domain_page(unsigned long dmfn, unsigned long smfn)
-{
-const void *src = map_domain_page(smfn);
-void *dst = map_domain_page(dmfn);
-
-copy_page(dst, src);
-unmap_domain_page(dst);
-unmap_domain_page(src);
-}
-
 int mapcache_domain_init(struct domain *d)
 {
 struct mapcache_domain *dcache = d-arch.pv_domain.mapcache;
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1643,7 +1643,6 @@ gnttab_transfer(
 if ( (1UL  (max_bitsize - PAGE_SHIFT)) = mfn )
 {
 struct page_info *new_page;
-void *sp, *dp;
 
 new_page = alloc_domheap_page(e, MEMF_no_owner |
  MEMF_bits(max_bitsize));
@@ -1653,11 +1652,7 @@ gnttab_transfer(
 goto unlock_and_copyback;
 }
 
-sp = map_domain_page(mfn);
-dp = __map_domain_page(new_page);
-memcpy(dp, sp, PAGE_SIZE);
-unmap_domain_page(dp);
-unmap_domain_page(sp);
+copy_domain_page(page_to_mfn(new_page), mfn);
 
 page-count_info = ~(PGC_count_mask|PGC_allocated);
 free_domheap_page(page);
@@ -2434,7 +2429,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARA
 /* Make sure there's no crud left over in the table from the
old version. */
 for ( i = 0; i  nr_grant_frames(gt); i++ )
-memset(gt-shared_raw[i], 0, PAGE_SIZE);
+clear_page(gt-shared_raw[i]);
 
 /* Restore the first 8 entries (toolstack reserved grants) */
 if ( gt-gt_version != 0  op.version == 1 )
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1170,6 +1170,26 @@ long do_memory_op(unsigned long cmd, XEN
 return rc;
 }
 
+#ifdef CONFIG_DOMAIN_PAGE
+void clear_domain_page(unsigned long mfn)
+{
+void *ptr = map_domain_page(mfn);
+
+clear_page(ptr);
+unmap_domain_page(ptr);
+}
+
+void copy_domain_page(unsigned long dmfn, unsigned long smfn)
+{
+const void *src = map_domain_page(smfn);
+void *dst = map_domain_page(dmfn);
+
+copy_page(dst, src);
+unmap_domain_page(dst);
+unmap_domain_page(src);
+}
+#endif
+
 void destroy_ring_for_helper(
 void **_va, struct page_info *page)
 {
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -264,6 +264,8 @@ static inline lpae_t mfn_to_xen_entry(un
 /* Actual cacheline size on the boot CPU. */
 extern size_t cacheline_bytes;
 
+#define copy_page(dp, sp) memcpy(dp, sp, PAGE_SIZE)
+
 /* Functions for flushing medium-sized areas.
  * if 'range' is large enough we might want to use model-specific
  * full-cache flushes. */



gnttab: simplify page copying/clearing

... by making {copy,clear}_domain_page() available also on other than
x86.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -230,24 +230,6 @@ void unmap_domain_page(const void *ptr)
 local_irq_restore(flags);
 }
 
-void clear_domain_page(unsigned long mfn)
-{
-void *ptr = map_domain_page(mfn);
-
-clear_page(ptr);
-unmap_domain_page(ptr);
-}
-
-void copy_domain_page(unsigned long dmfn, unsigned long smfn)
-{
-const void *src = map_domain_page(smfn);
-void *dst = map_domain_page(dmfn);
-
-copy_page(dst, src);
-unmap_domain_page(dst);
-unmap_domain_page(src);
-}
-
 int mapcache_domain_init(struct domain *d)
 {
 struct mapcache_domain *dcache = d-arch.pv_domain.mapcache;
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1643,7 +1643,6 @@ gnttab_transfer(
 if ( (1UL  (max_bitsize - PAGE_SHIFT)) = mfn )
 {
 struct page_info *new_page;
-void *sp, *dp;
 
 new_page = alloc_domheap_page(e, MEMF_no_owner |
  MEMF_bits(max_bitsize));
@@ -1653,11 +1652,7 @@ gnttab_transfer(
 goto unlock_and_copyback;
 }
 
-sp = map_domain_page(mfn);
-dp = __map_domain_page(new_page);
-memcpy(dp, sp, PAGE_SIZE);
-unmap_domain_page(dp);
-unmap_domain_page(sp);
+copy_domain_page(page_to_mfn(new_page), mfn);
 
 page-count_info = ~(PGC_count_mask|PGC_allocated);
 free_domheap_page(page);
@@ -2434,7 +2429,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARA
 /* Make sure there's no crud left over in the table from the
old version. */
 for ( i = 0; i  nr_grant_frames(gt); i++ )
-memset(gt-shared_raw[i], 0, PAGE_SIZE);
+

Re: [Xen-devel] [PATCH] libxl: libxl_internal.h: Clarify ao rule against internal callers

2015-06-12 Thread Ian Jackson
Ian Campbell writes (Re: [PATCH] libxl: libxl_internal.h: Clarify ao rule 
against internal callers):
 On Thu, 2015-06-11 at 17:57 +0100, Ian Jackson wrote:
  Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com
  CC: Ian Campbell ian.campb...@citrix.com
  CC: Wei Liu wei.l...@citrix.com
  CC: Juergen Gross jgr...@suse.com
 
 Acked-by: Ian Campbell ian.campb...@citrix.com
 
 AO_INITIATOR_ENTRY was just a stray remainder of some wip name?

Yes.

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] libxl: libxl_internal.h: Clarify ao rule against internal callers

2015-06-12 Thread Ian Jackson
Juergen Gross writes (Re: [Xen-devel] [PATCH] libxl: libxl_internal.h: Clarify 
ao rule against internal callers):
 Now it's 100% clear what to avoid.
 
 Acked-by: Juergen Gross jgr...@suse.com

Thanks, pushed.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] libxl backports for 4.4 and 4.5

2015-06-12 Thread Ian Jackson
Ian Jackson writes (Re: [Xen-devel] libxl backports for 4.4 and 4.5):
 I propose to push all of these today, unless you object.  I will
 double-check with you on irc.

Now done.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Backport request libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap. (Was: Re: [Bug report] Security issue in xl vcpu-set)

2015-06-12 Thread Ian Jackson
Ian Campbell writes (Backport request libxl: In libxl_set_vcpuonline check 
for maximum number of VCPUs against the cpumap. (Was: Re: [Bug report] 
Security issue in xl vcpu-set)):
 commit d83bf9d224eeb5b73b93c2703f7dba4473cfa89c
 Author: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 Date:   Fri Apr 3 16:02:29 2015 -0400
 
 libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against 
 the cpumap.

Now backported to staging-4.5.  I fixed up the conflict, correctly I
think.

Ian.

commit 0d8cbcad03764e42ff2f0d224aff883c3734d782
Author: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Date:   Fri Apr 3 16:02:29 2015 -0400

libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against 
the cpumap.

There is no sense in trying to online (or offline) CPUs when the size of
cpumap is greater than the maximum number of VCPUs the guest can go to.

As such fail the operation if the count of CPUs to online is greater
than what the guest started with. For the offline case we do not
check (as the bits are unset in the cpumap) and let it go through.

We coalesce some of the underlying libxl_set_vcpuonline code
together which was duplicated in QMP and XenStore codepaths.

Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Acked-by: Ian Campbell ian.campb...@citrix.com
(cherry picked from commit d83bf9d224eeb5b73b93c2703f7dba4473cfa89c)

Conflicts:
tools/libxl/libxl.c
Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 1f4dce2..489d5f8 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5487,25 +5487,19 @@ int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, 
uint32_t domid,
 }
 
 static int libxl__set_vcpuonline_xenstore(libxl__gc *gc, uint32_t domid,
- libxl_bitmap *cpumap)
+ libxl_bitmap *cpumap,
+ const libxl_dominfo *info)
 {
-libxl_dominfo info;
 char *dompath;
 xs_transaction_t t;
 int i, rc = ERROR_FAIL;
 
-libxl_dominfo_init(info);
-
-if (libxl_domain_info(CTX, info, domid)  0) {
-LOGE(ERROR, getting domain info list);
-goto out;
-}
 if (!(dompath = libxl__xs_get_dompath(gc, domid)))
 goto out;
 
 retry_transaction:
 t = xs_transaction_start(CTX-xsh);
-for (i = 0; i = info.vcpu_max_id; i++)
+for (i = 0; i = info-vcpu_max_id; i++)
 libxl__xs_write(gc, t,
libxl__sprintf(gc, %s/cpu/%u/availability, dompath, 
i),
%s, libxl_bitmap_test(cpumap, i) ? online : 
offline);
@@ -5515,24 +5509,16 @@ retry_transaction:
 } else
 rc = 0;
 out:
-libxl_dominfo_dispose(info);
 return rc;
 }
 
 static int libxl__set_vcpuonline_qmp(libxl__gc *gc, uint32_t domid,
- libxl_bitmap *cpumap)
+ libxl_bitmap *cpumap,
+ const libxl_dominfo *info)
 {
-libxl_dominfo info;
 int i;
 
-libxl_dominfo_init(info);
-
-if (libxl_domain_info(CTX, info, domid)  0) {
-LOGE(ERROR, getting domain info list);
-libxl_dominfo_dispose(info);
-return ERROR_FAIL;
-}
-for (i = 0; i = info.vcpu_max_id; i++) {
+for (i = 0; i = info-vcpu_max_id; i++) {
 if (libxl_bitmap_test(cpumap, i)) {
 /* Return value is ignore because it does not tell anything useful
  * on the completion of the command.
@@ -5542,33 +5528,53 @@ static int libxl__set_vcpuonline_qmp(libxl__gc *gc, 
uint32_t domid,
 libxl__qmp_cpu_add(gc, domid, i);
 }
 }
-libxl_dominfo_dispose(info);
 return 0;
 }
 
 int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *cpumap)
 {
 GC_INIT(ctx);
-int rc;
+int rc, maxcpus;
+libxl_dominfo info;
+
+libxl_dominfo_init(info);
+
+rc = libxl_domain_info(CTX, info, domid);
+if (rc  0) {
+LOGE(ERROR, getting domain info list);
+goto out;
+}
+
+maxcpus = libxl_bitmap_count_set(cpumap);
+if (maxcpus  info.vcpu_max_id + 1)
+{
+LOGE(ERROR, Requested %d VCPUs, however maxcpus is %d!,
+ maxcpus, info.vcpu_max_id + 1);
+rc = ERROR_FAIL;
+goto out;
+}
+
 switch (libxl__domain_type(gc, domid)) {
 case LIBXL_DOMAIN_TYPE_HVM:
 switch (libxl__device_model_version_running(gc, domid)) {
 case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
-rc = libxl__set_vcpuonline_xenstore(gc, domid, cpumap);
+rc = libxl__set_vcpuonline_xenstore(gc, domid, cpumap, info);
 break;
 case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-rc = libxl__set_vcpuonline_qmp(gc, domid, cpumap);
+rc = libxl__set_vcpuonline_qmp(gc, domid, cpumap, info);
   

Re: [Xen-devel] [Draft F] Xen on ARM vITS Handling

2015-06-12 Thread Ian Campbell
On Thu, 2015-06-11 at 10:40 +0100, Ian Campbell wrote:
 ## Command Queue Virtualisation
 
 The command translation/emulation in this design has been arranged to
 be as cheap as possible (e.g. in many cases the actions are NOPs),
 avoiding previous concerns about the length of time which an emulated
 write to a `CWRITER` register may block the vcpu.
 
 The vits will simply track its reader and writer pointers. On write
 to `CWRITER` it will immediately and synchronously process all
 commands in the queue and update its state accordingly.
 
 It might be possible to implement a rudimentary form of preemption by
 periodically (as determined by `hypercall_preempt_check()`) returning
 to the guest without incrementing PC but with updated internal
 `CREADR` state, meaning it will reexecute the write to `CWRITER` and
 we can pickup where we left off for another iteration. This at least
 lets us schedule other vcpus etc and prevents a monopoly.

In the presence of multiple VCPUs writing to GITS_CWRITER preemption
actually gets pretty subtle. I suggest leaving it out for now.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back

2015-06-12 Thread Wei Liu
On Mon, Jun 08, 2015 at 11:43:14AM +0800, Yang Hongyang wrote:
 From: Wen Congyang we...@cn.fujitsu.com
 
 In colo mode, slave needs to send data to master, but the io_fd
 only can be written in master, and only can be read in slave.
 Save recv_fd in domain_suspend_state, and send_fd in
 domain_create_state.
 

You failed to mention in commit message new structures are introduced in
IDL.

 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
 ---
  tools/libxl/libxl.c  |  2 +-
  tools/libxl/libxl_create.c   | 14 ++
  tools/libxl/libxl_internal.h |  2 ++
  tools/libxl/libxl_types.idl  |  7 +++
  tools/libxl/xl_cmdimpl.c |  7 +++

You also need to add LIBXL_HAVE in libxl.h.

  5 files changed, 27 insertions(+), 5 deletions(-)
 
 diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
 index 5c843c2..36b97fe 100644
 --- a/tools/libxl/libxl.c
 +++ b/tools/libxl/libxl.c
 @@ -832,7 +832,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, 
 libxl_domain_remus_info *info,
  dss-callback = remus_failover_cb;
  dss-domid = domid;
  dss-fd = send_fd;
 -/* TODO do something with recv_fd */
 +dss-recv_fd = recv_fd;
  dss-type = type;
  dss-live = 1;
  dss-debug = 0;
 diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
 index 86384d2..bd8149c 100644
 --- a/tools/libxl/libxl_create.c
 +++ b/tools/libxl/libxl_create.c
 @@ -1577,8 +1577,8 @@ static void domain_create_cb(libxl__egc *egc,
   int rc, uint32_t domid);
  
  static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
 -uint32_t *domid,
 -int restore_fd, int checkpointed_stream,
 +uint32_t *domid, int restore_fd,
 +int send_fd, int checkpointed_stream,
  const libxl_asyncop_how *ao_how,
  const libxl_asyncprogress_how *aop_console_how)
  {
 @@ -1591,6 +1591,7 @@ static int do_domain_create(libxl_ctx *ctx, 
 libxl_domain_config *d_config,
  libxl_domain_config_init(cdcs-dcs.guest_config_saved);
  libxl_domain_config_copy(ctx, cdcs-dcs.guest_config_saved, d_config);
  cdcs-dcs.restore_fd = restore_fd;
 +cdcs-dcs.send_fd = send_fd;
  cdcs-dcs.callback = domain_create_cb;
  cdcs-dcs.checkpointed_stream = checkpointed_stream;
  libxl__ao_progress_gethow(cdcs-dcs.aop_console_how, aop_console_how);
 @@ -1619,7 +1620,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, 
 libxl_domain_config *d_config,
  const libxl_asyncop_how *ao_how,
  const libxl_asyncprogress_how *aop_console_how)
  {
 -return do_domain_create(ctx, d_config, domid, -1, 0,
 +return do_domain_create(ctx, d_config, domid, -1, -1, 0,
  ao_how, aop_console_how);
  }
  
 @@ -1629,7 +1630,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, 
 libxl_domain_config *d_config,
  const libxl_asyncop_how *ao_how,
  const libxl_asyncprogress_how 
 *aop_console_how)
  {
 -return do_domain_create(ctx, d_config, domid, restore_fd,
 +int send_fd = -1;
 +
 +if (params-checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO)
 +send_fd = params-send_fd;
 +
 +return do_domain_create(ctx, d_config, domid, restore_fd, send_fd,
  params-checkpointed_stream, ao_how, 
 aop_console_how);
  }
  
 diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
 index fbbae93..6d214b5 100644
 --- a/tools/libxl/libxl_internal.h
 +++ b/tools/libxl/libxl_internal.h
 @@ -2874,6 +2874,7 @@ struct libxl__domain_save_state {
  
  uint32_t domid;
  int fd;
 +int recv_fd;
  libxl_domain_type type;
  int live;
  int debug;
 @@ -3143,6 +3144,7 @@ struct libxl__domain_create_state {
  libxl_domain_config *guest_config;
  libxl_domain_config guest_config_saved; /* vanilla config */
  int restore_fd;
 +int send_fd;
  libxl__domain_create_cb *callback;
  libxl_asyncprogress_how aop_console_how;
  /* private to domain_create */
 diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
 index 23f27d4..8a3d7ba 100644
 --- a/tools/libxl/libxl_types.idl
 +++ b/tools/libxl/libxl_types.idl
 @@ -198,6 +198,12 @@ libxl_viridian_enlightenment = 
 Enumeration(viridian_enlightenment, [
  (3, reference_tsc),
  ])
  
 +libxl_checkpointed_stream = Enumeration(checkpointed_stream, [
 +(0, NONE),
 +(1, REMUS),
 +(2, COLO),
 +], init_val = 0)

The default init_val is 0 so you don't need to write it down.

 +
  #
  # Complex libxl types
  #
 @@ -346,6 +352,7 @@ libxl_domain_create_info = Struct(domain_create_info,[
  
  libxl_domain_restore_params = Struct(domain_restore_params, [
  (checkpointed_stream, 

Re: [Xen-devel] [PATCH v2] tools/libxc: Batch memory allocations for PV guests

2015-06-12 Thread Wei Liu
On Thu, Jun 11, 2015 at 09:52:44AM +0100, Ross Lagerwall wrote:
 The current code for allocating memory for PV guests batches the
 hypercalls to allocate memory by allocating 1024*1024 extents of order 0
 at a time. To make this faster, first try allocating extents of order 9
 (2 MiB) before falling back to the order 0 allocating if the order 9
 allocation fails.
 
 On my test machine this reduced the time to start a 128 GiB PV guest by
 about 60 seconds.
 
 Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com
 ---
 
 Changed in v2: Batched hypercalls for order 9 allocations.
 
  tools/libxc/xc_dom_x86.c | 26 --
  1 file changed, 24 insertions(+), 2 deletions(-)
 
 diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
 index 783f749..2d461e3 100644
 --- a/tools/libxc/xc_dom_x86.c
 +++ b/tools/libxc/xc_dom_x86.c
 @@ -761,7 +761,7 @@ int arch_setup_meminit(struct xc_dom_image *dom)
  {
  int rc;
  xen_pfn_t pfn, allocsz, mfn, total, pfn_base;
 -int i, j;
 +int i, j, k;
  
  rc = x86_compat(dom-xch, dom-guest_domid, dom-guest_type);
  if ( rc )
 @@ -869,6 +869,8 @@ int arch_setup_meminit(struct xc_dom_image *dom)
  unsigned int memflags;
  uint64_t pages;
  unsigned int pnode = dom-vnode_to_pnode[dom-vmemranges[i].nid];
 +int count = dom-total_pages  SUPERPAGE_PFN_SHIFT;
 +xen_pfn_t extents[count];
  

Sorry, I forgot to mention not to follow this practice.

We don't want unbound growth of the stack, which can lead to process
getting killed. The hypervisor would preempt long running operation
anyway so the overly large array doesn't necessarily reduce the number
of hypercall issued.

Could you define a batch size (say, 512 or 1024) and then start from
there?

Sorry again for not having mentioned that earlier.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 08/15] Update IRTE according to guest interrupt config changes

2015-06-12 Thread Jan Beulich
 On 12.06.15 at 11:40, feng...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Tuesday, June 09, 2015 11:06 PM
  On 08.05.15 at 11:07, feng...@intel.com wrote:
  +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
  +uint8_t dest_mode, uint8_t
 delivery_mode,
  +uint8_t gvec, struct vcpu
 **dest_vcpu)
  +{
  +struct vcpu *v, **dest_vcpu_array;
  +unsigned int dest_vcpu_num = 0;
  +int ret;
 
 This, being given as operand to return, should match in type with
 the function's return type.
 
  +dest_vcpu_array = xzalloc_array(struct vcpu *, d-max_vcpus);
 
 You realize that this can be quite big an allocation? (You could at
 least halve it by storing vCPU IDs instead of pointers, but if I'm
 not mistaken this could even be a simple bitmap.)
 
 If we use vCPU IDs or bitmap, we need to iterate all the vCPUs in the
 domain to find the right vCPU from the vcpu_id, right?

Why? You can index d-vcpu[].

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/arm: Propagate clock-frequency to DOMU if present in the DT timer node

2015-06-12 Thread Julien Grall



On 11/06/2015 17:43, Chris (Christopher) Brand wrote:

Hi Julien,


Hi Chris,


The patch does work exactly as advertised.

When I used dtc to convert CONFIG_DTB_FILE from dtb to dts, I could see that it didn't in 
fact have a timer clock-frequency node. After re-creating the dtb and rebuilding Xen, 
ls /proc/device-tree/timer/ shows a clock-frequency file. When I then fire up 
DomU and do the same command, it too has a clock-frequency node.


Can I get your Tested-by on this patch?

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] PCI Passthrough ARM Design : Draft1

2015-06-12 Thread Julien Grall



On 12/06/2015 04:32, Ian Campbell wrote:

On Thu, 2015-06-11 at 14:38 -0700, Manish Jaggi wrote:


On Wednesday 10 June 2015 12:21 PM, Julien Grall wrote:

Hi,

On 10/06/2015 08:45, Ian Campbell wrote:

4. DomU access / assignment PCI device
--
When a device is attached to a domU, provision has to be made such that
it can
access the MMIO space of the device and xen is able to identify the
mapping
between guest bdf and system bdf. Two hypercalls are introduced


I don't think we want/need new hypercalls here, the same existing
hypercalls which are used on x86 should be suitable.

I think both the hypercalls are necessary
a) the mapping of guest bdf to actual sbdf is required as domU accesses
for GIC are trapped and not handled by pciback.
A device say 1:0:0.3 is assigned in domU at 0:0:0.3. This is the bestway
I could find that works.

b) map_mmio call is issued just after the device is added on the pcu bus
(in case of domU)
The function register_xen_pci_notifier (drivers/xen/pci.c) is modified
such that notification is received in domU and dom0.


In which please please add to the document a discussion of the current
interfaces and why they are not suitable.


Beware that the 1:1 mapping doesn't fit with the current guest memory
layout which is pre-defined at Xen build time. So you would also have
to make it dynamically or decide to use the same memory layout as the
host.

If same layout as host used, would there be any issue?


I'm not sure that a 1:1 mapping is any different to the host layout. But
in any case, the host layout also doesn't match the guest layout, so it
has the same issues.


I was suggesting to expose the host layout to the guest layout (similar 
to e820).


Although, this is not my preferred way and a non 1:1 mapping would be 
the best.


Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 2/6] gnttab: limit mapcount() looping

2015-06-12 Thread Jan Beulich
The function also doesn't need to return counts; all its callers are
after is whether at least one entry of a certain kind exists. With that
there's no point for that loop to continue once the looked for
condition was found to be met by one entry. Rename the function to
match the changed behavior.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -541,24 +541,28 @@ static int grant_map_exists(const struct
 return -EINVAL;
 }
 
-static void mapcount(
-struct grant_table *lgt, struct domain *rd, unsigned long mfn,
-unsigned int *wrc, unsigned int *rdc)
+#define MAPKIND_READ 1
+#define MAPKIND_WRITE 2
+static unsigned int mapkind(
+const struct grant_table *lgt, const struct domain *rd, unsigned long mfn)
 {
 struct grant_mapping *map;
 grant_handle_t handle;
+unsigned int kind = 0;
 
-*wrc = *rdc = 0;
-
-for ( handle = 0; handle  lgt-maptrack_limit; handle++ )
+for ( handle = 0; !(kind  MAPKIND_WRITE) 
+  handle  lgt-maptrack_limit; handle++ )
 {
 map = maptrack_entry(lgt, handle);
 if ( !(map-flags  (GNTMAP_device_map|GNTMAP_host_map)) ||
  map-domid != rd-domain_id )
 continue;
 if ( active_entry(rd-grant_table, map-ref).frame == mfn )
-(map-flags  GNTMAP_readonly) ? (*rdc)++ : (*wrc)++;
+kind |= map-flags  GNTMAP_readonly ?
+MAPKIND_READ : MAPKIND_WRITE;
 }
+
+return kind;
 }
 
 /*
@@ -782,21 +786,21 @@ __gnttab_map_grant_ref(
 
 if ( gnttab_need_iommu_mapping(ld) )
 {
-unsigned int wrc, rdc;
+unsigned int kind = mapkind(lgt, rd, frame);
 int err = 0;
+
 /* We're not translated, so we know that gmfns and mfns are
the same things, so the IOMMU entry is always 1-to-1. */
-mapcount(lgt, rd, frame, wrc, rdc);
 if ( (act_pin  (GNTPIN_hstw_mask|GNTPIN_devw_mask)) 
  !(old_pin  (GNTPIN_hstw_mask|GNTPIN_devw_mask)) )
 {
-if ( wrc == 0 )
+if ( !(kind  MAPKIND_WRITE) )
 err = iommu_map_page(ld, frame, frame,
  IOMMUF_readable|IOMMUF_writable);
 }
 else if ( act_pin  !old_pin )
 {
-if ( (wrc + rdc) == 0 )
+if ( !kind )
 err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
 }
 if ( err )
@@ -990,12 +994,12 @@ __gnttab_unmap_common(
 
 if ( gnttab_need_iommu_mapping(ld) )
 {
-unsigned int wrc, rdc;
+unsigned int kind = mapkind(lgt, rd, op-frame);
 int err = 0;
-mapcount(lgt, rd, op-frame, wrc, rdc);
-if ( (wrc + rdc) == 0 )
+
+if ( !kind )
 err = iommu_unmap_page(ld, op-frame);
-else if ( wrc == 0 )
+else if ( !(kind  MAPKIND_WRITE) )
 err = iommu_map_page(ld, op-frame, op-frame, IOMMUF_readable);
 if ( err )
 {



gnttab: limit mapcount() looping

The function also doesn't need to return counts; all its callers are
after is whether at least one entry of a certain kind exists. With that
there's no point for that loop to continue once the looked for
condition was found to be met by one entry. Rename the function to
match the changed behavior.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -541,24 +541,28 @@ static int grant_map_exists(const struct
 return -EINVAL;
 }
 
-static void mapcount(
-struct grant_table *lgt, struct domain *rd, unsigned long mfn,
-unsigned int *wrc, unsigned int *rdc)
+#define MAPKIND_READ 1
+#define MAPKIND_WRITE 2
+static unsigned int mapkind(
+const struct grant_table *lgt, const struct domain *rd, unsigned long mfn)
 {
 struct grant_mapping *map;
 grant_handle_t handle;
+unsigned int kind = 0;
 
-*wrc = *rdc = 0;
-
-for ( handle = 0; handle  lgt-maptrack_limit; handle++ )
+for ( handle = 0; !(kind  MAPKIND_WRITE) 
+  handle  lgt-maptrack_limit; handle++ )
 {
 map = maptrack_entry(lgt, handle);
 if ( !(map-flags  (GNTMAP_device_map|GNTMAP_host_map)) ||
  map-domid != rd-domain_id )
 continue;
 if ( active_entry(rd-grant_table, map-ref).frame == mfn )
-(map-flags  GNTMAP_readonly) ? (*rdc)++ : (*wrc)++;
+kind |= map-flags  GNTMAP_readonly ?
+MAPKIND_READ : MAPKIND_WRITE;
 }
+
+return kind;
 }
 
 /*
@@ -782,21 +786,21 @@ __gnttab_map_grant_ref(
 
 if ( gnttab_need_iommu_mapping(ld) )
 {
-unsigned int wrc, rdc;
+unsigned int kind = mapkind(lgt, rd, frame);
 int err = 0;
+
 /* We're not translated, so we know that gmfns and mfns are
the same things, so the IOMMU entry is always 1-to-1. */
-mapcount(lgt, rd, frame, wrc, rdc);
 if 

[Xen-devel] [PATCH 3/6] gnttab: simplify shared entry v1 vs v2 handling

2015-06-12 Thread Jan Beulich
In a number of places both v1 and v2 pointers are being obtained when
none or just one suffices. Additionally in __acquire_grant_for_copy()
the flow of if/else-if can be slightly improved by re-ordering.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -588,8 +588,6 @@ __gnttab_map_grant_ref(
 unsigned int   cache_flags;
 struct active_grant_entry *act = NULL;
 struct grant_mapping *mt;
-grant_entry_v1_t *sha1;
-grant_entry_v2_t *sha2;
 grant_entry_header_t *shah;
 uint16_t *status;
 
@@ -645,15 +643,7 @@ __gnttab_map_grant_ref(
 
 act = active_entry(rgt, op-ref);
 shah = shared_entry_header(rgt, op-ref);
-if (rgt-gt_version == 1) {
-sha1 = shared_entry_v1(rgt, op-ref);
-sha2 = NULL;
-status = shah-flags;
-} else {
-sha2 = shared_entry_v2(rgt, op-ref);
-sha1 = NULL;
-status = status_entry(rgt, op-ref);
-}
+status = rgt-gt_version == 1 ? shah-flags : status_entry(rgt, op-ref);
 
 /* If already pinned, check the active domid and avoid refcnt overflow. */
 if ( act-pin 
@@ -676,8 +666,10 @@ __gnttab_map_grant_ref(
 if ( !act-pin )
 {
 unsigned long frame;
+unsigned long gfn = rgt-gt_version == 1 ?
+shared_entry_v1(rgt, op-ref).frame :
+shared_entry_v2(rgt, op-ref).full_page.frame;
 
-unsigned long gfn = sha1 ? sha1-frame : sha2-full_page.frame;
 rc = __get_paged_frame(gfn, frame, pg, 
 !!(op-flags  GNTMAP_readonly), rd);
 if ( rc != GNTST_okay )
@@ -1882,7 +1874,6 @@ __acquire_grant_for_copy(
 uint16_t *page_off, uint16_t *length, unsigned allow_transitive)
 {
 struct grant_table *rgt = rd-grant_table;
-grant_entry_v1_t *sha1;
 grant_entry_v2_t *sha2;
 grant_entry_header_t *shah;
 struct active_grant_entry *act;
@@ -1909,13 +1900,11 @@ __acquire_grant_for_copy(
 shah = shared_entry_header(rgt, gref);
 if ( rgt-gt_version == 1 )
 {
-sha1 = shared_entry_v1(rgt, gref);
 sha2 = NULL;
 status = shah-flags;
 }
 else
 {
-sha1 = NULL;
 sha2 = shared_entry_v2(rgt, gref);
 status = status_entry(rgt, gref);
 }
@@ -1937,7 +1926,19 @@ __acquire_grant_for_copy(
 
 td = rd;
 trans_gref = gref;
-if ( sha2  (shah-flags  GTF_type_mask) == GTF_transitive )
+if ( !sha2 )
+{
+unsigned long gfn = shared_entry_v1(rgt, gref).frame;
+
+rc = __get_paged_frame(gfn, grant_frame, page, readonly, rd);
+if ( rc != GNTST_okay )
+goto unlock_out_clear;
+act-gfn = gfn;
+is_sub_page = 0;
+trans_page_off = 0;
+trans_length = PAGE_SIZE;
+}
+else if ( (shah-flags  GTF_type_mask) == GTF_transitive )
 {
 if ( !allow_transitive )
 PIN_FAIL(unlock_out_clear, GNTST_general_error,
@@ -1999,16 +2000,6 @@ __acquire_grant_for_copy(
 is_sub_page = 1;
 act-gfn = -1ul;
 }
-else if ( sha1 )
-{
-rc = __get_paged_frame(sha1-frame, grant_frame, page, readonly, 
rd);
-if ( rc != GNTST_okay )
-goto unlock_out_clear;
-act-gfn = sha1-frame;
-is_sub_page = 0;
-trans_page_off = 0;
-trans_length = PAGE_SIZE;
-}
 else if ( !(sha2-hdr.flags  GTF_sub_page) )
 {
 rc = __get_paged_frame(sha2-full_page.frame, grant_frame, page, 
readonly, rd);
@@ -3154,8 +3145,6 @@ static void gnttab_usage_print(struct do
 {
 struct active_grant_entry *act;
 struct grant_entry_header *sha;
-grant_entry_v1_t *sha1;
-grant_entry_v2_t *sha2;
 uint16_t status;
 uint64_t frame;
 
@@ -3167,16 +3156,12 @@ static void gnttab_usage_print(struct do
 
 if ( gt-gt_version == 1 )
 {
-sha1 = shared_entry_v1(gt, ref);
-sha2 = NULL;
 status = sha-flags;
-frame = sha1-frame;
+frame = shared_entry_v1(gt, ref).frame;
 }
 else
 {
-sha2 = shared_entry_v2(gt, ref);
-sha1 = NULL;
-frame = sha2-full_page.frame;
+frame = shared_entry_v2(gt, ref).full_page.frame;
 status = status_entry(gt, ref);
 }
 


gnttab: simplify shared entry v1 vs v2 handling

In a number of places both v1 and v2 pointers are being obtained when
none or just one suffices. Additionally in __acquire_grant_for_copy()
the flow of if/else-if can be slightly improved by re-ordering.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -588,8 +588,6 @@ __gnttab_map_grant_ref(
 

[Xen-devel] [PATCH 1/6] gnttab: eliminate several explicit version checks

2015-06-12 Thread Jan Beulich
By having nr_grant_entries() return zero when the grant table version
is still unset we can reduce the number of error paths and at once fix
grant_map_exists() running into the being removed ASSERT() when called
for a page owned by a domain not having its grant table set up yet.

Signed-off-by: Jan Beulich jbeul...@suse.com
---
Said ASSERT() was pointless in the grant_map_exists() case, since the
function only looks at active entries, which are version independent.
Hence this wasn't a security issue.

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -319,11 +319,15 @@ get_maptrack_handle(
 /* Number of grant table entries. Caller must hold d's grant table lock. */
 static unsigned int nr_grant_entries(struct grant_table *gt)
 {
-ASSERT(gt-gt_version != 0);
-if (gt-gt_version == 1)
+switch ( gt-gt_version )
+{
+case 1:
 return (nr_grant_frames(gt)  PAGE_SHIFT) / sizeof(grant_entry_v1_t);
-else
+case 2:
 return (nr_grant_frames(gt)  PAGE_SHIFT) / sizeof(grant_entry_v2_t);
+}
+
+return 0;
 }
 
 static int _set_status_v1(domid_t  domid,
@@ -631,10 +635,6 @@ __gnttab_map_grant_ref(
 rgt = rd-grant_table;
 spin_lock(rgt-lock);
 
-if ( rgt-gt_version == 0 )
-PIN_FAIL(unlock_out, GNTST_general_error,
- remote grant table not yet set up\n);
-
 /* Bounds check on the grant ref */
 if ( unlikely(op-ref = nr_grant_entries(rgt)))
 PIN_FAIL(unlock_out, GNTST_bad_gntref, Bad ref (%d).\n, op-ref);
@@ -1502,14 +1502,6 @@ gnttab_prepare_for_transfer(
 
 spin_lock(rgt-lock);
 
-if ( rgt-gt_version == 0 )
-{
-gdprintk(XENLOG_INFO,
- Grant table not ready for transfer to domain(%d).\n,
- rd-domain_id);
-goto fail;
-}
-
 if ( unlikely(ref = nr_grant_entries(rgt)) )
 {
 gdprintk(XENLOG_INFO,
@@ -1905,10 +1897,6 @@ __acquire_grant_for_copy(
 
 spin_lock(rgt-lock);
 
-if ( rgt-gt_version == 0 )
-PIN_FAIL(unlock_out, GNTST_general_error,
- remote grant table not ready\n);
-
 if ( unlikely(gref = nr_grant_entries(rgt)) )
 PIN_FAIL(unlock_out, GNTST_bad_gntref,
  Bad grant reference %ld\n, gref);
@@ -2395,20 +2383,16 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARA
change the version number, except for the first 8 entries which
are allowed to be in use (xenstore/xenconsole keeps them mapped).
(You need to change the version number for e.g. kexec.) */
-if ( gt-gt_version != 0 )
+for ( i = GNTTAB_NR_RESERVED_ENTRIES; i  nr_grant_entries(gt); i++ )
 {
-for ( i = GNTTAB_NR_RESERVED_ENTRIES; i  nr_grant_entries(gt); i++ )
+act = active_entry(gt, i);
+if ( act-pin != 0 )
 {
-act = active_entry(gt, i);
-if ( act-pin != 0 )
-{
-gdprintk(XENLOG_WARNING,
- tried to change grant table version from %d to %d, 
but some grant entries still in use\n,
- gt-gt_version,
- op.version);
-res = -EBUSY;
-goto out_unlock;
-}
+gdprintk(XENLOG_WARNING,
+ tried to change grant table version from %d to %d, but 
some grant entries still in use\n,
+ gt-gt_version, op.version);
+res = -EBUSY;
+goto out_unlock;
 }
 }
 
@@ -2592,9 +2576,6 @@ __gnttab_swap_grant_ref(grant_ref_t ref_
 
 spin_lock(gt-lock);
 
-if ( gt-gt_version == 0 )
-PIN_FAIL(out, GNTST_general_error, grant table not yet set up\n);
-
 /* Bounds check on the grant refs */
 if ( unlikely(ref_a = nr_grant_entries(d-grant_table)))
 PIN_FAIL(out, GNTST_bad_gntref, Bad ref-a (%d).\n, ref_a);
@@ -3165,9 +3146,6 @@ static void gnttab_usage_print(struct do
 
 spin_lock(gt-lock);
 
-if ( gt-gt_version == 0 )
-goto out;
-
 for ( ref = 0; ref != nr_grant_entries(gt); ref++ )
 {
 struct active_grant_entry *act;
@@ -3211,7 +3189,6 @@ static void gnttab_usage_print(struct do
sha-domid, frame, status);
 }
 
- out:
 spin_unlock(gt-lock);
 
 if ( first )


gnttab: eliminate several explicit version checks

By having nr_grant_entries() return zero when the grant table version
is still unset we can reduce the number of error paths and at once fix
grant_map_exists() running into the being removed ASSERT() when called
for a page owned by a domain not having its grant table set up yet.

Signed-off-by: Jan Beulich jbeul...@suse.com
---
Said ASSERT() was pointless in the grant_map_exists() case, since the
function only looks at active entries, which are version independent.
Hence this wasn't a security issue.

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -319,11 +319,15 @@ get_maptrack_handle(
 /* Number of grant table entries. Caller must hold 

  1   2   >