Re: [Xen-devel] [PATCH V6 2/7] libxl_read_file_contents: add new entry to read sysfs file
On 8/11/2015 at 07:26 PM, in message 2015082655.ge7...@zion.uk.xensource.com, Wei Liu wei.l...@citrix.com wrote:
On Mon, Aug 10, 2015 at 06:35:23PM +0800, Chunyan Liu wrote:

Sysfs files have size=4096 but the actual file content is less than that. The current libxl_read_file_contents treats it as an error when the file size and actual file content differ, so reading sysfs file content with this function always fails. Add a new entry, libxl_read_sysfs_file_contents, to handle sysfs files specially. It will be used in later pvusb work.

Signed-off-by: Chunyan Liu cy...@suse.com
---
Changes:
  - read one more byte to check bigger size problem.

 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_utils.c    | 51 ++--
 2 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 6013628..f98f089 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -4001,6 +4001,8 @@ void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
 int libxl__count_physical_sockets(libxl__gc *gc, int *sockets);
 #endif
+_hidden int libxl_read_sysfs_file_contents(libxl_ctx *ctx, const char *filename,
+                               void **data_r, int *datalen_r);

Indentation looks wrong.

/*
 * Local variables:

diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index bfc9699..9234efb 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -322,8 +322,10 @@ out:
     return rc;
 }

-int libxl_read_file_contents(libxl_ctx *ctx, const char *filename,
-                             void **data_r, int *datalen_r) {
+static int libxl_read_file_contents_core(libxl_ctx *ctx, const char *filename,
+                                         void **data_r, int *datalen_r,
+                                         bool tolerate_shrinking_file)
+{
     GC_INIT(ctx);
     FILE *f = 0;
     uint8_t *data = 0;
@@ -359,20 +361,34 @@ int libxl_read_file_contents(libxl_ctx *ctx, const char *filename,
     datalen = stab.st_size;

     if (stab.st_size && data_r) {
-        data = malloc(datalen);
+        data = malloc(datalen + 1);
         if (!data) goto xe;

-        rs = fread(data, 1, datalen, f);
-        if (rs != datalen) {
-            if (ferror(f))
+        rs = fread(data, 1, datalen + 1, f);
+        if (rs > datalen) {
+            LOG(ERROR, "%s increased size while we were reading it",
+                filename);
+            goto xe;
+        }
+
+        if (rs < datalen) {
+            if (ferror(f)) {
                 LOGE(ERROR, "failed to read %s", filename);
-            else if (feof(f))
-                LOG(ERROR, "%s changed size while we were reading it",
-                    filename);
-            else
+                goto xe;
+            } else if (feof(f)) {
+                if (tolerate_shrinking_file) {
+                    datalen = rs;
+                } else {
+                    LOG(ERROR, "%s shrunk size while we were reading it",
+                        filename);
+                    goto xe;
+                }
+            } else {
                 abort();
-            goto xe;
+            }

This is a bit bikeshedding, but you can leave goto xe out of the two `if's to reduce patch size.

I guess you mean if (ferror(f)) and if (feof(f))? We can't leave 'goto xe' outside, since in the if (feof(f)) / if (tolerate_shrinking_file) case it's not an error but an expected result for sysfs.

+        }
+
+        data = realloc(data, datalen);

Should check return value of realloc.

Will add a check: if (!data) goto xe;

Thanks, Chunyan

The logic of this function reflects what has been discussed so far.

Wei.

     }

     if (fclose(f)) {
@@ -396,6 +412,19 @@ int libxl_read_file_contents(libxl_ctx *ctx, const char *filename,
     return e;
 }

+int libxl_read_file_contents(libxl_ctx *ctx, const char *filename,
+                             void **data_r, int *datalen_r)
+{
+    return libxl_read_file_contents_core(ctx, filename, data_r, datalen_r, 0);
+}
+
+int libxl_read_sysfs_file_contents(libxl_ctx *ctx, const char *filename,
+                                   void **data_r, int *datalen_r)
+{
+    return libxl_read_file_contents_core(ctx, filename, data_r, datalen_r, 1);
+}
+
+
 #define READ_WRITE_EXACTLY(rw, zero_is_eof, constdata) \
 \
 int libxl_##rw##_exactly(libxl_ctx *ctx, int fd, \
-- 
2.1.4
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen
On 2015/8/12 4:43, Konrad Rzeszutek Wilk wrote: On Wed, Aug 05, 2015 at 09:03:06PM +0800, Shannon Zhao wrote: On 2015/8/5 20:48, Julien Grall wrote: On 05/08/15 12:49, Shannon Zhao wrote: That's great! Keep in mind that many ARM platforms have non-PCI busses, so I think we'll need an amba and a platform bus_notifier too, in addition to the existing pci bus notifier. Thanks for your reminding. I thought about amba. Since ACPI of current linux kernel doesn't support probe amba bus devices, so this bus_notifier will not be used at the moment. But there are some voice that we need to make ACPI support amba on the linux arm kernel mail list. And to me it doesn't matter to add the amba bus_notifier. This comment raised one question. What happen if the hardware has MMIO region not described in the ACPI? This sounds weird. If a device is described in ACPI table, it will not describe the MMIO region which the driver will use? Does this situation exist? If the hardware has mmio region not described in the ACPI, how does the driver know the region and use it? On the x86 world we would query the PCI configuration registers and read the device BAR registers. Those would contain the MMIO regions the device uses. But x86 is funny and you do say 'many .. ARM .. have non-PCI buses' - which would imply you have not hit this yet. Are PCI devices interrogated differently on ARM? No configuration registers? For PCI devices, on ARM it will reuse the existing bus_notifier xen_pci_notifier to call hypercall to map mmio regions. And other operates are same with X86. Thanks, -- Shannon ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
Hi Julien, On 2015/8/12 0:19, Julien Grall wrote: Hi Shannon, On 07/08/15 03:11, Shannon Zhao wrote: 2. Create minimal DT to pass required information to Dom0 -- The minimal DT mainly passes Dom0 bootargs, address and size of initrd (if available), address and size of uefi system table, address and size of uefi memory table, uefi-mmap-desc-size and uefi-mmap-desc-ver. An example of the minimal DT: / { #address-cells = 2; #size-cells = 1; chosen { bootargs = kernel=Image console=hvc0 earlycon=pl011,0x1c09 root=/dev/vda2 rw rootfstype=ext4 init=/bin/sh acpi=force; linux,initrd-start = 0x; linux,initrd-end = 0x; linux,uefi-system-table = 0x; linux,uefi-mmap-start = 0x; linux,uefi-mmap-size = 0x; linux,uefi-mmap-desc-size = 0x; linux,uefi-mmap-desc-ver = 0x; }; }; For details loook at https://github.com/torvalds/linux/blob/master/Documentation/arm/uefi.txt AFAICT, the device tree properties in this documentation are only used in order to communicate between the UEFI stub and Linux. This means that those properties are not standardize and can change at any time by Linux folks. They don't even live in Documentation/devicetree/ I would also expect to see the same needs for FreeBSD running as DOM0 with ACPI. I'm not very clear about how FreeBSD communicates with UEFI. And when booting with DT, how does FreeBSD communicate with UEFI? Not through these properties? So it looks like to me that a generic name would be better for all those properties. If we change these name, it needs change some functions in Linux. Will it impact the use of Linux with UEFI not on Xen? Thanks, -- Shannon ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.6] libxl: fix libxl__build_hvm error code return path
On Fri, Aug 07, 2015 at 06:08:25PM +0200, Roger Pau Monne wrote: This is a simple fix to make sure libxl__build_hvm returns an error code in case of failure. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com Though I would like to make commit message clearer. In 25652f23 (tools/libxl: detect and avoid conflicts with RDM), new code was added to use rc to store libxl function call return value, which complied to libxl coding style. That patch, however, didn't change other locations where return value was stored in ret. In the end libxl__build_hvm could return 0 when it failed. Explicitly set rc to ERROR_FAIL in error path to fix this. A more comprehensive fix would be changing all ret to rc, which should be done when next development window opens. --- I would rather prefer to have it fixed in a proper way like it's done in my libxl: fix libxl__build_hvm error handling as part of the HVMlite series, but I understand that given the current status of the tree and the willingness to backport this to stable branches the other approach is going to be much harder. --- tools/libxl/libxl_dom.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index e1f11a3..668ce11 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -1019,6 +1019,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, return 0; out: +rc = ERROR_FAIL; return rc; } -- 1.9.5 (Apple Git-50.3) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [distros-debian-squeeze test] 37818: all pass
flight 37818 distros-debian-squeeze real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/37818/

Perfect :-)
All tests in this flight passed

baseline version:
 flight   37776

jobs:
 build-amd64                                    pass
 build-armhf                                    pass
 build-i386                                     pass
 build-amd64-pvops                              pass
 build-armhf-pvops                              pass
 build-i386-pvops                               pass
 test-amd64-amd64-amd64-squeeze-netboot-pygrub  pass
 test-amd64-i386-amd64-squeeze-netboot-pygrub   pass
 test-amd64-amd64-i386-squeeze-netboot-pygrub   pass
 test-amd64-i386-i386-squeeze-netboot-pygrub    pass

sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at http://osstest.xs.citrite.net/~osstest/testlogs/logs
Test harness code can be found at http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary

Push not applicable.
Re: [Xen-devel] [PATCH V3 5/6] x86/xsaves: support compact format for hvm save/restore
On 11/08/15 09:01, Shuai Ruan wrote:

+
+/*
+ * The FP xstates and SSE xstates are legacy states. They are always
+ * in the fixed offsets in the xsave area in either compacted form
+ * or standard form.
+ */
+xstate_comp_offsets[0] = 0;
+xstate_comp_offsets[1] = XSAVE_SSE_OFFSET;
+
+xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+
+for (i = 2; i < xstate_features; i++)

This loop will run off the end of xstate_comp_sizes[] for any processor supporting AVX512 or greater.

Since the length of xstate_comp_sizes is 64, I think the case you mentioned above will not happen.

xstate_features is a bitmap. The comparison i < xstate_features is bogus, and loops many more times than you intend.

~Andrew
Re: [Xen-devel] [PATCH RFC v2 0/5] Multi-queue support for xen-blkfront and xen-blkback
On 08/10/2015 11:52 PM, Jens Axboe wrote: On 08/10/2015 05:03 AM, Rafal Mielniczuk wrote: On 01/07/15 04:03, Jens Axboe wrote: On 06/30/2015 08:21 AM, Marcus Granado wrote:

Hi,

Our measurements for the multiqueue patch indicate a clear improvement in iops when more queues are used. The measurements were obtained under the following conditions:

- using blkback as the dom0 backend with the multiqueue patch applied to a dom0 kernel 4.0 on 8 vcpus.
- using a recent Ubuntu 15.04 kernel 3.19 with multiqueue frontend applied to be used as a guest on 4 vcpus
- using a micron RealSSD P320h as the underlying local storage on a Dell PowerEdge R720 with 2 Xeon E5-2643 v2 cpus.
- fio 2.2.7-22-g36870 as the generator of synthetic loads in the guest. We used direct_io to skip caching in the guest and ran fio for 60s reading a number of block sizes ranging from 512 bytes to 4MiB. Queue depth of 32 for each queue was used to saturate individual vcpus in the guest.

We were interested in observing storage iops for different values of block sizes. Our expectation was that iops would improve when increasing the number of queues, because both the guest and dom0 would be able to make use of more vcpus to handle these requests.

These are the results (as aggregate iops for all the fio threads) that we got for the conditions above with sequential reads:

fio_threads  io_depth  block_size  1-queue_iops  8-queue_iops
     8          32        512         158K          264K
     8          32         1K         157K          260K
     8          32         2K         157K          258K
     8          32         4K         148K          257K
     8          32         8K         124K          207K
     8          32        16K          84K          105K
     8          32        32K          50K           54K
     8          32        64K          24K           27K
     8          32       128K          11K           13K

8-queue iops was better than single queue iops for all the block sizes. There were very good improvements as well for sequential writes with block size 4K (from 80K iops with single queue to 230K iops with 8 queues), and no regressions were visible in any measurement performed.

Great results! And I don't know why this code has lingered for so long, so thanks for helping get some attention to this again.
Personally I'd be really interested in the results for the same set of tests, but without the blk-mq patches. Do you have them, or could you potentially run them?

Hello,

We re-ran the tests for sequential reads with identical settings but with Bob Liu's multiqueue patches reverted from the dom0 and guest kernels. The results we obtained were *better* than the results we got with the multiqueue patches applied:

fio_threads  io_depth  block_size  1-queue_iops  8-queue_iops  no-mq-patches_iops
     8          32        512         158K          264K           321K
     8          32         1K         157K          260K           328K
     8          32         2K         157K          258K           336K
     8          32         4K         148K          257K           308K
     8          32         8K         124K          207K           188K
     8          32        16K          84K          105K            82K
     8          32        32K          50K           54K            36K
     8          32        64K          24K           27K            16K
     8          32       128K          11K           13K            11K

We noticed that requests are not merged by the guest when the multiqueue patches are applied, which results in a regression for small block sizes (the RealSSD P320h's optimal block size is around 32-64KB). We observed a similar regression for the Dell MZ-5EA1000-0D3 100 GB 2.5" internal SSD.

As I understand it, the blk-mq layer bypasses the I/O scheduler, which also effectively disables merges. Could you explain why it is difficult to enable merging in the blk-mq layer? That could help close the performance gap we observed. Otherwise, the tests show that the multiqueue patches do not improve performance, at least when it comes to sequential read/write operations.

blk-mq still provides merging, there should be no difference there. Do the xen patches set BLK_MQ_F_SHOULD_MERGE?

Yes. Is it possible that the xen-blkfront driver dequeues requests too fast once we have multiple hardware queues? Because new requests don't get the chance to merge with old requests which were already dequeued and issued.

-- 
Regards,
-Bob
Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server
On 8/10/2015 6:57 PM, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 10 August 2015 11:56 To: Paul Durrant; Wei Liu; Yu Zhang Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 10/08/15 09:33, Paul Durrant wrote: -Original Message- From: Wei Liu [mailto:wei.l...@citrix.com] Sent: 10 August 2015 09:26 To: Yu Zhang Cc: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote: Currently in ioreq server, guest write-protected ram pages are tracked in the same rangeset with device mmio resources. Yet unlike device mmio, which can be in big chunks, the guest write- protected pages may be discrete ranges with 4K bytes each. This patch uses a seperate rangeset for the guest ram pages. And a new ioreq type, IOREQ_TYPE_MEM, is defined. Note: Previously, a new hypercall or subop was suggested to map write-protected pages into ioreq server. However, it turned out handler of this new hypercall would be almost the same with the existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a type parameter in this hypercall. So no new hypercall defined, only a new type is introduced. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- tools/libxc/include/xenctrl.h| 39 +++--- tools/libxc/xc_domain.c | 59 ++-- FWIW the hypercall wrappers look correct to me. 
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h index 014546a..9106cb9 100644 --- a/xen/include/public/hvm/hvm_op.h +++ b/xen/include/public/hvm/hvm_op.h @@ -329,8 +329,9 @@ struct xen_hvm_io_range { ioservid_t id; /* IN - server id */ uint32_t type; /* IN - type of range */ # define HVMOP_IO_RANGE_PORT 0 /* I/O port range */ -# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */ +# define HVMOP_IO_RANGE_MMIO 1 /* MMIO range */ # define HVMOP_IO_RANGE_PCI2 /* PCI segment/bus/dev/func range */ +# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */ This looks problematic. Maybe you can get away with this because this is a toolstack-only interface? Indeed, the old name is a bit problematic. Presumably re-use like this would require an interface version change and some if-defery. I assume it is an interface used by qemu, so this patch in its currently state will break things. If QEMU were re-built against the updated header, yes. Thank you, Andrew Paul. :) Are you referring to the xen_map/unmap_memory_section routines in QEMU? I noticed they are called by xen_region_add/del in QEMU. And I wonder, are these 2 routines used to track a memory region or to track a MMIO region? If the region to be added is a MMIO, I guess the new interface should be fine, but if it is memory region to be added into ioreq server, maybe a patch in QEMU is necessary(e.g. use some if-defery for this new interface version you suggested)? Thanks Yu Paul ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
On 10/08/15 19:10, Julien Grall wrote: The commit ccc9d90a9a8b5c4ad7e9708ec41f75ff9e98d61d xenbus_client: Extend interface to support multi-page ring removes the call to free_xenballooned_pages in xenbus_unmap_ring_vfree_hvm. This results in the pages never being given back to Linux, losing them forever. It only happens when the backends are running in HVM domains.

Applied to for-linus-4.2 and tagged for stable, thanks.

David
Re: [Xen-devel] [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
On Fri, Aug 07, 2015 at 01:44:41PM +0100, Andrew Cooper wrote: On 07/08/15 09:00, Shuai Ruan wrote:

+goto skip;
+}
+
+if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )

What does edi have to do with xsaves? Only edx:eax are special according to the manual.

regs->edi is the guest_linear_address.

Whyso? xsaves takes an unconditional memory parameter, not a pointer in %rdi. (regs->edi is only correct for ins/outs because the pointer is architecturally required to be in %rdi.)

You are right. The linear address should be decoded from the instruction.

There is nothing currently in emulate_privileged_op() which does ModRM decoding for memory references, nor SIB decoding. xsaves/xrstors would be the first such operations. I am also not sure that adding arbitrary memory decode here is sensible.

In an ideal world, we would have what is currently x86_emulate() split in 3 stages. Stage 1 does straight instruction decode to some internal representation. Stage 2 does an audit to see whether the decoded instruction is plausible for the reason why an emulation was needed. We have had a number of security issues with emulation in the past where guests cause one instruction to trap for emulation, then rewrite the instruction to be something else, and exploit a bug in the emulator. Stage 3 performs the actions required for emulation.

Currently, x86_emulate() is limited to instructions which might legitimately fault for emulation, but with the advent of VM introspection, this is proving to be insufficient. With my x86 maintainer's hat on, I would like to avoid the current situation we have with multiple bits of code doing x86 instruction decode and emulation (which are all different). I think the 3-step approach above caters suitably to all usecases, but it is a large project itself.

It allows the introspection people to have a full and complete x86 emulation infrastructure, while also preventing areas like the shadow paging from being opened up to potential vulnerabilities in unrelated areas of the x86 architecture. I would even go so far as to say that it is probably ok not to support xsaves/xrstors in PV guests until something along the above lines is sorted. The first feature in XSS is processor trace, which a PV guest couldn't use anyway. I suspect the same applies to most of the other XSS features, or they wouldn't need to be privileged in the first place.

Why couldn't a PV guest use processor trace?

Thanks for your detailed suggestions. xsaves/xrstors would also bring other benefits for PV guests, such as saving memory in the XSAVE area. If we do not support xsaves/xrstors in PV, PV guests would lose these benefits. What's your opinion on this?

+
+if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4]
+                          & X86_CR4_OSXSAVE) )
+{
+    do_guest_trap(TRAP_invalid_op, regs, 0);
+    goto skip;
+}
+
+if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
+{
+    do_guest_trap(TRAP_nmi, regs, 0);
+    goto skip;
+}
+
+if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
+    goto fail;
+
+if ( (rc = copy_from_user(guest_xsave_area, (void *) regs->edi,
+                          sizeof(struct xsave_struct))) != 0 )
+{
+    propagate_page_fault(regs->edi +
+                         sizeof(struct xsave_struct) - rc, 0);
+    goto skip;

Surely you just need the xstate_bv and xcomp_bv?

I will dig into the SDM to see whether I am missing some checks.

What I mean by this is that xstate_bv and xcomp_bv are all that you are checking, so you just need two uint64_t's, rather than a full xsave_struct.

Sorry to misunderstand your meaning.

default:

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 98310f3..de94ac1 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -48,6 +48,58 @@ l2_pgentry_t __section(.bss.page_aligned) l2_bootmap[L2_PAGETABLE_ENTRIES];
 l2_pgentry_t *compat_idle_pg_table_l2;

+unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr)

What is this function? Why is it useful? Something like this belongs in its own patch along with a description of why it is being introduced.

The function is used for getting the mfn related to a guest linear address. Is there another existing function I can use that can do the same thing? Can you give me a suggestion?

do_page_walk() and use virt_to_mfn() on the result? (I am just guessing, but

+{
+asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
+: "=m
Re: [Xen-devel] Second regression due to libxl: Remove linux udev rules (2ba368d13893402b2f1fb3c283ddcc714659dd9b)
On Fri, 2015-08-07 at 10:54 -0400, Konrad Rzeszutek Wilk wrote: I've looked into this, and AFAICT you were probably using the udev rules (you have run_hotplug_scripts=0 in xl.conf?) before 2ba368, and Correct. I think I needed that for driver domains and had left it in there. The intention was that xl devd would be run in the driver domain too, I added an initscript for that purpose last week (or was it two weeks ago?) but you could also just arrange for it to happen in /etc/rc or something. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server
-Original Message- From: Yu, Zhang [mailto:yu.c.zh...@linux.intel.com] Sent: 11 August 2015 09:41 To: Paul Durrant; Andrew Cooper; Wei Liu Cc: Kevin Tian; Keir (Xen.org); Ian Campbell; xen-devel@lists.xen.org; Stefano Stabellini; zhiyuan...@intel.com; jbeul...@suse.com; Ian Jackson Subject: Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 8/11/2015 4:25 PM, Paul Durrant wrote: -Original Message- From: Yu, Zhang [mailto:yu.c.zh...@linux.intel.com] Sent: 11 August 2015 08:57 To: Paul Durrant; Andrew Cooper; Wei Liu Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 8/10/2015 6:57 PM, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 10 August 2015 11:56 To: Paul Durrant; Wei Liu; Yu Zhang Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 10/08/15 09:33, Paul Durrant wrote: -Original Message- From: Wei Liu [mailto:wei.l...@citrix.com] Sent: 10 August 2015 09:26 To: Yu Zhang Cc: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote: Currently in ioreq server, guest write-protected ram pages are tracked in the same rangeset with device mmio resources. Yet unlike device mmio, which can be in big chunks, the guest write- protected pages may be discrete ranges with 4K bytes each. This patch uses a seperate rangeset for the guest ram pages. 
And a new ioreq type, IOREQ_TYPE_MEM, is defined. Note: Previously, a new hypercall or subop was suggested to map write-protected pages into ioreq server. However, it turned out handler of this new hypercall would be almost the same with the existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a type parameter in this hypercall. So no new hypercall defined, only a new type is introduced. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- tools/libxc/include/xenctrl.h| 39 +++- -- tools/libxc/xc_domain.c | 59 ++-- FWIW the hypercall wrappers look correct to me. diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h index 014546a..9106cb9 100644 --- a/xen/include/public/hvm/hvm_op.h +++ b/xen/include/public/hvm/hvm_op.h @@ -329,8 +329,9 @@ struct xen_hvm_io_range { ioservid_t id; /* IN - server id */ uint32_t type; /* IN - type of range */ # define HVMOP_IO_RANGE_PORT 0 /* I/O port range */ -# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */ +# define HVMOP_IO_RANGE_MMIO 1 /* MMIO range */ # define HVMOP_IO_RANGE_PCI2 /* PCI segment/bus/dev/func range */ +# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */ This looks problematic. Maybe you can get away with this because this is a toolstack-only interface? Indeed, the old name is a bit problematic. Presumably re-use like this would require an interface version change and some if-defery. I assume it is an interface used by qemu, so this patch in its currently state will break things. If QEMU were re-built against the updated header, yes. Thank you, Andrew Paul. :) Are you referring to the xen_map/unmap_memory_section routines in QEMU? I noticed they are called by xen_region_add/del in QEMU. And I wonder, are these 2 routines used to track a memory region or to track a MMIO region? If the region to be added is a MMIO, I guess the new interface should be fine, but if it is memory region to be added into ioreq server, maybe a patch in QEMU is necessary(e.g. 
use some if-defery for this new interface version you suggested)? I was forgetting that QEMU uses libxenctrl so your change to xc_hvm_map_io_range_to_ioreq_server() means everything will continue to work as before. There is still the (admittedly academic) problem of some unknown emulator out there that rolls its own hypercalls and blindly updates to the new version of hvm_op.h suddenly starting to register memory ranges rather than mmio ranges though. I would leave the existing definitions as-is and come up with a new name. So, how about we keep the HVMOP_IO_RANGE_MEMORY name for MMIO, and use a new one, say HVMOP_IO_RANGE_WP_MEM, for
Re: [Xen-devel] About Xen bridged pci devices and suspend/resume for the X10SAE motherboard
On Mon, 2015-08-10 at 10:47 -0400, Konrad Rzeszutek Wilk wrote: On Mon, Aug 10, 2015 at 05:14:28PM +0300, M. Ivanov wrote: On Mon, 2015-08-10 at 09:58 -0400, Konrad Rzeszutek Wilk wrote: On Mon, Aug 10, 2015 at 02:11:38AM +0300, M. Ivanov wrote: Hello, excuse me for bothering you, but I've read an old thread on a mailing list about X10SAE compatibility. http://lists.xen.org/archives/html/xen-devel/2014-02/msg02111.html CC-ing Xen devel. Currently I own this board and am trying to use it with Xen and be able to suspend and resume. But I am getting errors from the USB 3 Renesas controller about parity in my bios event log, and my system hangs on resume, so I was wondering if that is connected to the bridge(tundra) you've mentioned. Did you update the BIOS to the latest version? Will updating to version 3 solve my issue? Can you do a suspend/resume on your X10SAE? It did work at some point. I will find out when I am at home later today. Looking forward to your reply and am really thankful for your time, so far I've tried changing many of the settings in the bios, fiddling with Xen's kernel params, blacklisting the xhci driver, doing a xl detach. The only thing I haven't done yet is updating the bios, but Supermicro's support couldn't give me a changelog: The primary objective for ver3.0 BIOS release is to support Intel Broadwell CPUs We do not know if BIOS update will fix the issue you are seeing as we never tested it with Xen. I will be very glad if you could share any information regarding this matter. Best regards, M. Ivanov
Re: [Xen-devel] [PATCH v3 3/4] x86/pvh: Handle hypercalls for 32b PVH guests
On 24.07.15 at 20:35, boris.ostrov...@oracle.com wrote: On 07/23/2015 10:21 AM, Jan Beulich wrote: On 11.07.15 at 00:20, boris.ostrov...@oracle.com wrote: Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com --- Changes in v3: * Defined compat_mmuext_op(). (XEN_GUEST_HANDLE_PARAM(mmuext_op_compat_t) is not defined in header files so I used 'void' type. How is it not? It's in compat/xen.h (which is a generated header). compat/xen.h has DEFINE_COMPAT_HANDLE(mmuext_op_compat_t) (which is __compat_handle_mmuext_op_compat_t). We need XEN_GUEST_HANDLE(mmuext_op_compat_t), which is __guest_handle_mmuext_op_compat_t. And I wasn't sure it's worth explicitly adding it to a header file (like I think what we do for vcpu_runstate_info_compat_t in sched.h); Hmm, indeed all other compat_..._op()-s use void handles (albeit in most if not all of the cases their native counterparts do too). So I guess using void here is fine then, or using COMPAT_HANDLE() instead. It's not really relevant anyway since COMPAT_CALL() casts the function pointer to the intended type anyway. @@ -4981,7 +5003,7 @@ int hvm_do_hypercall(struct cpu_user_regs *regs) return viridian_hypercall(regs); if ( (eax = NR_hypercalls) || - (is_pvh_domain(currd) ? !pvh_hypercall64_table[eax] + (is_pvh_domain(currd) ? !pvh_hypercall32_table[eax] : !hvm_hypercall32_table[eax]) ) ... this will break (as we're assuming 32- and 64-bit tables to be fully in sync here; there's still the pending work item of constructing these tables so that this has a better chance of not getting broken). So you prefer to have full check --- explicitly for both 32- and 64-bit, right? No. Just adding the missing operation to the table will deal with it. I wouldn't like to see more conditionals to be added to this code path when we can avoid doing so. 
What we could do is add a respective ASSERT() to the 64-bit path, albeit the NULL deref would be observable as a fault without the ASSERT() too (and adding one wouldn't help release builds [and their security]). Jan
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
On 11/08/15 03:09, Shannon Zhao wrote: Hi Julien, Hi Shannon, On 2015/8/7 18:33, Julien Grall wrote: Hi Shannon, Just some clarification questions. On 07/08/15 03:11, Shannon Zhao wrote: 3. Dom0 gets grant table and event channel irq information --- As said above, we set the hypervisor_id to XenVMM to tell Dom0 that it runs on the Xen hypervisor. For grant table, add two new HVM_PARAMs: HVM_PARAM_GNTTAB_START_ADDRESS and HVM_PARAM_GNTTAB_SIZE. For event channel irq, reuse HVM_PARAM_CALLBACK_IRQ and add a new delivery type: val[63:56] == 3: val[15:8] is flag: val[7:0] is a PPI (ARM and ARM64 only) Can you describe the content of flag? This needs definition as well. I think it could use the definition from the XENV table. Bit 0 stands for the interrupt mode and bit 1 for the interrupt polarity. And explain it in the comment of HVM_PARAM_CALLBACK_IRQ. That would be fine for me. When constructing Dom0 in Xen, save these values. Then Dom0 could get them through hypercall HVMOP_get_param. 4. Map MMIO regions --- Register a bus_notifier for platform and amba bus in Linux. Add a new XENMAPSPACE XENMAPSPACE_dev_mmio. Within the register, check if the device is newly added, then call hypercall XENMEM_add_to_physmap to map the mmio regions. 5. Route device interrupts to Dom0 -- Route all the SPI interrupts to Dom0 before Dom0 booting. Not all the SPIs will be routed to DOM0. Some are used by Xen and should never be used by any guest. I have in mind the UART and SMMU interrupts. You will have to find a way to skip them nicely. Note that not all the IRQs used by Xen are properly registered when we build DOM0 (see the SMMU). For the UART, we can get the interrupt information from the SPCR table and hide it from Dom0. Can you clarify your meaning of hide from DOM0? Did you mean avoiding routing the SPI to DOM0? IIUC, currently Xen (as well as Linux) doesn't support using the SMMU when booting with ACPI. When it does, it could read the interrupt information from the IORT table and hide it from Dom0. 
Well for Xen we don't even have ACPI supported upstream ;). For Linux there is some on-going work. Anyway, this is not important right now. -- Julien Grall
Re: [Xen-devel] [PATCH v3 2/4] x86/compat: Test both PV and PVH guests for compat mode
On 24.07.15 at 19:54, boris.ostrov...@oracle.com wrote: On 07/23/2015 10:07 AM, Jan Beulich wrote: Plus - is this in line with what the tools are doing? Aren't they assuming !PV => native format context? I.e. don't you need to treat differently v->domain == current->domain and its opposite? Roger btw. raised a similar question on IRC earlier today... Not sure I understand this. You mean for copying 64-bit guest's info into 32-bit dom0? Basically yes - tool stack and guest invocations may need to behave differently. Jan
Re: [Xen-devel] [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode
At 11:14 +0100 on 10 Aug (1439205273), Andrew Cooper wrote: On 10/08/15 10:49, Tim Deegan wrote: Hi, At 17:45 +0100 on 06 Aug (1438883118), Ben Catterall wrote: The process to switch into and out of deprivileged mode can be likened to setjmp/longjmp. To enter deprivileged mode, we take a copy of the stack from the guest's registers up to the current stack pointer. This copy is pretty unfortunate, but I can see that avoiding it will be a bit complex. Could we do something with more stacks? AFAICS there have to be three stacks anyway: - one to hold the depriv execution context; - one to hold the privileged execution context; and - one to take interrupts on. So maybe we could do some fiddling to make Xen take interrupts on a different stack while we're depriv'd? That should happen naturally by virtue of the privilege level change involved in taking the interrupt. Right, and this is why we need a third stack - so interrupts don't trash the existing priv state on the 'normal' Xen stack. And so we either need to copy the priv stack out (and maybe copy it back), or tell the CPU to use a different stack. If we had enough headroom, we could try to be clever and tell the CPU to take interrupts on the priv stack _below_ the existing state. That would avoid the first of your problems below. * Under this model, PV exception handlers should copy themselves onto the privileged execution stack. * Currently, the IST handlers copy themselves onto the primary stack if they interrupt guest context. * AMD Task Register on vmexit. (this old gem) Gah, this thing. :( Tim.
Re: [Xen-devel] [PATCH v4 02/11] x86/intel_pstate: add some calculation related support
On 27.07.15 at 07:48, wei.w.w...@intel.com wrote: +/* + * clamp_t - return a value clamped to a given range using a given type + * @type: the type of variable to use + * @val: current value + * @lo: minimum allowable value + * @hi: maximum allowable value + * + * This macro does no typechecking and uses temporary variables of type + * 'type' to make all the comparisons. + */ +#define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi) Shouldn't you also add a type-checking variant then (which ought to be used instead of the one above wherever possible)? Hi Jan, I think max_t() and min_t() have handled the typechecking, so maybe we do not need to do it again here. If you have a different opinion, how should we do the typechecking here? Is the following what you expected? #define clamp_t(type, val, lo, hi) ({ type _val = (val); type _lo = (lo); type _hi = (hi); min_t(type, max_t(type, _val, _lo), _hi); }) I don't think you understood: I asked for a clamp() to accompany clamp_t(), just like e.g. max_t() is a less preferred sibling of max(). Jan
Re: [Xen-devel] Clarification regarding xen toolstack for booting a pv guest
On Tue, Aug 11, 2015 at 01:51:10AM +0100, Wei Liu wrote: On Mon, Aug 10, 2015 at 05:00:51PM -0700, sainath grandhi wrote: Hello all, I was measuring the amount of time taken on the host by the Xen toolstack while launching a PV guest. I notice that there is around 2-3 seconds of time spent in dom0 by the toolstack before the guest starts executing. A significant amount of time is taken in the function xc_dom_boot_mem_init, around 2 seconds for a guest memory of 2 GB. This code does allocate guest memory and mappings, and the amount of time this function takes increases proportionally with the requested guest memory in the guest config file. Has anyone noticed a similar thing? Is ~2 seconds of wall clock time reasonable for guest memory mapping? I guess it is because Dom0 has to balloon down to free up memory for the guest. What is your Xen command line? Have you tried putting dom0_mem=512M,max:512M there? And I forgot to mention there is a patch to make PV guest creation faster by using superpages and batching. See 415b58c1 and 826ca36fa3 in the xen-unstable tree. Wei. Thanks
Re: [Xen-devel] [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
On Fri, Aug 07, 2015 at 02:04:51PM +0100, Andrew Cooper wrote: On 07/08/15 09:22, Shuai Ruan wrote: void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { @@ -4456,6 +4460,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, *ebx = _eax + _ebx; } } +if ( count == 1 ) +{ +if ( cpu_has_xsaves ) +{ +*ebx = XSTATE_AREA_MIN_SIZE; +if ( v->arch.xcr0 | v->arch.msr_ia32_xss ) +for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ ) +{ +if ( !((v->arch.xcr0 | v->arch.msr_ia32_xss) & + (1ULL << sub_leaf)) ) +continue; +domain_cpuid(d, input, sub_leaf, _eax, _ebx, _ecx, + _edx); +*ebx = *ebx + _eax; +} +} +else +{ +*eax &= ~XSAVES; +*ebx = *ecx = *edx = 0; +} +if ( !cpu_has_xgetbv1 ) +*eax &= ~XGETBV1; +if ( !cpu_has_xsavec ) +*eax &= ~XSAVEC; +if ( !cpu_has_xsaveopt ) +*eax &= ~XSAVEOPT; +} Urgh - I really need to get domain cpuid fixed in Xen. This is currently making a very bad situation a little worse. In patch 4, I expose xsaves/xsavec/xsaveopt and need to check whether the hardware supports it. What's your suggestion about this? Calling into domain_cpuid() in the loop is not useful as nothing will set the subleaves up. As a first pass, reading from xstate_{offsets,sizes} will be better than nothing, as it will at least What do you mean by xstate_{offsets,sizes}? match reality until the domain is migrated. For CPUID(eax=0dh) with subleaf 1, the value of ebx will change according to v->arch.xcr0 | v->arch.msr_ia32_xss. So adding code in the hvm_cpuid function is the best way I can think of. Your suggestions :)? Longterm, I plan to overhaul the cpuid infrastructure to allow it to properly represent per-core and per-package data, as well as move it into the Xen architectural migration state, to avoid any host-specific values leaking into guest state. This however is also a lot of work, which you don't want to be dependent on. 
static int construct_vmcs(struct vcpu *v) { struct domain *d = v->domain; @@ -1204,6 +1206,9 @@ static int construct_vmcs(struct vcpu *v) __vmwrite(GUEST_PAT, guest_pat); } +if ( cpu_has_vmx_xsaves ) +__vmwrite(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP); + vmx_vmcs_exit(v); /* PVH: paging mode is updated by arch_set_info_guest(). */ diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index d3183a8..64ff63b 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2708,6 +2708,16 @@ static int vmx_handle_apic_write(void) return vlapic_apicv_write(current, exit_qualification & 0xfff); } +static void vmx_handle_xsaves(void) +{ +WARN(); +} + +static void vmx_handle_xrstors(void) +{ +WARN(); +} + What are these supposed to do? They are not appropriate handlers. These two handlers do nothing here. Performing xsaves in an HVM guest will not trap into the hypervisor in this patch (by setting XSS_EXIT_BITMAP to zero). However it may trap in the future. See SDM Volume 3 Section 25.1.3 for detailed information. in which case use domain_crash(). WARN() here will allow a guest to DoS Xen. I will change this in the next version. ~Andrew Thanks for your review, Andrew.
Re: [Xen-devel] [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
On 11/08/15 08:59, Shuai Ruan wrote: On Fri, Aug 07, 2015 at 02:04:51PM +0100, Andrew Cooper wrote: On 07/08/15 09:22, Shuai Ruan wrote: void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { @@ -4456,6 +4460,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, *ebx = _eax + _ebx; } } +if ( count == 1 ) +{ +if ( cpu_has_xsaves ) +{ +*ebx = XSTATE_AREA_MIN_SIZE; +if ( v->arch.xcr0 | v->arch.msr_ia32_xss ) +for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ ) +{ +if ( !((v->arch.xcr0 | v->arch.msr_ia32_xss) & + (1ULL << sub_leaf)) ) +continue; +domain_cpuid(d, input, sub_leaf, _eax, _ebx, _ecx, + _edx); +*ebx = *ebx + _eax; +} +} +else +{ +*eax &= ~XSAVES; +*ebx = *ecx = *edx = 0; +} +if ( !cpu_has_xgetbv1 ) +*eax &= ~XGETBV1; +if ( !cpu_has_xsavec ) +*eax &= ~XSAVEC; +if ( !cpu_has_xsaveopt ) +*eax &= ~XSAVEOPT; +} Urgh - I really need to get domain cpuid fixed in Xen. This is currently making a very bad situation a little worse. In patch 4, I expose xsaves/xsavec/xsaveopt and need to check whether the hardware supports it. What's your suggestion about this? Calling into domain_cpuid() in the loop is not useful as nothing will set the subleaves up. As a first pass, reading from xstate_{offsets,sizes} will be better than nothing, as it will at least What do you mean by xstate_{offsets,sizes}? Shorthand for xstate_offsets and xstate_sizes, per the standard shell expansion. match reality until the domain is migrated. For CPUID(eax=0dh) with subleaf 1, the value of ebx will change according to v->arch.xcr0 | v->arch.msr_ia32_xss. So adding code in the hvm_cpuid function is the best way I can think of. Your suggestions :)? Which is liable to change on different hardware. Once a VM has migrated, Xen may not legitimately execute another cpuid instruction as part of emulating guest cpuid, as it is not necessarily accurate. 
Xen does not currently have proper cpuid encapsulation, which causes host-specific details to leak into guests across migrate. I have a long-term plan to fix it, but it is not simple or quick to do. In this case, reading from xstate_{offsets,sizes} is better than nothing, but will need fixing in the long term. ~Andrew
Re: [Xen-devel] [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode
On Thu, 2015-08-06 at 21:55 +0100, Andrew Cooper wrote: On 06/08/15 17:45, Ben Catterall wrote: The process to switch into and out of deprivileged mode can be likened to setjmp/longjmp. To enter deprivileged mode, we take a copy of the stack from the guest's registers up to the current stack pointer. This allows us to restore the stack when we have finished the deprivileged mode operation, meaning we can continue execution from that point. This is similar to what happens on a context switch. To exit deprivileged mode, we copy the stack back, replacing the current stack. We can then continue execution from where we left off, which will unwind the stack and free up resources. This method means that we do not need to change any other code paths and its invocation will be transparent to callers. This should allow the feature to be more easily deployed to different parts of Xen. Note that this copy of the stack is per-vcpu, but it will contain per-pcpu data. Extra work is needed to properly migrate vcpus between pcpus. Under what circumstances do you see there being persistent state in the depriv area between calls, given that the calls are synchronous from VM actions? Would we not want to keep (some of) the device model's state in a depriv area? e.g. anything which is purely internal to the DM which is therefore only accessed from depriv-land? Ian.
Re: [Xen-devel] [PATCH v3 07/32] xen/x86: fix arch_set_info_guest for HVM guests
On 04.08.15 at 20:08, andrew.coop...@citrix.com wrote: On 03/08/15 18:31, Roger Pau Monné wrote: struct vcpu_hvm_x86_16 { uint16_t ax; uint16_t cx; uint16_t dx; uint16_t bx; uint16_t sp; uint16_t bp; uint16_t si; uint16_t di; uint16_t ip; uint32_t cr[8]; Definitely no need for anything other than cr0 and 4 in 16-bit mode. uint32_t cs_base; uint32_t ds_base; uint32_t ss_base; uint32_t cs_limit; uint32_t ds_limit; uint32_t ss_limit; uint16_t cs_ar; uint16_t ds_ar; uint16_t ss_ar; }; struct vcpu_hvm_x86_32 { uint32_t eax; uint32_t ecx; uint32_t edx; uint32_t ebx; uint32_t esp; uint32_t ebp; uint32_t esi; uint32_t edi; uint32_t eip; uint32_t cr[8]; Don't need cr's 5-8. I disagree with a number of things discussed so far (like the statement above), but I guess I'd better comment on v4 than continue this thread. Jan
Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server
On 8/10/2015 4:26 PM, Wei Liu wrote: On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote: Currently in ioreq server, guest write-protected ram pages are tracked in the same rangeset with device mmio resources. Yet unlike device mmio, which can be in big chunks, the guest write-protected pages may be discrete ranges with 4K bytes each. This patch uses a separate rangeset for the guest ram pages. And a new ioreq type, IOREQ_TYPE_MEM, is defined. Note: Previously, a new hypercall or subop was suggested to map write-protected pages into ioreq server. However, it turned out the handler of this new hypercall would be almost the same as the existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a type parameter in this hypercall. So no new hypercall is defined, only a new type is introduced. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- tools/libxc/include/xenctrl.h| 39 +++--- tools/libxc/xc_domain.c | 59 ++-- FWIW the hypercall wrappers look correct to me. diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h index 014546a..9106cb9 100644 --- a/xen/include/public/hvm/hvm_op.h +++ b/xen/include/public/hvm/hvm_op.h @@ -329,8 +329,9 @@ struct xen_hvm_io_range { ioservid_t id; /* IN - server id */ uint32_t type; /* IN - type of range */ # define HVMOP_IO_RANGE_PORT 0 /* I/O port range */ -# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */ +# define HVMOP_IO_RANGE_MMIO 1 /* MMIO range */ # define HVMOP_IO_RANGE_PCI 2 /* PCI segment/bus/dev/func range */ +# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */ This looks problematic. Maybe you can get away with this because this is a toolstack-only interface? Thanks Wei. Well, I believe this interface could be used both by the backend device driver and qemu as well (which I neglected). :-) Yu
Re: [Xen-devel] how can I find hypercall page address?
On 11/08/15 03:44, big strong wrote: My goal is to intercept hypercalls to detect malicious calls. So I first need to find where the hypercalls are. As I have said before, a guest may have an arbitrary number of hypercall pages. Furthermore, the hypercall page is merely a convenience; nothing prevents a guest manually issuing hypercalls. My plan is to locate the hypercall page first, then walk through the hypercall page to get the addresses of hypercalls. If there are any other solutions, please let me know. Thanks very much. It sounds like you want VM introspection, but it doesn't work like this. Try http://libvmi.com/ as a starting point. ~Andrew 2015-08-10 23:04 GMT+08:00 Dario Faggioli dario.faggi...@citrix.com: On Sat, 2015-08-08 at 08:02 +0800, big strong wrote: I think I've stated clearly what I want to do. Well... I want to locate the hypercall page address when creating a new domU, so as to locate hypercalls. Ok. What for? Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[Xen-devel] [xen-unstable test] 60647: tolerable FAIL
flight 60647 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/60647/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 13 guest-localmigrate fail pass in 60639 test-amd64-amd64-pygrub 6 xen-bootfail pass in 60639 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 16 guest-localmigrate/x10 fail in 60639 like 60624 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail like 60639 test-armhf-armhf-xl-rtds 11 guest-start fail like 60639 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 60639 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 60639 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-armhf-armhf-xl-vhd 9 debian-di-installfail never pass test-armhf-armhf-libvirt-raw 9 debian-di-installfail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-armhf-armhf-xl-qcow2 9 debian-di-installfail never pass test-armhf-armhf-libvirt-qcow2 9 debian-di-installfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 11 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-xsm 12 
migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail never pass test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-vhd 11 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-libvirt-vhd 9 debian-di-installfail never pass test-armhf-armhf-xl-raw 9 debian-di-installfail never pass test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 guest-saverestorefail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass version targeted for testing: xen 201eac83831d94ba2e9a63a7eed4c128633fafb1 baseline version: xen 201eac83831d94ba2e9a63a7eed4c128633fafb1 Last test of basis60647 2015-08-10 08:59:04 Z0 days Testing same since0 1970-01-01 00:00:00 Z 16658 days0 attempts jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt
Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server
On 8/11/2015 4:25 PM, Paul Durrant wrote: -Original Message- From: Yu, Zhang [mailto:yu.c.zh...@linux.intel.com] Sent: 11 August 2015 08:57 To: Paul Durrant; Andrew Cooper; Wei Liu Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 8/10/2015 6:57 PM, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 10 August 2015 11:56 To: Paul Durrant; Wei Liu; Yu Zhang Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 10/08/15 09:33, Paul Durrant wrote: -Original Message- From: Wei Liu [mailto:wei.l...@citrix.com] Sent: 10 August 2015 09:26 To: Yu Zhang Cc: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote: Currently in ioreq server, guest write-protected ram pages are tracked in the same rangeset with device mmio resources. Yet unlike device mmio, which can be in big chunks, the guest write- protected pages may be discrete ranges with 4K bytes each. This patch uses a seperate rangeset for the guest ram pages. And a new ioreq type, IOREQ_TYPE_MEM, is defined. Note: Previously, a new hypercall or subop was suggested to map write-protected pages into ioreq server. However, it turned out handler of this new hypercall would be almost the same with the existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a type parameter in this hypercall. 
So no new hypercall is defined, only a new type is introduced. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- tools/libxc/include/xenctrl.h| 39 +++--- tools/libxc/xc_domain.c | 59 ++-- FWIW the hypercall wrappers look correct to me. diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h index 014546a..9106cb9 100644 --- a/xen/include/public/hvm/hvm_op.h +++ b/xen/include/public/hvm/hvm_op.h @@ -329,8 +329,9 @@ struct xen_hvm_io_range { ioservid_t id; /* IN - server id */ uint32_t type; /* IN - type of range */ # define HVMOP_IO_RANGE_PORT 0 /* I/O port range */ -# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */ +# define HVMOP_IO_RANGE_MMIO 1 /* MMIO range */ # define HVMOP_IO_RANGE_PCI 2 /* PCI segment/bus/dev/func range */ +# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */ This looks problematic. Maybe you can get away with this because this is a toolstack-only interface? Indeed, the old name is a bit problematic. Presumably re-use like this would require an interface version change and some if-defery. I assume it is an interface used by qemu, so this patch in its current state will break things. If QEMU were re-built against the updated header, yes. Thank you, Andrew Paul. :) Are you referring to the xen_map/unmap_memory_section routines in QEMU? I noticed they are called by xen_region_add/del in QEMU. And I wonder, are these 2 routines used to track a memory region or to track an MMIO region? If the region to be added is an MMIO region, I guess the new interface should be fine, but if it is a memory region to be added into the ioreq server, maybe a patch in QEMU is necessary (e.g. use some if-defery for this new interface version you suggested)? I was forgetting that QEMU uses libxenctrl so your change to xc_hvm_map_io_range_to_ioreq_server() means everything will continue to work as before. 
There is still the (admittedly academic) problem of some unknown emulator out there that rolls its own hypercalls and blindly updates to the new version of hvm_op.h, suddenly starting to register memory ranges rather than MMIO ranges. I would leave the existing definitions as-is and come up with a new name. So, how about we keep the HVMOP_IO_RANGE_MEMORY name for MMIO, and use a new one, say HVMOP_IO_RANGE_WP_MEM, for write-protected RAM only? :) Thanks, Yu
Re: [Xen-devel] [PATCH v4 for Xen 4.6 1/4] xen: enable per-VCPU parameter settings for RTDS scheduler
On 09.08.15 at 17:45, lichong...@gmail.com wrote: On Mon, Jul 13, 2015 at 3:37 AM, Jan Beulich jbeul...@suse.com wrote: On 11.07.15 at 06:52, lichong...@gmail.com wrote: @@ -1162,8 +1176,82 @@ rt_dom_cntl( } spin_unlock_irqrestore(prv->lock, flags); break; +case XEN_DOMCTL_SCHEDOP_getvcpuinfo: +spin_lock_irqsave(prv->lock, flags); +for ( index = 0; index < op->u.v.nr_vcpus; index++ ) +{ +if ( copy_from_guest_offset(local_sched, + op->u.v.vcpus, index, 1) ) +{ +rc = -EFAULT; +break; +} +if ( local_sched.vcpuid >= d->max_vcpus || + d->vcpu[local_sched.vcpuid] == NULL ) +{ +rc = -EINVAL; +break; +} +svc = rt_vcpu(d->vcpu[local_sched.vcpuid]); + +local_sched.s.rtds.budget = svc->budget / MICROSECS(1); +local_sched.s.rtds.period = svc->period / MICROSECS(1); + +if ( __copy_to_guest_offset(op->u.v.vcpus, index, +local_sched, 1) ) +{ +rc = -EFAULT; +break; +} +if ( hypercall_preempt_check() ) +{ +rc = -ERESTART; +break; +} I still don't see how this is supposed to work. I return -ERESTART here, and the upper layer function (do_domctl) will handle this error code by calling hypercall_create_continuation. I have no idea where you found the upper layer (i.e. the XEN_DOMCTL_scheduler_op case of do_domctl() to take care of this). +} xen_domctl_schedparam_vcpu_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_schedparam_vcpu_t); + /* Set or get info? 
*/ #define XEN_DOMCTL_SCHEDOP_putinfo 0 #define XEN_DOMCTL_SCHEDOP_getinfo 1 +#define XEN_DOMCTL_SCHEDOP_putvcpuinfo 2 +#define XEN_DOMCTL_SCHEDOP_getvcpuinfo 3 struct xen_domctl_scheduler_op { uint32_t sched_id; /* XEN_SCHEDULER_* */ uint32_t cmd; /* XEN_DOMCTL_SCHEDOP_* */ union { -struct xen_domctl_sched_sedf { -uint64_aligned_t period; -uint64_aligned_t slice; -uint64_aligned_t latency; -uint32_t extratime; -uint32_t weight; -} sedf; -struct xen_domctl_sched_credit { -uint16_t weight; -uint16_t cap; -} credit; -struct xen_domctl_sched_credit2 { -uint16_t weight; -} credit2; -struct xen_domctl_sched_rtds { -uint32_t period; -uint32_t budget; -} rtds; +xen_domctl_sched_sedf_t sedf; +xen_domctl_sched_credit_t credit; +xen_domctl_sched_credit2_t credit2; +xen_domctl_sched_rtds_t rtds; +struct { +XEN_GUEST_HANDLE_64(xen_domctl_schedparam_vcpu_t) vcpus; +uint16_t nr_vcpus; +} v; And there's still no explicit padding here at all (nor am I convinced that uint16_t is really a good choice for nr_vcpus - uint32_t would seem more natural without causing any problems or structure growth). I think the size of union u is equal to the size of xen_domctl_sched_sedf_t, which is 64*4 bits (if vcpus in struct v is just a pointer). Which doesn't in any way address the complaint about missing explicit padding - I'm not asking you to pad to the size of the union, but to the size of the unnamed structure you add. Jan nr_vcpus is indeed better as uint32_t. I'll change it in the next version. Chong
Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server
-Original Message- From: Yu, Zhang [mailto:yu.c.zh...@linux.intel.com] Sent: 11 August 2015 08:57 To: Paul Durrant; Andrew Cooper; Wei Liu Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 8/10/2015 6:57 PM, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 10 August 2015 11:56 To: Paul Durrant; Wei Liu; Yu Zhang Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 10/08/15 09:33, Paul Durrant wrote: -Original Message- From: Wei Liu [mailto:wei.l...@citrix.com] Sent: 10 August 2015 09:26 To: Yu Zhang Cc: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote: Currently in ioreq server, guest write-protected ram pages are tracked in the same rangeset with device mmio resources. Yet unlike device mmio, which can be in big chunks, the guest write- protected pages may be discrete ranges with 4K bytes each. This patch uses a seperate rangeset for the guest ram pages. And a new ioreq type, IOREQ_TYPE_MEM, is defined. Note: Previously, a new hypercall or subop was suggested to map write-protected pages into ioreq server. However, it turned out handler of this new hypercall would be almost the same with the existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a type parameter in this hypercall. So no new hypercall defined, only a new type is introduced. 
Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com
---
 tools/libxc/include/xenctrl.h | 39 +++---
 tools/libxc/xc_domain.c       | 59 ++--

FWIW the hypercall wrappers look correct to me.

diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 014546a..9106cb9 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -329,8 +329,9 @@ struct xen_hvm_io_range {
     ioservid_t id;               /* IN - server id */
     uint32_t type;               /* IN - type of range */
 # define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
-# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
+# define HVMOP_IO_RANGE_MMIO   1 /* MMIO range */
 # define HVMOP_IO_RANGE_PCI    2 /* PCI segment/bus/dev/func range */
+# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */

This looks problematic. Maybe you can get away with this because this is a toolstack-only interface?

Indeed, the old name is a bit problematic. Presumably re-use like this would require an interface version change and some ifdef-ery.

I assume it is an interface used by qemu, so this patch in its current state will break things.

If QEMU were re-built against the updated header, yes.

Thank you, Andrew Paul. :) Are you referring to the xen_map/unmap_memory_section routines in QEMU? I noticed they are called by xen_region_add/del in QEMU. And I wonder, are these two routines used to track a memory region or an MMIO region? If the region to be added is MMIO, I guess the new interface should be fine; but if it is a memory region to be added into the ioreq server, maybe a patch in QEMU is necessary (e.g. using some ifdef-ery for the new interface version you suggested)?

I was forgetting that QEMU uses libxenctrl, so your change to xc_hvm_map_io_range_to_ioreq_server() means everything will continue to work as before.
There is still the (admittedly academic) problem of some unknown emulator out there that rolls its own hypercalls and blindly updates to the new version of hvm_op.h suddenly starting to register memory ranges rather than mmio ranges, though. I would leave the existing definitions as-is and come up with a new name.

Paul

Thanks
Yu

Paul
~Andrew
___
Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC v2 0/5] Multi-queue support for xen-blkfront and xen-blkback
On 11/08/15 07:08, Bob Liu wrote: On 08/10/2015 11:52 PM, Jens Axboe wrote: On 08/10/2015 05:03 AM, Rafal Mielniczuk wrote: On 01/07/15 04:03, Jens Axboe wrote: On 06/30/2015 08:21 AM, Marcus Granado wrote:

Hi,

Our measurements for the multiqueue patch indicate a clear improvement in iops when more queues are used. The measurements were obtained under the following conditions:

- using blkback as the dom0 backend with the multiqueue patch applied to a dom0 kernel 4.0 on 8 vcpus.
- using a recent Ubuntu 15.04 kernel 3.19 with the multiqueue frontend patch applied, used as a guest on 4 vcpus.
- using a Micron RealSSD P320h as the underlying local storage on a Dell PowerEdge R720 with 2 Xeon E5-2643 v2 cpus.
- fio 2.2.7-22-g36870 as the generator of synthetic loads in the guest. We used direct IO to skip caching in the guest and ran fio for 60s, reading a number of block sizes ranging from 512 bytes to 4MiB. A queue depth of 32 for each queue was used to saturate individual vcpus in the guest.

We were interested in observing storage iops for different values of block sizes. Our expectation was that iops would improve when increasing the number of queues, because both the guest and dom0 would be able to make use of more vcpus to handle these requests.

These are the results (as aggregate iops for all the fio threads) that we got for the conditions above with sequential reads:

fio_threads  io_depth  block_size  1-queue_iops  8-queue_iops
8            32        512         158K          264K
8            32        1K          157K          260K
8            32        2K          157K          258K
8            32        4K          148K          257K
8            32        8K          124K          207K
8            32        16K         84K           105K
8            32        32K         50K           54K
8            32        64K         24K           27K
8            32        128K        11K           13K

8-queue iops was better than single-queue iops for all the block sizes. There were very good improvements as well for sequential writes with block size 4K (from 80K iops with a single queue to 230K iops with 8 queues), and no regressions were visible in any measurement performed.

Great results!
And I don't know why this code has lingered for so long, so thanks for helping get some attention to this again. Personally I'd be really interested in the results for the same set of tests, but without the blk-mq patches. Do you have them, or could you potentially run them?

Hello,

We reran the tests for sequential reads with identical settings, but with Bob Liu's multiqueue patches reverted from the dom0 and guest kernels. The results we obtained were *better* than the results we got with the multiqueue patches applied:

fio_threads  io_depth  block_size  1-queue_iops  8-queue_iops  *no-mq-patches_iops*
8            32        512         158K          264K          321K
8            32        1K          157K          260K          328K
8            32        2K          157K          258K          336K
8            32        4K          148K          257K          308K
8            32        8K          124K          207K          188K
8            32        16K         84K           105K          82K
8            32        32K         50K           54K           36K
8            32        64K         24K           27K           16K
8            32        128K        11K           13K           11K

We noticed that requests are not merged by the guest when the multiqueue patches are applied, which results in a regression for small block sizes (the RealSSD P320h's optimal block size is around 32-64KB). We observed a similar regression for the Dell MZ-5EA1000-0D3 100 GB 2.5" internal SSD.

As I understand it, the blk-mq layer bypasses the I/O scheduler, which also effectively disables merges. Could you explain why it is difficult to enable merging in the blk-mq layer? That could help close the performance gap we observed. Otherwise, the tests show that the multiqueue patches do not improve performance, at least when it comes to sequential read/write operations.

blk-mq still provides merging; there should be no difference there. Do the xen patches set BLK_MQ_F_SHOULD_MERGE?

Yes. Is it possible that the xen-blkfront driver dequeues requests too fast once we have multiple hardware queues? Then new requests don't get the chance to merge with old requests which were already dequeued and issued.

For some reason we don't see merges even when we set multiqueue to 1.
Below are some stats from the guest system when doing sequential 4KB reads:

$ fio --name=test --ioengine=libaio --direct=1 --rw=read --numjobs=8 \
      --iodepth=32 --time_based=1 --runtime=300 --bs=4KB --filename=/dev/xvdb

$ iostat -xt 5 /dev/xvdb
avg-cpu: %user %nice %system %iowait
Re: [Xen-devel] Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3
On Mon, 2015-08-03 at 11:47 +0100, Ian Campbell wrote: After the initial expected logging the file is simply full of:

2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1004:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3] from domid=0 nentries=1 rc=1
2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1015:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3] got=domaininfos[0] got->domain=0
2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1015:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3] got=domaininfos[1] got->domain=-1
2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1023:domain_death_xswatch_callback: got==gotend

Repeated at around 51KHz.

This sounds a lot like 4783c99aab8 (see below for the full log message), which perhaps ought to be backported to the affected branches, i.e. 4.2 and 4.3. Looks like it was backported to 4.5 (as 0b19348f3cd1) and 4.4 (as 13623d5d8e85) already. Ian?

Ian.

commit 4783c99aab866f470bd59368cfbf5ad5f677b0ec
Author: Ian Jackson ian.jack...@eu.citrix.com
Date: Tue Mar 17 09:30:57 2015 -0600

    libxl: In domain death search, start search at first domid we want

    From: Ian Jackson ian.jack...@eu.citrix.com

    When domain_death_xswatch_callback needed a further call to xc_domain_getinfolist it would restart it with the last domain it found rather than the first one it wants. If it only wants one it will also only ask for one domain. The result would then be that it gets the previous domain again (ie, the previous one to the one it wants), which still doesn't reveal the answer to the question, and it would therefore loop again.

    It's completely unclear to me why I thought it was a good idea to start the xc_domain_getinfolist with the last domain previously found rather than the first one left unconfirmed. The code has been that way since it was introduced.

    Instead, start each xc_domain_getinfolist at the next domain whose status we need to check. We also need to move the test for !evg into the loop, as we now need evg to compute the arguments to getinfolist.
    Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com
    Reported-by: Jim Fehlig jfeh...@suse.com
    Reviewed-by: Jim Fehlig jfeh...@suse.com
    Tested-by: Jim Fehlig jfeh...@suse.com
    Acked-by: Wei Liu wei.l...@citrix.com
    Acked-by: Ian Campbell ian.campb...@citrix.com
[Xen-devel] [URGENT RFC] Branching and reopening -unstable
Hi all

RC1 is going to be tagged this week (maybe today). We need to figure out when to branch / reopen -unstable for committing, and what rules should be applied until 4.6 is out of the door.

Ian, Ian and I had a conversation IRL. We discussed several things, but figured it is necessary to have more people involved before making any decision. Here is my recollection of the conversation.

Branching should be done at one of the RC tags. There might not be enough time for us to reach consensus before tagging RC1, so I would say let's branch at RC2 if we don't observe blocker bugs.

Maintainers should be responsible for both the 4.6 branch and the -unstable branch.

As for bug fixes, here are two options.

Option 1: bug fixes go into -unstable; backport / cherry-pick bug fixes back to 4.6. This seems to leave the tree in a half-frozen state, because we need to reject refactoring patches in case they cause backporting failures.

Option 2: bug fixes go into 4.6 and are merged into -unstable. If a merge has conflicts and the maintainers can't deal with them, the authors of those changes in -unstable which cause the conflict are responsible for fixing it up.

Ian and Ian, anything I missed? Anything to add? Others, thoughts?

Wei.
[Xen-devel] [RFC PATCH 0/7] domain snapshot implementation
Add vm snapshot implementation, supporting snapshot-create and snapshot-revert.

Current Limitations:

About disk snapshot create, there are many cases:
- qdisk, internal: should call a qmp command to do the work.
- qdisk, external: should call a qmp command to do the work; qemu will replace the disk backend file after creating the external snapshot.
- non-qdisk, internal: should call 'qemu-img snapshot' to do the work.
- non-qdisk, external: should call 'qemu-img create' to create a new file with the original disk file as backing file. And libxl should switch the domain disk from the original disk to the new file.

In the last case, during a domain snapshot, between domain suspend and resume, how do we replace the disk backend file from libxl? Especially if the disk file format changes (original disk backend file is 'raw', new file is 'qcow2')? Considering this, I currently exclude the non-qdisk cases and let the API support qdisk only. About the non-qdisk external case, any suggestion?

About disk snapshot revert: reverting from an external disk snapshot is actually starting the domain from a specified backing file; since the backing file should be kept read-only, that will involve a block copy operation. Currently this case is not supported; only reverting from an internal disk snapshot is supported.

Design document: the latest design document has just been posted.
Chunyan Liu (7):
  libxl_types.idl: add definitions for vm snapshot
  qmp: add qmp handlers to create disk snapshots
  libxl: save disk format to xenstore
  libxl: add snapshot APIs
  xl: add domain snapshot commands
  qmp: add qmp handlers to delete internal/external disk snapshot
  libxl: add APIs to delete internal/external disk snapshots

 Config.mk                            |   2 +-
 config/Paths.mk.in                   |   1 +
 configure                            |   3 +
 docs/man/xl.snapshot.conf.pod.5      |  59 +++
 m4/paths.m4                          |   3 +
 tools/configure                      |   3 +
 tools/examples/snapshot.cfg.external |   4 +
 tools/examples/snapshot.cfg.internal |   4 +
 tools/libxl/Makefile                 |   2 +
 tools/libxl/libxl.c                  |  10 +-
 tools/libxl/libxl.h                  |  51 +++
 tools/libxl/libxl_internal.h         |  38 ++
 tools/libxl/libxl_qmp.c              | 224 ++++
 tools/libxl/libxl_snapshot.c         | 321 +
 tools/libxl/libxl_types.idl          |  31 ++
 tools/libxl/libxl_types_internal.idl |   8 +
 tools/libxl/libxl_utils.c            |  16 +
 tools/libxl/libxl_utils.h            |   1 +
 tools/libxl/xl.h                     |   2 +
 tools/libxl/xl_cmdimpl.c             | 677 +++
 tools/libxl/xl_cmdtable.c            |  16 +
 21 files changed, 1474 insertions(+), 2 deletions(-)
 create mode 100644 docs/man/xl.snapshot.conf.pod.5
 create mode 100644 tools/examples/snapshot.cfg.external
 create mode 100644 tools/examples/snapshot.cfg.internal
 create mode 100644 tools/libxl/libxl_snapshot.c

--
2.1.4
[Xen-devel] [RFC Doc V11 0/4] domain snapshot document
Changes to V10:
- several updates to the xl design and libxl design to address comments on V10.
- a few updates to keep consistent with the code implementation.

V10: http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg03071.html

The code implementation is posted right after.
[Xen-devel] [RFC PATCH 5/7] xl: add domain snapshot commands
Add domain snapshot create/revert commands implementation.

Since xl is expected not to maintain domain snapshot information itself, it has no idea how many snapshots there are or which files and metadata relate to them, so xl won't supply a snapshot delete command. It depends on users to delete things.

Signed-off-by: Chunyan Liu cy...@suse.com
---
 Config.mk                            |   2 +-
 config/Paths.mk.in                   |   1 +
 configure                            |   3 +
 docs/man/xl.snapshot.conf.pod.5      |  59 +++
 m4/paths.m4                          |   3 +
 tools/configure                      |   3 +
 tools/examples/snapshot.cfg.external |   4 +
 tools/examples/snapshot.cfg.internal |   4 +
 tools/libxl/Makefile                 |   1 +
 tools/libxl/xl.h                     |   2 +
 tools/libxl/xl_cmdimpl.c             | 677 +++
 tools/libxl/xl_cmdtable.c            |  16 +
 12 files changed, 774 insertions(+), 1 deletion(-)
 create mode 100644 docs/man/xl.snapshot.conf.pod.5
 create mode 100644 tools/examples/snapshot.cfg.external
 create mode 100644 tools/examples/snapshot.cfg.internal

diff --git a/Config.mk b/Config.mk
index e9a7097..aa4884f 100644
--- a/Config.mk
+++ b/Config.mk
@@ -159,7 +159,7 @@ endef
 BUILD_MAKE_VARS := sbindir bindir LIBEXEC LIBEXEC_BIN libdir SHAREDIR \
                    XENFIRMWAREDIR XEN_CONFIG_DIR XEN_SCRIPT_DIR XEN_LOCK_DIR \
-                   XEN_RUN_DIR XEN_PAGING_DIR XEN_DUMP_DIR
+                   XEN_RUN_DIR XEN_PAGING_DIR XEN_DUMP_DIR XEN_SNAPSHOT_DIR

 buildmakevars2file = $(eval $(call buildmakevars2file-closure,$(1)))
 define buildmakevars2file-closure

diff --git a/config/Paths.mk.in b/config/Paths.mk.in
index d36504f..8e7d2a8 100644
--- a/config/Paths.mk.in
+++ b/config/Paths.mk.in
@@ -49,6 +49,7 @@ BASH_COMPLETION_DIR := $(CONFIG_DIR)/bash_completion.d
 XEN_LOCK_DIR     := @XEN_LOCK_DIR@
 XEN_PAGING_DIR   := @XEN_PAGING_DIR@
 XEN_DUMP_DIR     := @XEN_DUMP_DIR@
+XEN_SNAPSHOT_DIR := @XEN_SNAPSHOT_DIR@

 XENFIRMWAREDIR   := @XENFIRMWAREDIR@

diff --git a/configure b/configure
index 80b27d6..e283d17 100755
--- a/configure
+++ b/configure
@@ -595,6 +595,7 @@ tools
 xen
 subdirs
 XEN_DUMP_DIR
+XEN_SNAPSHOT_DIR
 XEN_PAGING_DIR
 XEN_LOCK_DIR
 XEN_SCRIPT_DIR
@@ -1984,6 +1985,8 @@ XEN_PAGING_DIR=$localstatedir/lib/xen/xenpaging
 XEN_DUMP_DIR=$xen_dumpdir_path

+XEN_SNAPSHOT_DIR=$localstatedir/lib/xen/snapshot
+
 case $host_cpu in
 i[3456]86|x86_64)

diff --git a/docs/man/xl.snapshot.conf.pod.5 b/docs/man/xl.snapshot.conf.pod.5
new file mode 100644
index 000..28c2196
--- /dev/null
+++ b/docs/man/xl.snapshot.conf.pod.5
@@ -0,0 +1,59 @@
+=head1 NAME
+
+xl.snapshot.cfg - XL Domain Snapshot Configuration File Syntax
+
+=head1 DESCRIPTION
+
+Snapshot configuration file will be used in xl snapshot-create
+and xl snapshot-revert.
+
+Without a snapshot configuration file, xl snapshot-create could create
+a domain snapshot with default values. To create a user-defined domain
+snapshot, xl requires a domain snapshot config file.
+
+For snapshot-revert, it's mandatory; each item should be specified.
+
+Two examples for internal domain snapshot and external domain snapshot
+could be found in:
+/etc/xen/examples/snapshot.cfg.internal
+/etc/xen/examples/snapshot.cfg.external
+
+=head1 SYNTAX
+
+A domain snapshot config file consists of a series of C<KEY=VALUE> pairs. It
+shares the same rules with xl.cfg.
+
+=head1 OPTIONS
+
+=over 4
+
+=item B<name=NAME>
+
+Specifies the name of the domain snapshot. If omitted, it will be the
+epoch seconds since 1 Jan 1970. It will be used for taking the internal
+disk snapshot, generating the memory state file name and generating the
+external disk snapshot file name.
+
+=item B<memory=0|1>
+
+Indicates whether to save a memory state file. If not, it will take a
+disk-only snapshot. Currently xl doesn't support disk-only snapshots,
+so it can only be '1'.
+
+=item B<memory_path=PATHNAME>
+
+Location of the memory state file. This state file is the same as the file
+in xl save. The value is the full directory of the location of the memory
+state file. If omitted, it will be generated by default:
+<snapshot path>/<snapshot name>.save
+
+=item B<disks=[ DISK_SPEC_STRING, DISK_SPEC_STRING, ...]>
+
+Disk snapshot description.
+DISK_SPEC_STRING syntax is:
+'external path, external format, target device'
+If taking an internal disk snapshot, keep 'external path' and
+'external format' as '', e.g. [',,xvda',].
+
+=back
+

diff --git a/m4/paths.m4 b/m4/paths.m4
index 63e0f6b..abd89d2 100644
--- a/m4/paths.m4
+++ b/m4/paths.m4
@@ -122,4 +122,7 @@ AC_SUBST(XEN_PAGING_DIR)
 XEN_DUMP_DIR=$xen_dumpdir_path
 AC_SUBST(XEN_DUMP_DIR)
+
+XEN_SNAPSHOT_DIR=$localstatedir/lib/xen/snapshot
+AC_SUBST(XEN_SNAPSHOT_DIR)
 ])

diff --git a/tools/configure b/tools/configure
index 1098f1f..cd604bb 100755
--- a/tools/configure
+++ b/tools/configure
@@ -716,6 +716,7 @@ monitors
 githttp
 rpath
 XEN_DUMP_DIR
+XEN_SNAPSHOT_DIR
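Going by the option descriptions in the pod file above, an external-snapshot config might look like the following (illustrative only; the shipped examples are snapshot.cfg.internal and snapshot.cfg.external under /etc/xen/examples, and all names and paths here are made up):

```
# snapshot.cfg -- external domain snapshot (illustrative, per the
# xl.snapshot.conf syntax described above)
name = "snap-2015-08-11"

# save the memory state file as well (disk-only is not supported by xl yet)
memory = 1
memory_path = "/var/lib/xen/snapshot/snap-2015-08-11.save"

# DISK_SPEC_STRING: 'external path, external format, target device'
disks = [ '/var/lib/xen/snapshot/snap-2015-08-11.qcow2,qcow2,xvda' ]
```

For an internal snapshot, the first two fields of the DISK_SPEC_STRING would be left empty, e.g. disks = [ ',,xvda' ].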
[Xen-devel] [RFC PATCH 1/7] add definitions for vm snapshot
Define libxl_disk_snapshot_type and libxl_disk_snapshot for VM snapshot usage.

Signed-off-by: Chunyan Liu cy...@suse.com
---
 tools/libxl/libxl_types.idl          | 31 +++
 tools/libxl/libxl_types_internal.idl |  8 
 2 files changed, 39 insertions(+)

diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index ef346e7..f7a4c3e 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -793,3 +793,34 @@ libxl_psr_cat_info = Struct("psr_cat_info", [
     ("cos_max", uint32),
     ("cbm_len", uint32),
     ])
+
+libxl_disk_snapshot_type = Enumeration("disk_snapshot_type", [
+    (0, "INVALID"),
+    (1, "INTERNAL"),
+    (2, "EXTERNAL"),
+    ])
+
+libxl_disk_snapshot = Struct("disk_snapshot", [
+    # target disk
+    ("disk", libxl_device_disk),
+
+    # disk snapshot name
+    ("name", string),
+
+    ("u", KeyedUnion(None, libxl_disk_snapshot_type, "type",
+          [("external", Struct(None, [
+               # disk format for external files. Since external disk snapshot is
+               # implemented with the backing file mechanism, the external file
+               # disk format must support backing files. This field can be NULL;
+               # then a proper disk format will be used by default according to
+               # the original disk format.
+               ("external_format", libxl_disk_format),
+
+               # external file path. This field should be non-NULL and a new path.
+               ("external_path", string),
+           ])),
+           ("internal", None),
+           ("invalid", None),
+          ])),
+    ])

diff --git a/tools/libxl/libxl_types_internal.idl b/tools/libxl/libxl_types_internal.idl
index 5e55685..60dce1d 100644
--- a/tools/libxl/libxl_types_internal.idl
+++ b/tools/libxl/libxl_types_internal.idl
@@ -45,3 +45,11 @@ libxl__device_action = Enumeration("device_action", [
     (1, "ADD"),
     (2, "REMOVE"),
     ])
+
+libxl_disk_snapshot_op = Enumeration("disk_snapshot_op", [
+    (1, "CREATE"),
+    (2, "DELETE"),
+    (3, "REVERT"),
+    (4, "LIST"),
+    ])
+
--
2.1.4
[Xen-devel] [RFC Doc V11 2/5] domain snapshot introduction
1. Introduction

There are several types of snapshots:

disk snapshot
    Contents of disks are saved at a given point in time, and can be restored back to that state. On a running guest, a disk snapshot is likely to be only crash-consistent rather than clean (that is, it represents the state of the disk on a sudden power outage, and may need fsck or journal replays to be made consistent). On a paused guest, with a mechanism for quiescing disks (that is, all cached data written to disk), a disk snapshot is clean. On an inactive guest, a disk snapshot is clean if the disks were clean when the guest was last shut down.

    Disk snapshots exist in two forms: internal (file formats such as qcow2 track both the snapshot and the changes since the snapshot in a single file) and external (the snapshot is one file, and the changes since the snapshot are in another file).

memory state (or VM state)
    Tracks only the state of RAM and all other resources in use by the VM. If the disks are unmodified between the time a VM state snapshot is taken and restored, then the guest will resume in a consistent state; but if the disks are modified externally in the meantime, this is likely to lead to data corruption.

system checkpoint (domain snapshot)
    A combination of disk snapshots for all disks as well as VM memory state, which can be used to resume the guest from where it left off with symptoms similar to hibernation (that is, TCP connections in the guest may have timed out, but no files or processes are lost). A system checkpoint can contain disk snapshots + VM state; or it can contain disk snapshots only, without VM state, in which case it should quiesce all disks before taking the disk snapshots. The latter case is also referred to as a 'disk-only domain snapshot'.

VM state (memory) snapshots are created by 'domain save' and restored via 'domain restore'. Disk snapshots can be created by many external tools, like qemu-img, vhd-util and lvm, etc.
Domain snapshots (including disk-only domain snapshots) will be handled by the 'domain snapshot' functionality. A domain snapshot with memory state (as VM state) includes live and non-live modes, differing in VM downtime. Live mode will try its best to reduce downtime of the guest, but as a result will increase the size of the memory dump file.

2. Domain Snapshot Use Cases

Domain snapshots can be used in the following cases:

* A domain snapshot can be used as a domain backup. It can preserve the VM status at a certain point and roll back to it.

* Domain snapshots can support 'gold image' type deployments, i.e. where you create one baseline single disk image and then clone it multiple times to deploy lots of guests. When you create a domain snapshot as a gold domain snapshot (duplicated multiple times), one can restore from the gold domain snapshot multiple times for different reasons.

* A disk-only domain snapshot can be used for backup out of the domain, i.e. taking a disk-only domain snapshot and then running your usual backup software on the disk snapshots (which are now unchanging, which is handy); one can back up that static version of the disk out of band from the domain itself (e.g. it can be attached to a separate backup VM).

3. Domain Snapshot Operations

Generally, domain snapshot includes 4 kinds of operations:

* create a domain snapshot

  Create a domain snapshot under different conditions:
  - domain is live, save vm state (live), disk snapshot
  - domain is live, save vm state (non-live), disk snapshot
  - domain is live, disk-only snapshot (needs quiescing disks)
  - domain is offline, disk-only snapshot

  (Under each condition above, the disk snapshot can be internal/external.)
* revert (roll back to) a domain snapshot

  Revert a domain snapshot under different conditions:
  - domain is live, has vm state, all internal disk snapshots
  - domain is live, has vm state, has external disk snapshots
  - domain is live, no vm state, all internal disk snapshots
  - domain is live, no vm state, has external disk snapshots
  - domain is offline, has vm state, all internal disk snapshots
  - domain is offline, has vm state, has external disk snapshots
  - domain is offline, no vm state, all internal disk snapshots
  - domain is offline, no vm state, has external disk snapshots

* delete a domain snapshot

  Delete a domain snapshot under the following conditions:
  - domain is live, not in a snapshot chain
  - domain is live, in a snapshot chain
  - domain is offline, not in a snapshot chain
  - domain is offline, in a snapshot chain

* list domain snapshot(s)

  Listing domain snapshot(s) covers:
  - list a single domain snapshot
  - list all domain snapshots
  - list snapshot(s) in detail

4. Disk Snapshot Operations

Also 4 kinds:

* Create disk snapshot
* Delete disk snapshot
* Revert (apply) disk snapshot
* List disk
[Xen-devel] [RFC PATCH 4/7] libxl: add snapshot APIs
Add snapshot related APIs for xl, including: create disk snapshots, revert disk snapshots. Together with the existing memory save/restore APIs, xl can create a domain snapshot and revert from a domain snapshot.

Limitations:

About disk snapshot create, there are many cases:
- qdisk, internal: should call a qmp command to do the work.
- qdisk, external: should call a qmp command to do the work; qemu will replace the disk backend file after creating the external snapshot.
- non-qdisk, internal: should call 'qemu-img snapshot' to do the work.
- non-qdisk, external: should call 'qemu-img create' to create a new file with the original disk file as backing file. And libxl should switch the domain disk from the original disk to the new file.

The problem is: in the last case, during a domain snapshot, between domain suspend and resume, how do we replace the disk backend file from libxl? Especially if the disk file format changes (original disk backend file is 'raw', new file is 'qcow2')? Considering this, currently the API only supports qdisk; non-qdisk cases are not included.

About disk snapshot revert: reverting from an external disk snapshot is actually starting the domain from a specified backing file; since the backing file should be kept read-only, that will involve a block copy operation. Currently this case is not supported; only reverting from an internal disk snapshot is supported.
Signed-off-by: Chunyan Liu cy...@suse.com
---
 tools/libxl/Makefile         |   1 +
 tools/libxl/libxl.h          |   6 ++
 tools/libxl/libxl_internal.h |   6 ++
 tools/libxl/libxl_snapshot.c | 219 +++
 4 files changed, 232 insertions(+)
 create mode 100644 tools/libxl/libxl_snapshot.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 9036076..0917326 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -105,6 +105,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_qmp.o libxl_event.o libxl_fork.o \
 			libxl_dom_suspend.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
+LIBXL_OBJS += libxl_snapshot.o
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o

 LIBXL_TESTS += timedereg

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 5f9047c..d60f139 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1729,6 +1729,12 @@ int libxl_psr_cat_get_l3_info(libxl_ctx *ctx, libxl_psr_cat_info **info,
 void libxl_psr_cat_info_list_free(libxl_psr_cat_info *list, int nr);
 #endif

+/* Domain snapshot related APIs */
+int libxl_disk_snapshot_create(libxl_ctx *ctx, uint32_t domid,
+                               libxl_disk_snapshot *snapshot, int nb);
+int libxl_disk_snapshot_revert(libxl_ctx *ctx, uint32_t domid,
+                               libxl_disk_snapshot *snapshot, int nb);
+
 /* misc */

 /* Each of these sets or clears the flag according to whether the

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c3dec85..f24e0af 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1749,6 +1749,12 @@ _hidden void libxl__qmp_cleanup(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
                                        const libxl_domain_config *guest_config);

+typedef struct libxl__ao_snapshot libxl__ao_snapshot;
+struct libxl__ao_snapshot {
+    libxl__ao *ao;
+    libxl__ev_child child;
+};
+
 /* on failure, logs */
 int libxl__sendmsg_fds(libxl__gc *gc, int carrier,
                        const void *data, size_t datalen,

diff --git a/tools/libxl/libxl_snapshot.c b/tools/libxl/libxl_snapshot.c
new file mode 100644
index 000..34d36ef
--- /dev/null
+++ b/tools/libxl/libxl_snapshot.c
@@ -0,0 +1,219 @@
+/*
+ * libxl_snapshot.c: code domain snapshot related APIs
+ *
+ * Copyright (C) 2015 SUSE LINUX Products GmbH, Nuernberg, Germany.
+ * Author Chunyan Liu cy...@suse.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/* Replace the domain disk with the external path after taking an external
+ * disk snapshot, since the original disk becomes a backing file. It will
+ * need to update xenstore information as well as the domain config.
+ */
+static int libxl__update_disk_configuration(libxl__gc *gc, uint32_t domid,
+                                            libxl_disk_snapshot snapshot)
+{
+    char *backend_path, *path, *value;
[Xen-devel] [RFC PATCH 7/7] libxl: add APIs to delete internal/external disk snapshots
Currently this group of APIs is not used by the xl toolstack, since xl doesn't maintain domain snapshot info and so depends on the user to delete things. But for libvirt they are very useful, since libvirt maintains domain snapshot info itself and needs these APIs to delete internal/external disk snapshots.

Signed-off-by: Chunyan Liu cy...@suse.com
---
 tools/libxl/libxl.h          |  28 
 tools/libxl/libxl_snapshot.c | 102 +++
 2 files changed, 130 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 412a42f..1383b92 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1752,6 +1752,34 @@ int libxl_disk_snapshot_create(libxl_ctx *ctx, uint32_t domid,
 int libxl_disk_snapshot_revert(libxl_ctx *ctx, uint32_t domid,
                                libxl_disk_snapshot *snapshot, int nb);

+/* delete internal disk snapshot */
+int libxl_disk_snapshot_delete(libxl_ctx *ctx, uint32_t domid,
+                               libxl_disk_snapshot *snapshot, int nb);
+
+/* next 4 functions deal with external disk snapshots */
+
+/* shorten backing file chain. Merge from top to base */
+int libxl_domain_block_rebase(libxl_ctx *ctx, uint32_t domid,
+                              libxl_device_disk *disk,
+                              const char *base,
+                              const char *backing_file,
+                              unsigned long long bandwidth);
+
+/* shorten backing file chain. Merge from base to top */
+int libxl_domain_block_commit(libxl_ctx *ctx, uint32_t domid,
+                              libxl_device_disk *disk,
+                              const char *top,
+                              const char *base,
+                              const char *backing_file,
+                              unsigned long long bandwidth);
+
+/* query a block job status; can get job type, speed, progress status */
+int libxl_domain_block_job_query(libxl_ctx *ctx, uint32_t domid,
+                                 libxl_device_disk *disk,
+                                 libxl_block_job_info *info);
+
+/* abort a block job. If the job is finished, complete it;
+ * otherwise, cancel it.
+ */
+int libxl_domain_block_job_abort(libxl_ctx *ctx, uint32_t domid,
+                                 libxl_device_disk *disk,
+                                 bool force);
+
 /* misc */

 /* Each of these sets or clears the flag according to whether the

diff --git a/tools/libxl/libxl_snapshot.c b/tools/libxl/libxl_snapshot.c
index 34d36ef..9b139e6 100644
--- a/tools/libxl/libxl_snapshot.c
+++ b/tools/libxl/libxl_snapshot.c
@@ -217,3 +217,105 @@ int libxl_disk_snapshot_revert(libxl_ctx *ctx, uint32_t domid,
     }
     return rc;
 }
+
+int libxl_disk_snapshot_delete(libxl_ctx *ctx, uint32_t domid,
+                               libxl_disk_snapshot *snapshot, int nb)
+{
+    int rc = 0;
+    int i;
+
+    GC_INIT(ctx);
+    for (i = 0; i < nb; i++) {
+        if (snapshot[i].type == LIBXL_DISK_SNAPSHOT_TYPE_EXTERNAL) {
+            LOG(WARN, "libxl_disk_snapshot_delete: external disk snapshot "
+                "cannot be deleted. Please use libxl_domain_block_commit and "
+                "libxl_domain_block_rebase to handle that.");
+            continue;
+        }
+
+        rc = libxl__qmp_disk_snapshot_delete(gc, domid, snapshot[i]);
+        if (rc)
+            goto err;
+    }
+err:
+    if (rc)
+        LOG(ERROR, "domain disk snapshot delete fail\n");
+
+    GC_FREE;
+    return rc;
+}
+
+int libxl_domain_block_rebase(libxl_ctx *ctx, uint32_t domid,
+                              libxl_device_disk *disk,
+                              const char *base,
+                              const char *backing_file,
+                              unsigned long long bandwidth)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = libxl__qmp_block_stream(gc, domid, disk, base,
+                                 backing_file, bandwidth, NULL);
+    GC_FREE;
+    return rc;
+}
+
+int libxl_domain_block_commit(libxl_ctx *ctx, uint32_t domid,
+                              libxl_device_disk *disk,
+                              const char *top,
+                              const char *base,
+                              const char *backing_file,
+                              unsigned long long bandwidth)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = libxl__qmp_block_commit(gc, domid, disk, top, base,
+                                 backing_file, bandwidth);
+
+    GC_FREE;
+    return rc;
+}
+
+/* query block job status */
+int libxl_domain_block_job_query(libxl_ctx *ctx, uint32_t domid,
+                                 libxl_device_disk *disk,
+                                 libxl_block_job_info *info)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = libxl__qmp_query_block_job(gc, domid, disk, info);
+
+    GC_FREE;
+    return rc;
+}
+
+/* Abort block job:
+ * If the block job is already finished, call the block_job_complete qmp;
+ * otherwise, call
[Xen-devel] [RFC Doc V11 4/5] domain snapshot libxl design
libxl Design

1. New Structures

libxl_disk_snapshot_type = Enumeration("disk_snapshot_type", [
    (0, "invalid"),
    (1, "internal"),
    (2, "external"),
    ])

libxl_disk_snapshot = Struct("disk_snapshot", [
    # target disk
    ("disk", libxl_device_disk),
    # disk snapshot name
    ("name", string),
    ("u", KeyedUnion(None, libxl_disk_snapshot_type, "type",
        [("external", Struct(None, [
            # disk format for external files. Since external disk snapshot is
            # implemented with the backing file mechanism, the external file
            # disk format must support backing files. This field can be NULL;
            # then a proper disk format will be used by default according to
            # the original disk format.
            ("external_format", libxl_disk_format),
            # external file path. This field should be non-NULL and a new path.
            ("external_path", string),
            ])),
         ("internal", None),
         ("invalid", None),
        ])),
    ])

2. New Functions

Since there are already APIs for saving memory (libxl_domain_suspend) and
restoring a domain from saved memory (libxl_domain_create_restore), for xl
domain snapshot tasks the missing part is disk snapshot functionality. The
disk snapshot functionality would be used by libvirt too.

## disk snapshot create

/**
 * libxl_disk_snapshot_create:
 * @ctx: libxl context
 * @domid: domain id
 * @snapshot: array of disk snapshot configuration. Has nb members.
 *   - libxl_device_disk:
 *       structure to represent which disk.
 *   - name:
 *       snapshot name.
 *   - type:
 *       disk snapshot type: internal or external.
 *   - u.external.external_format:
 *       Format of external file.
 *       After disk snapshot, the original file will become a backing
 *       file, while the external file will keep the delta, so
 *       external_format should support backing files, like: cow,
 *       qcow, qcow2, etc.
 *       If it is NULL, then a proper format will be used by default
 *       according to the original disk format.
 *   - u.external.external_path:
 *       path to external file. non-NULL.
 * @nb: number of disks that need to take disk snapshot.
 *
 * creating internal/external disk snapshot
 *
 * Takes disk snapshots of a group of domain disks according to
 * configuration. Supports both internal disk snapshots and external
 * disk snapshots. For the qdisk backend type, it will call the qmp
 * transaction command to do the work. For other disk backend types,
 * it might call other external commands.
 *
 * Returns 0 on success, <0 on failure.
 */
int libxl_disk_snapshot_create(libxl_ctx *ctx, uint32_t domid,
                               libxl_disk_snapshot *snapshot, int nb);

## disk snapshot revert

/**
 * libxl_disk_snapshot_revert:
 * @snapshot: array of disk snapshot configuration. Has nb members.
 * @nb: number of disks.
 *
 * Reverts disks to the specified snapshot according to configuration.
 * For different disk backend types, different external commands are
 * called to do the work.
 *
 * Returns 0 on success, <0 on failure.
 */
int libxl_disk_snapshot_revert(libxl_disk_snapshot *snapshot, int nb);

For disk snapshot revert: since domain snapshot revert is essentially
"destroy, revert disks, and restore from RAM", there is no qemu process to
speak to while reverting disks. So it always calls external commands to
finish the work.

## disk snapshot delete

Since xl won't supply domain snapshot delete functionality, this group of
functions won't be used by xl, but will be used by libvirt.

/**
 * libxl_disk_snapshot_delete:
 * @ctx: libxl context
 * @domid: domain id
 * @snapshot: array of disk snapshot configuration. Has nb members.
 * @nb: number of disks.
 *
 * Deletes disk snapshots of a group of domain disks according to
 * configuration. Can only handle internal disk snapshots. Currently
 * only valid for 'qcow2' disks, by calling a qmp command if it is the
 * qdisk backend or by calling qemu-img for other backend types.
 *
 * Deleting external disk snapshots means shortening the backing file
 * chain and merging snapshot data, which requires knowing the snapshot
 * chain. The functions libxl_domain_block_rebase and
 * libxl_domain_block_commit would help.
 *
 * Returns 0 on success, <0 on failure.
 */
int libxl_disk_snapshot_delete(libxl_ctx *ctx, uint32_t domid,
                               libxl_disk_snapshot *snapshot, int nb);

The following functions would help to delete external disk snapshots. They
are actually two directions of shortening the backing file chain: one
merges from base to top, the other from top to base. Both need the caller
to know the backing file chain information.

/**
 * libxl_domain_block_rebase:
 * @ctx: libxl context
 * @domid: domain id
 * @disk: path to the block device
 * @base: path to backing file to keep, or NULL for no backing file
 * @bandwidth: (optional) bandwidth limit in B/s, 0 for no limit.
 *
 * Merge data from base to top.
 *
 * Populate a disk image with data from its backing
[Xen-devel] [RFC PATCH 6/7] qmp: add qmp handlers to delete internal/external disk snapshot
Xl doesn't maintain domain snapshot info and has no idea of snapshot info and related files after creation, so it doesn't supply a domain snapshot delete command. These qmp handlers won't be used by xl. But libvirt maintains domain snapshot info itself and supplies snapshot delete commands, and so needs APIs from libxl to delete internal/external disk snapshots. To libvirt, these qmp handlers are useful. Signed-off-by: Chunyan Liu cy...@suse.com --- tools/libxl/libxl.h | 17 + tools/libxl/libxl_internal.h | 28 tools/libxl/libxl_qmp.c | 158 +++ 3 files changed, 203 insertions(+) diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index d60f139..412a42f 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -1730,6 +1730,23 @@ void libxl_psr_cat_info_list_free(libxl_psr_cat_info *list, int nr); #endif /* Domain snapshot related APIs */ + +/* structure to retrieve qmp block job status */ +typedef struct libxl_block_job_info +{ + char *disk_vdev; + const char *type; + unsigned long speed; + /* The following fields provide an indication of block job progress. + * @current indicates the current position and will be between 0 and @end. + * @end is the final cursor position for this operation and represents + * completion. + * To approximate progress, divide @current by @end.
+ */ + unsigned long long current; + unsigned long long end; +} libxl_block_job_info; + int libxl_disk_snapshot_create(libxl_ctx *ctx, uint32_t domid, libxl_disk_snapshot *snapshot, int nb); int libxl_disk_snapshot_revert(libxl_ctx *ctx, uint32_t domid, diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index f24e0af..a6456e8 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -1739,6 +1739,34 @@ _hidden int libxl__qmp_cpu_add(libxl__gc *gc, int domid, int index); _hidden int libxl__qmp_disk_snapshot_transaction(libxl__gc *gc, int domid, libxl_disk_snapshot *snapshot, int nb); +/* Delete a disk snapshot */ +_hidden int libxl__qmp_disk_snapshot_delete(libxl__gc *gc, int domid, + libxl_disk_snapshot *snapshot); +/* shorten backing file chain. Merge base to top */ +_hidden int libxl__qmp_block_commit(libxl__gc *gc, uint32_t domid, +libxl_device_disk *disk, +const char *base, const char *top, +const char *backing_file, +unsigned long bandwidth); +/* shorten backing file chain. 
Merge top to base */ +_hidden int libxl__qmp_block_stream(libxl__gc *gc, uint32_t domid, +libxl_device_disk *disk, +const char *base, +const char *backing_file, +unsigned long long bandwidth, +const char *error); +/* query qmp block job status */ +_hidden int libxl__qmp_query_block_job(libxl__gc *gc, uint32_t domid, + libxl_device_disk *disk, + libxl_block_job_info *info); +/* cancel a qmp block job */ +_hidden int libxl__qmp_block_job_cancel(libxl__gc *gc, uint32_t domid, +libxl_device_disk *disk, +bool force); + +/* complete a qmp block job */ +_hidden int libxl__qmp_block_job_complete(libxl__gc *gc, uint32_t domid, + libxl_device_disk *disk); /* close and free the QMP handler */ _hidden void libxl__qmp_close(libxl__qmp_handler *qmp); /* remove the socket file, if the file has already been removed, diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index 2216511..09cb628 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -1034,6 +1034,164 @@ int libxl__qmp_disk_snapshot_transaction(libxl__gc *gc, int domid, return qmp_run_command(gc, domid, "transaction", args, NULL, NULL); } +int libxl__qmp_disk_snapshot_delete(libxl__gc *gc, int domid, + libxl_disk_snapshot *snapshot) +{ +libxl__json_object *args = NULL; + +if (snapshot->type == LIBXL_DISK_SNAPSHOT_TYPE_EXTERNAL) { +LOG(ERROR, "QMP doesn't support deleting external disk snapshots"); +return -1; +} + +if (snapshot->type != LIBXL_DISK_SNAPSHOT_TYPE_INTERNAL) { +LOG(ERROR, "Invalid disk snapshot type"); +return -1; +} + +qmp_parameters_add_string(gc, &args, "device", snapshot->disk.vdev); +qmp_parameters_add_string(gc, &args, "name", snapshot->name); + +return qmp_run_command(gc, domid, "blockdev-snapshot-delete-internal-sync",
[Xen-devel] [RFC Doc V11 3/5] domain snapshot xl design
XL Design

1. User Interface

xl snapshot-create: Create a snapshot (disk and RAM) of a domain.

SYNOPSIS:
  snapshot-create [--live] [--internal|--external] [--path=path] Domain [ConfigFile]

OPTIONS:
  -l, --live      take a live snapshot
  -i, --internal  take internal disk snapshots of all disks
  -e, --external  take external disk snapshots of all disks
  -p, --path      path to store snapshot data

If no options and no @ConfigFile are specified, e.g.:
  # xl snapshot-create domain
by default it will create a domain snapshot with a default name generated
from the creation time. This name will be used to generate the default RAM
snapshot name and disk snapshot names, and to generate the default
directory to store all the snapshot data (RAM snapshot file, external disk
snapshot files, etc.)

e.g. the result of the above command would be:
  default snapshot root directory: /var/lib/xen/snapshots/
  default snapshot name generated: 20150122xxx
  default subdirectory to save data of this snapshot:
    /var/lib/xen/snapshots/domain_uuid/20150122xxx/
  RAM snapshot file: by default, it will save memory. Location:
    /var/lib/xen/snapshots/domain_uuid/20150122xxx/20150122xxx.save
  disk snapshots: by default, for each domain disk, take an internal disk
  snapshot if that disk supports it; otherwise take an external disk
  snapshot.
    Internal disk snapshot: take disk snapshot with name 20150122xxx
    External disk snapshot: external files are:
      /var/lib/xen/snapshots/domain_uuid/20150122xxx/vda_20150122xxx.qcow2
      /var/lib/xen/snapshots/domain_uuid/20150122xxx/vdb_20150122xxx.qcow2

If the options include --live, then the domain is not paused while
creating the snapshot, like live migration. This increases the size of the
memory dump file, but reduces downtime of the guest.

If the options include --path, all snapshot data will be saved in this
@path.

If no @ConfigFile:name is specified, then the default name (generated by
time) is used. The user can specify snapshot information in detail through
@ConfigFile; see the following ConfigFile syntax.
If configuration in @ConfigFile conflicts with options, the options win.

xl snapshot-revert: Revert domain to the status of a snapshot.

SYNOPSIS:
  snapshot-revert [--pause] [--force] Domain ConfigFile

OPTIONS:
  -p, --pause  keep domain paused after the revert
  -f, --force  try harder on risky revert

About domain snapshot delete: xl doesn't have snapshot chain information,
so it couldn't do the full work. If we supplied:
  xl snapshot-delete domain cfgfile
then for internal disk snapshots, deleting a disk snapshot doesn't need
snapshot chain info, and this command could finish the work. But for
external disk snapshots, deleting a disk snapshot requires merging the
backing file chain, which needs the backing file chain information, so
this command cannot finish that. So deleting domain snapshots is left to
the user, who can delete RAM snapshots and disk snapshots themselves:
  RAM snapshot file: the user can remove it directly.
  Disk snapshots:
  - Internal disk snapshot: issue 'qemu-img snapshot -d'
  - External disk snapshot: basically it is implemented as a backing file
    chain. Use 'qemu-img commit' to remove one file from the chain and
    merge its data forward.

2. cfgfile syntax

# snapshot name. If the user doesn't provide a VM snapshot name, xl will
# generate a name automatically from the creation time or the @path
# basename.
name=

# save memory or disk-only.
# If memory is '0', memory is not saved; take a disk-only domain snapshot.
# If memory is '1', domain memory is saved.
# Default is 1.
memory=1

# memory location. This field is valid when memory=1.
# If it is set to "", xl will generate a path from the creation time or
# the @path basename.
memory_path=

# disk snapshot specification
#
# Syntax: 'external path, external format, target device'
#
# By default, if no disks are specified here, it will take a disk snapshot
# of all disks: an internal disk snapshot if the disk supports internal
# disk snapshots, and an external disk snapshot for other disks.
#disks=['/tmp/hda_snapshot.qcow2,qcow2,hda', ',,hdb',]

3. xl snapshot-xxx implementation

xl snapshot-create:
1) parse args or user configuration file.
2) if saving memory: save domain (store saved memory to memory_path)
   if taking a disk-only snapshot: pause domain, quiesce disks. (not
   supported now, maybe in future.)
3) create disk snapshots according to disk snapshot configuration
4) unpause domain

xl snapshot-revert:
1) parse user configuration file
2) destroy current domain
3) revert disk snapshots according to disk snapshot configuration
4) restore domain from saved memory.

4. Notes

* user should take care of snapshot data: saved memory file, disk
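Putting the section 2 syntax together, a complete cfgfile for a two-disk guest might look like the following (the name, paths, and device values here are illustrative examples, not defaults that xl defines):

```
# external qcow2 snapshot of hda; let xl pick the snapshot style for hdb
name="before-upgrade"
memory=1
memory_path="/var/lib/xen/snapshots/domain_uuid/before-upgrade/before-upgrade.save"
disks=['/var/lib/xen/snapshots/domain_uuid/before-upgrade/hda_before-upgrade.qcow2,qcow2,hda', ',,hdb']
```

The empty first two fields for hdb mean no external path or format is given, so by the defaulting rules above hdb gets an internal snapshot if its format supports one, and an external snapshot otherwise.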
[Xen-devel] [RFC PATCH 2/7] qmp: add qmp handlers to create disk snapshots
Add qmp handlers to take disk snapshots. This will be used when creating a domain snapshot. Signed-off-by: Chunyan Liu cy...@suse.com --- tools/libxl/libxl_internal.h | 4 +++ tools/libxl/libxl_qmp.c | 66 2 files changed, 70 insertions(+) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 6ea6c83..c3dec85 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -1735,6 +1735,10 @@ _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enabl _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk); /* Add a virtual CPU */ _hidden int libxl__qmp_cpu_add(libxl__gc *gc, int domid, int index); +/* Create disk snapshots for a group of disks in a transaction */ +_hidden int libxl__qmp_disk_snapshot_transaction(libxl__gc *gc, int domid, +libxl_disk_snapshot *snapshot, +int nb); /* close and free the QMP handler */ _hidden void libxl__qmp_close(libxl__qmp_handler *qmp); /* remove the socket file, if the file has already been removed, diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index 965c507..2216511 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -968,6 +968,72 @@ int libxl__qmp_cpu_add(libxl__gc *gc, int domid, int idx) return qmp_run_command(gc, domid, "cpu-add", args, NULL, NULL); } +/* + * requires QEMU version >= 1.1 + * qmp command example: + * -> { "execute": "transaction", + * "arguments": { "actions": [ + * { "type": "blockdev-snapshot-sync", "data": { "device": "ide-hd0", + * "snapshot-file": "/some/place/my-image", + * "format": "qcow2" } }, + * { "type": "blockdev-snapshot-internal-sync", "data": { + * "device": "ide-hd1", + * "name": "snapshot0" } }, + * { "type": "blockdev-snapshot-internal-sync", "data": { + * "device": "ide-hd2", + * "name": "snapshot0" } } ] } } + * <- { "return": {} } + */ +int libxl__qmp_disk_snapshot_transaction(libxl__gc *gc, int domid, + libxl_disk_snapshot *snapshot, + int nb) +{ +libxl__json_object *args = NULL;
+libxl__json_object *actions = NULL; +libxl__json_object **type = NULL; +libxl__json_object **data = NULL; +int i; + +type = (libxl__json_object**)calloc(nb, sizeof(libxl__json_object*)); +data = (libxl__json_object**)calloc(nb, sizeof(libxl__json_object*)); +actions = libxl__json_object_alloc(gc, JSON_ARRAY); + +for (i = 0; i < nb; i++) { +switch (snapshot[i].type) { +case LIBXL_DISK_SNAPSHOT_TYPE_INTERNAL: +/* internal disk snapshot */ +qmp_parameters_add_string(gc, &type[i], "type", + "blockdev-snapshot-internal-sync"); +qmp_parameters_add_string(gc, &data[i], "name", + snapshot[i].name); +qmp_parameters_add_string(gc, &data[i], "device", + snapshot[i].disk.vdev); +qmp_parameters_common_add(gc, &type[i], "data", data[i]); +flexarray_append(actions->u.array, (void*)type[i]); +break; +case LIBXL_DISK_SNAPSHOT_TYPE_EXTERNAL: +/* external disk snapshot */ +qmp_parameters_add_string(gc, &type[i], "type", + "blockdev-snapshot-sync"); +qmp_parameters_add_string(gc, &data[i], "device", + snapshot[i].disk.vdev); +qmp_parameters_add_string(gc, &data[i], "snapshot-file", + snapshot[i].u.external.external_path); +qmp_parameters_add_string(gc, &data[i], "format", + libxl_disk_format_to_string(snapshot[i].u.external.external_format)); +qmp_parameters_common_add(gc, &type[i], "data", data[i]); +flexarray_append(actions->u.array, (void*)type[i]); +break; +default: +LOG(ERROR, "Invalid disk snapshot type"); +return -1; +} +} + +qmp_parameters_common_add(gc, &args, "actions", actions); +return qmp_run_command(gc, domid, "transaction", args, NULL, NULL); +} + int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid, const libxl_domain_config *guest_config) { -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [RFC PATCH 3/7] libxl: save disk format to xenstore
Disk snapshot handling depends on the disk format. Currently, since the disk format is not saved to xenstore, disk->format is LIBXL_DISK_FORMAT_UNKNOWN when getting the device disk list. Disk snapshot cannot continue without correct disk format information, so add code to save the disk format to xenstore so that disk->format contains correct information when getting the device disk list. Signed-off-by: Chunyan Liu cy...@suse.com --- tools/libxl/libxl.c | 10 +- tools/libxl/libxl_utils.c | 16 tools/libxl/libxl_utils.h | 1 + 3 files changed, 26 insertions(+), 1 deletion(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 083f099..dce43d6 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -2524,6 +2524,8 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid, goto out; } +flexarray_append(back, "format"); +flexarray_append(back, libxl__device_disk_string_of_format(disk->format)); flexarray_append(back, "frontend-id"); flexarray_append(back, libxl__sprintf(gc, "%d", domid)); flexarray_append(back, "online"); @@ -2682,7 +2684,13 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc, } disk->is_cdrom = !strcmp(tmp, "cdrom"); -disk->format = LIBXL_DISK_FORMAT_UNKNOWN; +tmp = libxl__xs_read(gc, XBT_NULL, + libxl__sprintf(gc, "%s/format", be_path)); +if (!tmp) { +LOG(ERROR, "Missing xenstore node %s/format", be_path); +goto cleanup; +} +libxl_string_to_format(ctx, tmp, &(disk->format)); return 0; cleanup: diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c index bfc9699..067a9fc 100644 --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -322,6 +322,22 @@ out: return rc; } +int libxl_string_to_format(libxl_ctx *ctx, char *s, libxl_disk_format *format) +{ +int rc = 0; + +if (!strcmp(s, "aio")) { +*format = LIBXL_DISK_FORMAT_RAW; +} else if (!strcmp(s, "vhd")) { +*format = LIBXL_DISK_FORMAT_VHD; +} else if (!strcmp(s, "qcow")) { +*format = LIBXL_DISK_FORMAT_QCOW; +} else if (!strcmp(s, "qcow2")) { +*format = LIBXL_DISK_FORMAT_QCOW2; +} +return rc; +} + int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, void **data_r, int *datalen_r) { GC_INIT(ctx); diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h index 1e5ca8a..0897069 100644 --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h @@ -37,6 +37,7 @@ int libxl_get_stubdom_id(libxl_ctx *ctx, int guest_domid); int libxl_is_stubdom(libxl_ctx *ctx, uint32_t domid, uint32_t *target_domid); int libxl_create_logfile(libxl_ctx *ctx, const char *name, char **full_name); int libxl_string_to_backend(libxl_ctx *ctx, char *s, libxl_disk_backend *backend); +int libxl_string_to_format(libxl_ctx *ctx, char *s, libxl_disk_format *format); int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, void **data_r, int *datalen_r); -- 2.1.4
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
Wei Liu writes ("[URGENT RFC] Branching and reopening -unstable"):
 Branching should be done at one of the RC tags. There might not be
 enough time for us to reach consensus before tagging RC1, so I would
 say let's branch at RC2 if we don't observe blocker bugs.
 Maintainers should be responsible for both the 4.6 branch and the
 -unstable branch. As for bug fixes, here are two options.

I think this conflates the three questions which should be answered:

Q1: What is the status of the newly branched -unstable? Should we avoid
(some or all) big sets of changes?
 (a) Don't branch
 (b) Branch but don't allow /any/ big changes. Seems to make branching
     rather pointless.
 (c) Branch but allow /some/ big changes. Tree is `half open', which is
     not ideal.
 (d) Branch and allow /all/ changes.

Q2: If we don't avoid such changes, and a bugfix has a conflict with a
change in the new unstable, who is responsible for fixing it up?
Options include:
 (a) the relevant maintainers (double whammy for maintainers)
 (b) the submitter of the bugfix (very undesirable)
 (c) the submitter of the big set of changes (but what do we do if they
     don't respond?)
 (d) the stable tree maintainers (already ruled out, so included in this
     list for completeness; out of the question IMO)

Q3: What workflow should we use for bugfixes for bugs in 4.6-pre?
There are three options, not two:
 (a) Bugfixes go to 4.6 first, cherry pick to unstable.
     This keeps our focus on 4.6, which is good.
 (b) Bugfixes go to 4.6 first, merge 4.6 to unstable.
     Not tenable if we have big changes in unstable.
 (c) Bugfixes go to unstable, cherry pick to 4.6.
     Undesirable IMO because it shifts focus to unstable.

Of these 2(c)/3(a) would be ideal but we don't have a good answer to the
problem posed in Q2(c). I think that leaves us with 2(a): maintainers
have to deal with the fallout. That makes 1(d) untenable in my view. As a
maintainer, I do not want that additional workload. That leaves us with
1(a) or 1(c)/2(a)/3(a).
With 1(c), who should decide on a particular series? Well, who is taking
the risk? The maintainer, who will have to pick up the pieces.

I therefore conclude, we have two options:

A 1(a)/-/- Do not branch yet: defer divergence until the risk of
  bugfixes is much lower.

B 1(c)(maintainer)/2(a)/3(a) Branch. Maintainers may choose to defer
  patch series based on risk of conflicts with bugfixes required for
  4.6. Clear communication with submitters is required. Bugfixes for
  bugs in 4.6 will be accepted onto the 4.6 branch. Maintainers are
  required to cherry pick them onto unstable. Bugfixes will not be
  accepted for unstable unless it is clear that the bug was introduced
  in unstable since 4.6 branched.

I am happy with B because it gives the relevant maintainers the option.

Ian.
Re: [Xen-devel] [PATCH for-4.6] tools: Don't try to update the firmware directory on ARM
On Tue, Aug 11, 2015 at 01:22:24PM +0100, Ian Campbell wrote: On Sun, 2015-08-09 at 14:49 +0100, Julien Grall wrote: Hi Wei, On 08/08/2015 16:16, Wei Liu wrote: On Fri, Aug 07, 2015 at 06:27:18PM +0100, Julien Grall wrote: The firmware directory is not built at all on ARM. Attempting to update it using the target subtree-force-update will fail when trying to update seabios. Signed-off-by: Julien Grall julien.gr...@citrix.com --- Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com I've noticed it while trying to update the QEMU tree used by Xen on a platform where iasl is not present (required by seabios in order to update it). I think this should go in Xen 4.6 and possibly be backported to Xen 4.5 --- tools/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/Makefile b/tools/Makefile index 45cb4b2..2618559 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -305,7 +305,9 @@ endif ifeq ($(CONFIG_QEMU_TRAD),y) $(MAKE) qemu-xen-traditional-dir-force-update endif +ifeq ($(CONFIG_X86),y) $(MAKE) -C firmware subtree-force-update +endif This is not optimal. What if you want to build OVMF on arm in the future? Slight aside, but I already looked at doing this but concluded that the right answer was to add this to raisin not xen.git. As it happens on ARM we would boot the UEFI binary directly, so we don't need to compile it into hvmloader or jump through other hoops, so it is a bit easier than on x86. Right. Makes sense. You also can't rule out having other firmwares that need to be built on ARM in the future. I think a proper way of doing this is to make CONFIG_SEABIOS=n when you're building on ARM. See tools/configure.ac. tools/Makefile only builds the firmware directory for x86, see: SUBDIRS-$(CONFIG_X86) += firmware Hence why I wrote the patch in the current way.
I think having the update rule match (in spirit at least) the SUBDIRS rules makes sense as a patch for now, so I'm in favour of taking this patch as it is. Fine by me then. Acked-by: Wei Liu wei.l...@citrix.com Building the firmware directory for ARM would require more work than replacing SUBDIRS-$(CONFIG_X86) with SUBDIRS-y. In general, I do agree that we should enable this with configure.ac but, IMHO, this is not Xen 4.6 material... Although I would be happy to fix it for Xen 4.7. Regards,
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On 11.08.15 at 12:44, wei.l...@citrix.com wrote: As for bug fixes, here are two options. Option 1: bug fixes go into -unstable, backport / cherry-pick bug fixes back to 4.6. This seems to leave the tree in half frozen status because we need to reject refactoring patches in case they cause backporting failure. Option 2: bug fixes go into 4.6, merge them to -unstable. If a merge has a conflict and maintainers can't deal with that, the authors of those changes in -unstable which cause the conflict are responsible for fixing it up. I don't see why even on #2 bug fixes shouldn't go into -unstable first - as usual backports should carry a reference to the master commit. And personally I'd favor the revised #2 over #1 or unrevised #2. Jan
Re: [Xen-devel] [v4 16/17] vmx: Add some scheduler hooks for VT-d posted interrupts
On 30.07.15 at 20:26, dario.faggi...@citrix.com wrote: On Thu, 2015-07-30 at 02:04 +, Wu, Feng wrote: -Original Message- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] Since this is one of the differences between the two, was it the cause of the issues you were seeing? If yes, can you elaborate on how and why? In the end, I'm not too opposed to the hook being at the beginning rather than at the end, but there has to be a reason, which may well be better stated in a comment... Here is the reason I put arch_vcpu_wake() ahead of vcpu_wake(): arch_vcpu_wake() does some prerequisites for a vCPU which is about to run, such as setting SN again and changing the NV field back to 'posted_intr_vector', which should be finished before the vCPU is actually scheduled to run. However, if we put arch_vcpu_wake() later in vcpu_wake(), right before 'vcpu_schedule_unlock_irqrestore', then after the 'wake' hook gets finished the vcpu can run at any time (maybe on another pCPU, since the current pCPU is protected by the lock); if this can happen, it is incorrect. Does my understanding make sense? It's safe in any case. In fact, the spinlock will prevent both the vcpu's processor from scheduling and any other processor from stealing the waking vcpu from the runqueue to run it. That's actually why I wanted to double check you changing the position of the hook (wrt the draft), as it felt weird that the issue was in there. :-) So, now that we know that safety is not an issue, where should we put the hook? Having it before SCHED_OP(wake) may make people think that arch specific code is (or can, at some point) somehow influencing the scheduler specific wakeup code, which is not (and should not become, if possible) the case. However, I kind of like the fact that the spinlock is released as soon as possible, after the call to SCHED_OP(wake). That will make it more likely, for the processors we may have sent IPIs to, during the scheduler specific wakeup code, to find the spinlock free.
So, looking at things from this angle, it would be better to avoid putting stuff in between SCHED_OP(wake) and vcpu_schedule_unlock(). So, all in all, I'd say leave it on top, where it is in this patch. Of course, if others have opinions, I'm all ears. :-) If it is kept at the beginning, the hook should be renamed to something like arch_vcpu_wake_prepare(). Jan
Re: [Xen-devel] [RFC 4/4] HVM x86 deprivileged mode: Trap handlers for deprivileged mode
On 10/08/15 11:07, Tim Deegan wrote: Hi, @@ -685,8 +685,17 @@ static int hap_page_fault(struct vcpu *v, unsigned long va, { struct domain *d = v->domain; +/* If we get a page fault whilst in HVM security user mode */ +if ( v->user_mode == 1 ) +{ +printk("HVM: #PF (%u:%u) whilst in user mode\n", + d->domain_id, v->vcpu_id); +domain_crash_synchronous(); +} + This should happen in paging_fault() so it can guard the shadow-pagetable paths too. Once it's there, it'll need a check for is_hvm_vcpu() as well as for user_mode. Maybe have a helper function 'is_hvm_deprivileged_vcpu()' to do both checks, also used in hvm_deprivileged_check_trap() &c. Ok, I'll make this change. HAP_ERROR("Intercepted a guest #PF (%u:%u) with HAP enabled.\n", d->domain_id, v->vcpu_id); + domain_crash(d); return 0; } diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 9f5a6c6..19d465f 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -74,6 +74,7 @@ #include <asm/vpmu.h> #include <public/arch-x86/cpuid.h> #include <xsm/xsm.h> +#include <xen/hvm/deprivileged.h> /* * opt_nmi: one of 'ignore', 'dom0', or 'fatal'. */ @@ -500,6 +501,11 @@ static void do_guest_trap( struct trap_bounce *tb; const struct trap_info *ti; +/* If we take the trap whilst in HVM deprivileged mode + * then we should crash the domain. + */ +hvm_deprivileged_check_trap(__FUNCTION__); I wonder whether it would be better to switch to an IDT with all unacceptable traps stubbed out, rather than have to blacklist them all separately. Probably not - this check is cheap, and maintaining the parallel tables would be a pain. Or maybe there's some single point upstream of here, in the asm handlers, that would catch all the cases where this check is needed? Yep, I think this can be done. In any case, the check needs to return an error code so the caller knows to return without running the rest of the handler (and likewise elsewhere). Understood. Cheers, Tim.
Re: [Xen-devel] OSSTEST -- nested test case development, RFC: ts-guest-destroy doesn't call guest_await_dhcp_tcp() if guest has fixed IP
Ian Campbell writes (Re: OSSTEST -- nested test case development, RFC: ts-guest-destroy doesn't call guest_await_dhcp_tcp() if guest has fixed IP): However by reconfiguring things to be static the L1 host will no longer be generating DHCP RENEW requests when the lease times out, so the DHCP server is at liberty to release the lease when it times out or, worse, reuse the IP address for something else. Indeed. This is wrong. So I think we do actually need to start supporting a dynamic mode for at least L1 hosts (and that may well easily extend to L0 hosts too). Although it is not 100% accurate I think we can assume that DHCP renewal will always work, i.e. once a host is given a particular IP address, so long as it keeps renewing the lease it will keep the same address. It isn't clear to me that we need to make this assumption, in the general case. We probably need to assume that the DHCP-assigned IP addresses don't change unexpectedly during the execution of a particular ts-* script (where `unexpectedly' means `other than as an obvious consequence of actions such as rebooting'). So I think we can still discover the DHCP address assigned to the L1 guest, and propagate it into $r{${l1ident}_ip} when we convert it to an L1 host, but we then also need to modify the Xen installation runs to use dhcp mode for such cases and not switch to static as we do for an L0 host. This would be the right approach, but ... I'm not quite sure how this should be recorded in the runvars. I think we may want to wait for Ian to return from vacation next week. ... having looked at it like this, I think recording the L1 IP address in the runvars is wrong. It should be looked up each time (by something called by selecthost). The alternative would be that selecthost needs to query the DHCP leases file for these kinds of hosts; that would have the benefit of handling potential lease expiry over a reboot. Exactly. Ian.
___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
On 11/08/15 08:50, Shuai Ruan wrote: On Fri, Aug 07, 2015 at 01:44:41PM +0100, Andrew Cooper wrote: On 07/08/15 09:00, Shuai Ruan wrote: +goto skip; +} + +if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) ) What does edi have to do with xsaves? only edx:eax are special according to the manual. regs->edi is the guest_linear_address Why so? xsaves takes an unconditional memory parameter, not a pointer in %rdi. (regs->edi is only correct for ins/outs because the pointer is architecturally required to be in %rdi.) You are right. The linear_address should be decoded from the instruction. There is nothing currently in emulate_privileged_op() which does ModRM decoding for memory references, nor SIB decoding. xsaves/xrstors would be the first such operations. I am also not sure that adding arbitrary memory decode here is sensible. In an ideal world, we would have what is currently x86_emulate() split in 3 stages. Stage 1 does straight instruction decode to some internal representation. Stage 2 does an audit to see whether the decoded instruction is plausible for the reason why an emulation was needed. We have had a number of security issues with emulation in the past where guests cause one instruction to trap for emulation, then rewrite the instruction to be something else, and exploit a bug in the emulator. Stage 3 performs the actions required for emulation. Currently, x86_emulate() is limited to instructions which might legitimately fault for emulation, but with the advent of VM introspection, this is proving to be insufficient. With my x86 maintainer's hat on, I would like to avoid the current situation we have with multiple bits of code doing x86 instruction decode and emulation (which are all different). I think the 3-step approach above caters suitably to all usecases, but it is a large project itself.
It allows the introspection people to have a full and complete x86 emulation infrastructure, while also preventing areas like the shadow paging from being opened up to potential vulnerabilities in unrelated areas of the x86 architecture. I would even go so far as to say that it is probably ok not to support xsaves/xrstors in PV guests until something along the above lines is sorted. The first feature in XSS is processor trace which a PV guest couldn't use anyway. I suspect the same applies to most of the other Why couldn't a PV guest use processor trace? After more consideration, Xen should not expose xsaves/xrstors to PV guests at all. XSS features, or they wouldn't need to be privileged in the first place. Thanks for your detailed suggestions. xsaves/xrstors would also bring other benefits for PV guests, such as saving memory in the XSAVE area. If we do not support xsaves/xrstors in PV, PV guests would lose these benefits. What is your opinion on this? PV guests running under Xen are exactly the same as regular user processes running under Linux. There is a reason everything covered by xsaves/xrstors is restricted to ring0; it would be a security hole to allow guests to configure the features themselves. Features such as Processor Trace would need a hypercall interface for guests to use. ~Andrew
Re: [Xen-devel] [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode
On 10/08/15 10:49, Tim Deegan wrote: Hi, At 17:45 +0100 on 06 Aug (1438883118), Ben Catterall wrote: The process to switch into and out of deprivileged mode can be likened to setjmp/longjmp. To enter deprivileged mode, we take a copy of the stack from the guest's registers up to the current stack pointer. This copy is pretty unfortunate, but I can see that avoiding it will be a bit complex. Could we do something with more stacks? AFAICS there have to be three stacks anyway: - one to hold the depriv execution context; - one to hold the privileged execution context; and - one to take interrupts on. So maybe we could do some fiddling to make Xen take interrupts on a different stack while we're depriv'd? If we do have to copy, we could track whether the original stack has been clobbered by an interrupt, and so avoid (at least some of) the copy back afterwards? One nit in the assembler - if I've followed correctly, this saved IP: +/* Perform a near call to push rip onto the stack */ +call 1f is returned to (with adjustments) here: +/* Go to user mode return code */ +jmp *(%rsi) It would be good to make this a matched pair of call/ret if we can; the CPU has special branch prediction tracking for function calls that gets confused by a call that's not returned to. sure, will do. Cheers, Tim.
Re: [Xen-devel] [PATCH V6 3/7] libxl: add pvusb API
On Mon, Aug 10, 2015 at 06:35:24PM +0800, Chunyan Liu wrote: Add pvusb APIs, including: - attach/detach (create/destroy) virtual usb controller. - attach/detach usb device - list usb controller and usb devices - some other helper functions Signed-off-by: Chunyan Liu cy...@suse.com Signed-off-by: Simon Cao caobosi...@gmail.com --- changes: - Address George's comments: * Update libxl_device_usb_getinfo to read ctrl/port only and get other information. * Update backend path according to xenstore frontend 'xxx/backend' entry instead of using TOOLSTACK_DOMID. * Use 'type' to indicate qemu/pv instead of previous naming 'protocol'. * Add USB 'devtype' union, currently only includes hostdev I will leave this to Ian and George since they had strong opinions on this. I only skimmed this patch. Some comments below. [...] + +int libxl_device_usb_getinfo(libxl_ctx *ctx, uint32_t domid, + libxl_device_usb *usb, + libxl_usbinfo *usbinfo); + /* Network Interfaces */ int libxl_device_nic_add(libxl_ctx *ctx, uint32_t domid, libxl_device_nic *nic, const libxl_asyncop_how *ao_how) diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c index bee5ed5..935f25b 100644 --- a/tools/libxl/libxl_device.c +++ b/tools/libxl/libxl_device.c @@ -676,6 +676,10 @@ void libxl__devices_destroy(libxl__egc *egc, libxl__devices_remove_state *drs) aodev->action = LIBXL__DEVICE_ACTION_REMOVE; aodev->dev = dev; aodev->force = drs->force; +if (dev->backend_kind == LIBXL__DEVICE_KIND_VUSB) { +libxl__initiate_device_usbctrl_remove(egc, aodev); +continue; +} Is there a risk that this races with individual device removal? I think you get away with it because removal of individual device is idempotent?
libxl__initiate_device_remove(egc, aodev); } } diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index f98f089..5be3b3a 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2553,6 +2553,14 @@ _hidden void libxl__device_vtpm_add(libxl__egc *egc, uint32_t domid, libxl_device_vtpm *vtpm, libxl__ao_device *aodev); +_hidden void libxl__device_usbctrl_add(libxl__egc *egc, uint32_t domid, + libxl_device_usbctrl *usbctrl, + libxl__ao_device *aodev); + +_hidden void libxl__device_usb_add(libxl__egc *egc, uint32_t domid, + libxl_device_usb *usb, + libxl__ao_device *aodev); + /* Internal function to connect a vkb device */ _hidden int libxl__device_vkb_add(libxl__gc *gc, uint32_t domid, libxl_device_vkb *vkb); @@ -2585,6 +2593,13 @@ _hidden void libxl__wait_device_connection(libxl__egc*, _hidden void libxl__initiate_device_remove(libxl__egc *egc, libxl__ao_device *aodev); +_hidden int libxl__device_from_usbctrl(libxl__gc *gc, uint32_t domid, [...] +void libxl__device_usb_add(libxl__egc *egc, uint32_t domid, + libxl_device_usb *usb, + libxl__ao_device *aodev) +{ +STATE_AO_GC(aodev->ao); +int rc = -1; +char *busid = NULL; + +assert(usb->u.hostdev.hostbus > 0 && usb->u.hostdev.hostaddr > 0); + +busid = usb_busaddr_to_busid(gc, usb->u.hostdev.hostbus, + usb->u.hostdev.hostaddr); +if (!busid) { +LOG(ERROR, "USB device doesn't exist in sysfs"); +goto out; +} + +if (!is_usb_assignable(gc, usb)) { +LOG(ERROR, "USB device is not assignable."); +goto out; +} + +/* check usb device is already assigned */ +if (is_usb_assigned(gc, usb)) { +LOG(ERROR, "USB device is already attached to a domain."); +goto out; +} + +rc = libxl__device_usb_setdefault(gc, domid, usb, aodev->update_json); +if (rc) goto out; + +rc = libxl__device_usb_add_xenstore(gc, domid, usb, aodev->update_json); +if (rc) goto out; + +rc = usbback_dev_assign(gc, usb); +if (rc) { +libxl__device_usb_remove_xenstore(gc, domid, usb); +goto out; +} + +libxl__ao_complete(egc, ao, 0); +rc = 0; + +out:
You forgot to complete the ao in the failure path. But I'm not very familiar with the AO machinery, so I will let Ian comment on this. Wei.
Re: [Xen-devel] [PATCH V6 7/7] domcreate: support pvusb in configuration file
On Mon, Aug 10, 2015 at 06:35:28PM +0800, Chunyan Liu wrote: Add code to support pvusb in domain config file. One could specify usbctrl and usb in domain's configuration file and create domain, then usb controllers will be created and usb device would be attached to guest automatically. One could specify usb controllers and usb devices in config file like this: usbctrl=['version=2,ports=4', 'version=1, ports=4', ] usbdev=['2.1,controller=0,port=1', ] Signed-off-by: Chunyan Liu cy...@suse.com Signed-off-by: Simon Cao caobosi...@gmail.com --- [...] } +if (!xlu_cfg_get_list(config, "usbctrl", &usbctrls, 0, 0)) { +d_config->num_usbctrls = 0; +d_config->usbctrls = NULL; +while ((buf = xlu_cfg_get_listitem(usbctrls, d_config->num_usbctrls)) + != NULL) { +libxl_device_usbctrl *usbctrl; + +d_config->usbctrls = +(libxl_device_usbctrl *)realloc(d_config->usbctrls, +sizeof(libxl_device_usbctrl) * (d_config->num_usbctrls + 1)); +usbctrl = d_config->usbctrls + d_config->num_usbctrls; +libxl_device_usbctrl_init(usbctrl); + Use ARRAY_EXTEND_INIT macro. +parse_usbctrl_config(usbctrl, buf); + +d_config->num_usbctrls++; +} +} + +if (!xlu_cfg_get_list(config, "usbdev", &usbs, 0, 0)) { +d_config->num_usbs = 0; +d_config->usbs = NULL; +while ((buf = xlu_cfg_get_listitem(usbs, d_config->num_usbs)) != NULL) { +libxl_device_usb *usb; + +d_config->usbs = (libxl_device_usb *)realloc(d_config->usbs, +sizeof(libxl_device_usb) * (d_config->num_usbs + 1)); +usb = d_config->usbs + d_config->num_usbs; +libxl_device_usb_init(usb); + Ditto. Wei. +parse_usb_config(usb, buf); + +d_config->num_usbs++; +} +} + switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) { case 0: { -- 2.1.4
Re: [Xen-devel] [PATCH V6 2/7] libxl_read_file_contents: add new entry to read sysfs file
On Mon, Aug 10, 2015 at 06:35:23PM +0800, Chunyan Liu wrote: Sysfs file has size=4096 but actual file content is less than that. Current libxl_read_file_contents will treat it as error when file size and actual file content differs, so reading sysfs file content with this function always fails. Add a new entry libxl_read_sysfs_file_contents to handle sysfs file specially. It would be used in later pvusb work. Signed-off-by: Chunyan Liu cy...@suse.com --- Changes: - read one more byte to check bigger size problem. tools/libxl/libxl_internal.h | 2 ++ tools/libxl/libxl_utils.c| 51 ++-- 2 files changed, 42 insertions(+), 11 deletions(-) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 6013628..f98f089 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -4001,6 +4001,8 @@ void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr, int libxl__count_physical_sockets(libxl__gc *gc, int *sockets); #endif +_hidden int libxl_read_sysfs_file_contents(libxl_ctx *ctx, const char *filename, + void **data_r, int *datalen_r); Indentation looks wrong. 
/* * Local variables: diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c index bfc9699..9234efb 100644 --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -322,8 +322,10 @@ out: return rc; } -int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, - void **data_r, int *datalen_r) { +static int libxl_read_file_contents_core(libxl_ctx *ctx, const char *filename, + void **data_r, int *datalen_r, + bool tolerate_shrinking_file) +{ GC_INIT(ctx); FILE *f = 0; uint8_t *data = 0; @@ -359,20 +361,34 @@ int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, datalen = stab.st_size; if (stab.st_size && data_r) { -data = malloc(datalen); +data = malloc(datalen + 1); if (!data) goto xe; -rs = fread(data, 1, datalen, f); -if (rs != datalen) { -if (ferror(f)) +rs = fread(data, 1, datalen + 1, f); +if (rs > datalen) { +LOG(ERROR, "%s increased size while we were reading it", +filename); +goto xe; +} + +if (rs < datalen) { +if (ferror(f)) { LOGE(ERROR, "failed to read %s", filename); -else if (feof(f)) -LOG(ERROR, "%s changed size while we were reading it", - filename); -else +goto xe; +} else if (feof(f)) { +if (tolerate_shrinking_file) { +datalen = rs; +} else { +LOG(ERROR, "%s shrunk size while we were reading it", +filename); +goto xe; +} +} else { abort(); -goto xe; +} This is a bit bikeshedding, but you can leave goto xe out of two `if' to reduce patch size. } + +data = realloc(data, datalen); Should check return value of realloc. The logic of this function reflects what has been discussed so far. Wei.
} if (fclose(f)) { @@ -396,6 +412,19 @@ int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, return e; } +int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, + void **data_r, int *datalen_r) +{ +return libxl_read_file_contents_core(ctx, filename, data_r, datalen_r, 0); +} + +int libxl_read_sysfs_file_contents(libxl_ctx *ctx, const char *filename, + void **data_r, int *datalen_r) +{ +return libxl_read_file_contents_core(ctx, filename, data_r, datalen_r, 1); +} + + #define READ_WRITE_EXACTLY(rw, zero_is_eof, constdata)\ \ int libxl_##rw##_exactly(libxl_ctx *ctx, int fd, \ -- 2.1.4
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
CCing Hongyang, I missed him when I copied-and-pasted emails from MAINTAINERS. On Tue, Aug 11, 2015 at 11:44:07AM +0100, Wei Liu wrote: Hi all RC1 is going to be tagged this week (maybe today). We need to figure out when to branch / reopen -unstable for committing and what rules should be applied until 4.6 is out of the door. Ian, Ian and I had a conversation IRL. We discussed several things, but figured it is necessary to have more people involved before making any decision. Here is my recollection of the conversation. Branching should be done at one of the RC tags. There might not be enough time for us to reach consensus before tagging RC1, so I would say let's branch at RC2 if we don't observe blocker bugs. Maintainers should be responsible for both the 4.6 branch and the -unstable branch. As for bug fixes, here are two options. Option 1: bug fixes go into -unstable, backport / cherry-pick bug fixes back to 4.6. This seems to leave the tree in half frozen status because we need to reject refactoring patches in case they cause backporting failure. Option 2: bug fixes go into 4.6, merge them to -unstable. If the merge has conflicts and maintainers can't deal with that, the authors of those changes in -unstable which cause the conflict are responsible for fixing it up. Ian and Ian, anything I miss? Anything to add? Others, thoughts? Wei.
Re: [Xen-devel] [PATCH OSSTEST v2] Arrange to test migration from the previous Xen version
On Mon, 2015-08-03 at 17:01 +0100, Ian Campbell wrote: On Fri, 2015-07-24 at 17:28 +0100, Ian Campbell wrote: @@ -191,6 +208,27 @@ create_build_jobs () { revision_ovmf=$REVISION_OVMF done +if [ x$want_prevxen = xy ] ; then +if [ x$REVISION_PREVXEN = x ] ; then +echo >&2 "prevxen ?"; exit 1 +fi This breaks things with standalone mode, or any make-flight which didn't come from cr-daily-branch. In such cases we don't have REVISION_XEN or TREE_XEN either, we just get the defaults. I think we need to do something like select_prevxenbranch but to pick a xen.git branch name rather than an osstest branch name. Or we quietly skip this test if REVISION_PREVXEN is not set. One to chew on I think. At the moment I'm somewhat inclined towards omitting the build-$ARCH-prev job in this case but still creating the associated test jobs. In standalone mode this may still be useful (maybe your hosts are already configured and you want to run an individual step). In production mode the test jobs will then fail their ts-build-check step, which correctly reflects what has happened. I think this is the effect of the following incremental patch. Ian. diff --git a/mfi-common b/mfi-common index 737db99..810e533 100644 --- a/mfi-common +++ b/mfi-common @@ -208,10 +208,7 @@ create_build_jobs () { revision_ovmf=$REVISION_OVMF done -if [ x$want_prevxen = xy ] ; then -if [ x$REVISION_PREVXEN = x ] ; then -echo >&2 "prevxen ?"; exit 1 -fi +if [ x$want_prevxen = xy -a x$REVISION_PREVXEN != x ] ; then # TODO could find latest pass on that branch and attempt to reuse. #bfiprevxen=... #
[Xen-devel] [PATCH xen-tip] xen/PMU: __pcpu_scope_xenpmu_shared can be static
Signed-off-by: Fengguang Wu fengguang...@intel.com --- pmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c index 7218cea..1d1ae1b 100644 --- a/arch/x86/xen/pmu.c +++ b/arch/x86/xen/pmu.c @@ -15,7 +15,7 @@ /* Shared page between hypervisor and domain */ -DEFINE_PER_CPU(struct xen_pmu_data *, xenpmu_shared); +static DEFINE_PER_CPU(struct xen_pmu_data *, xenpmu_shared); #define get_xenpmu_data() per_cpu(xenpmu_shared, smp_processor_id()) /* perf callbacks */
[Xen-devel] [xen-tip:linux-next 19/23] arch/x86/xen/pmu.c:18:1: sparse: symbol '__pcpu_scope_xenpmu_shared' was not declared. Should it be static?
tree: git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip linux-next head: 0d26d72cab825a0227c8d8e0e42161125b3116fd commit: 9cd3857a7d89a259870c6ee6994f5ef41511654c [19/23] xen/PMU: Initialization code for Xen PMU reproduce: # apt-get install sparse git checkout 9cd3857a7d89a259870c6ee6994f5ef41511654c make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) arch/x86/xen/pmu.c:18:1: sparse: symbol '__pcpu_scope_xenpmu_shared' was not declared. Should it be static? Please review and possibly fold the followup patch. --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [Xen-devel] [v4 11/17] vt-d: Add API to update IRTE when VT-d PI is used
On 28.07.15 at 09:34, feng...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Friday, July 24, 2015 11:28 PM On 23.07.15 at 13:35, feng...@intel.com wrote: +GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p); + +old_ire = new_ire = *p; + +/* Setup/Update interrupt remapping table entry. */ +setup_posted_irte(new_ire, pi_desc, gvec); +ret = cmpxchg16b(p, &old_ire, &new_ire); + +ASSERT(ret == *(__uint128_t *)&old_ire); + +iommu_flush_cache_entry(p, sizeof(struct iremap_entry)); sizeof(*p) please. +iommu_flush_iec_index(iommu, 0, remap_index); + +if ( iremap_entries ) +unmap_vtd_domain_page(iremap_entries); The conditional comes way too late: Either GET_IREMAP_ENTRY() can produce NULL, in which case you're hosed above. Or it can't, in which case the check here is pointless. I cannot find the case where GET_IREMAP_ENTRY() produces NULL for iremap_entries, And I didn't say it would - I simply listed both possibilities and their respective consequences for your code. if it did, GET_IREMAP_ENTRY() itself would have a big problem, right? So this check is not needed; maybe I can add an ASSERT() after GET_IREMAP_ENTRY(). You might, but iirc no other uses do so, so you could as well omit any such checks. Jan
Re: [Xen-devel] [PATCH for-4.6] tools: Don't try to update the firmware directory on ARM
On Sun, 2015-08-09 at 14:49 +0100, Julien Grall wrote: Hi Wei, On 08/08/2015 16:16, Wei Liu wrote: On Fri, Aug 07, 2015 at 06:27:18PM +0100, Julien Grall wrote: The firmware directory is not built at all on ARM. Attempting to update it using the target subtree-force-update will fail when trying to update seabios. Signed-off-by: Julien Grall julien.gr...@citrix.com --- Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com I've noticed it while trying to update the QEMU tree used by Xen on a platform where iasl is not present (required by seabios in order to update it). I think this should go in Xen 4.6 and possibly be backported to Xen 4.5 --- tools/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/Makefile b/tools/Makefile index 45cb4b2..2618559 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -305,7 +305,9 @@ endif ifeq ($(CONFIG_QEMU_TRAD),y) $(MAKE) qemu-xen-traditional-dir-force-update endif +ifeq ($(CONFIG_X86),y) $(MAKE) -C firmware subtree-force-update +endif This is not optimal. What if you want to build OVMF on ARM in the future? Slight aside, but I already looked at doing this but concluded that the right answer was to add this to raisin not xen.git. As it happens on ARM we would boot the UEFI binary directly, so we don't need to compile it into hvmloader or jump through other hoops, so it is a bit easier than on x86. You also can't preclude having other firmwares that need to be built on ARM in the future. I think a proper way of doing this is to make CONFIG_SEABIOS=n when you're building on ARM. See tools/configure.ac. tools/Makefile only builds the firmware directory for x86, see: SUBDIRS-$(CONFIG_X86) += firmware Hence why I wrote the patch in the current way. I think having the update rule match (in spirit at least) the SUBDIRS rules makes sense as a patch for now, so I'm in favour of taking this patch as it is.
Building the firmware directory for ARM would require more work than replacing SUBDIRS-$(CONFIG_X86) with SUBDIRS-y. In general, I do agree that we should enable this with configure.ac but, IMHO, this is not Xen 4.6 material... Although I would be happy to fix it for Xen 4.7. Regards,
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On 11/08/15 12:13, Ian Jackson wrote: Wei Liu writes ([URGENT RFC] Branching and reopening -unstable): Branching should be done at one of the RC tags. It might not be enough time for us to reach consensus before tagging RC1, so I would say let's branch at RC2 if we don't observe blocker bugs. Maintainers should be responsible for both 4.6 branch and -unstable branch. As for bug fixes, here are two options. I think this conflates the three questions which should be answered: Q1: What is the status of the newly branched -unstable ? Should we avoid (some or all) big sets of changes ? (a) Don't branch (b) Branch but don't allow /any/ big changes. Seems to make branching rather pointless. (c) Branch but allow /some/ big changes. Tree is `half open', which is not ideal. (d) Branch and allow /all/ changes. Q2: If we don't avoid such changes, and a bugfix has a conflict with a change in the new unstable, who is responsible for fixing it up ? Options include: (a) the relevant maintainers (double whammy for maintainers) (b) the submitter of the bugfix (very undesirable) (c) the submitter of the big set of changes (but what do we do if they don't respond?) (d) the stable tree maintainers (already ruled out, so included in this list for completeness; out of the question IMO) Q3: What workflow should we use, for bugfixes for bugs in 4.6-pre ? There are three options, not two: (a) Bugfixes go to 4.6 first, cherry pick to unstable This keeps our focus on 4.6, which is good. (b) Bugfixes go to 4.6 first, merge 4.6 to unstable. Not tenable if we have big changes in unstable. (c) Bugfixes go to unstable, cherry pick to 4.6. Undesirable IMO because it shifts focus to unstable. Of these 2(c)/3(a) would be ideal but we don't have a good answer to the problem posted in Q2(c). I think that leaves us with 2(a): maintainers have to deal with the fallout. That makes 1(d) untenable in my view. As a maintainer, I do not want that additional workload. That leaves us with 1(a) or 1(c)/2(a)/3(a).
With 1(c), who should decide on a particular series ? Well, who is taking the risk ? The maintainer, who will have to pick up the pieces. I therefore conclude, we have two options: A 1(a)/-/- Do not branch yet: defer divergence until the risk of bugfixes is much lower. B 1(c)(maintainer)/2(a)/3(a) Branch. Maintainers may choose to defer patch series based on risk of conflicts with bugfixes required for 4.6. Clear communication with submitters is required. Bugfixes for bugs in 4.6 will be accepted onto the 4.6 branch. Maintainers are required to cherry pick them onto unstable. Bugfixes will not be accepted for unstable unless it is clear that the bug was introduced in unstable since 4.6 branched. I am happy with B because it gives the relevant maintainers the option. Very much A. By definition, 1(c) will destabilise the tree and generate artificial work for the maintainers and committers. The most important action at this point is to stabilise 4.6 for release, and people's efforts are far better spent pursuing that, rather than continuing work on unstable. For the sake of a couple of weeks, contributors can keep their patches for a little while longer. ~Andrew
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On Tue, 11 Aug 2015, Ian Jackson wrote: Wei Liu writes ([URGENT RFC] Branching and reopening -unstable): Branching should be done at one of the RC tags. It might not be enough time for us to reach consensus before tagging RC1, so I would say let's branch at RC2 if we don't observe blocker bugs. Maintainers should be responsible for both 4.6 branch and -unstable branch. As for bug fixes, here are two options. I think this conflates the three questions which should be answered: Q1: What is the status of the newly branched -unstable ? Should we avoid (some or all) big sets of changes ? (a) Don't branch (b) Branch but don't allow /any/ big changes. Seems to make branching rather pointless. (c) Branch but allow /some/ big changes. Tree is `half open', which is not ideal. (d) Branch and allow /all/ changes. Q2: If we don't avoid such changes, and a bugfix has a conflict with a change in the new unstable, who is responsible for fixing it up ? Options include: (a) the relevant maintainers (double whammy for maintainers) (b) the submitter of the bugfix (very undesirable) Why is it very undesirable? In the Linux community for example it is customary to provide a patch for each of the stable trees you need backports to, in case there are any merge conflicts. This would be the same. (c) the submitter of the big set of changes (but what do we do if they don't respond?) (d) the stable tree maintainers (already ruled out, so included in this list for completeness; out of the question IMO) Q3: What workflow should we use, for bugfixes for bugs in 4.6-pre ? There are three options, not two: (a) Bugfixes go to 4.6 first, cherry pick to unstable This keeps our focus on 4.6, which is good. (b) Bugfixes go to 4.6 first, merge 4.6 to unstable. Not tenable if we have big changes in unstable. (c) Bugfixes go to unstable, cherry pick to 4.6. Undesirable IMO because it shifts focus to unstable. Of these 2(c)/3(a) would be ideal but we don't have a good answer to the problem posted in Q2(c).
I think that leaves us with 2(a): maintainers have to deal with the fallout. That makes 1(d) untenable in my view. As a maintainer, I do not want that additional workload. That leaves us with 1(a) or 1(c)/2(a)/3(a). With 1(c), who should decide on a particular series ? Well, who is taking the risk ? The maintainer, who will have to pick up the pieces. I therefore conclude, we have two options: A 1(a)/-/- Do not branch yet: defer divergence until the risk of bugfixes is much lower. B 1(c)(maintainer)/2(a)/3(a) Branch. Maintainers may choose to defer patch series based on risk of conflicts with bugfixes required for 4.6. Clear communication with submitters is required. Bugfixes for bugs in 4.6 will be accepted onto the 4.6 branch. Maintainers are required to cherry pick them onto unstable. Bugfixes will not be accepted for unstable unless it is clear that the bug was introduced in unstable since 4.6 branched. I am happy with B because it gives the relevant maintainers the option. Ian.
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
On 2015/8/11 17:46, Julien Grall wrote: On 11/08/15 03:09, Shannon Zhao wrote: Hi Julien, Hi Shannon, On 2015/8/7 18:33, Julien Grall wrote: Hi Shannon, Just some clarification questions. On 07/08/15 03:11, Shannon Zhao wrote: 3. Dom0 gets grant table and event channel irq information --- As said above, we assign the hypervisor_id to be "XenVMM" to tell Dom0 that it runs on the Xen hypervisor. For the grant table, add two new HVM_PARAMs: HVM_PARAM_GNTTAB_START_ADDRESS and HVM_PARAM_GNTTAB_SIZE. For the event channel irq, reuse HVM_PARAM_CALLBACK_IRQ and add a new delivery type: val[63:56] == 3: val[15:8] is flag: val[7:0] is a PPI (ARM and ARM64 only) Can you describe the content of flag? This needs definition as well. I think it could use the definition of the XENV table: bit 0 stands for the interrupt mode and bit 1 stands for the interrupt polarity. And explain it in the comment of HVM_PARAM_CALLBACK_IRQ. That would be fine for me. When constructing Dom0 in Xen, save these values. Then Dom0 could get them through the hypercall HVMOP_get_param. 4. Map MMIO regions --- Register a bus_notifier for the platform and amba buses in Linux. Add a new XENMAPSPACE, XENMAPSPACE_dev_mmio. Within the notifier, check if the device is newly added, then call hypercall XENMEM_add_to_physmap to map the mmio regions. 5. Route device interrupts to Dom0 -- Route all the SPI interrupts to Dom0 before Dom0 boots. Not all the SPIs will be routed to Dom0. Some are used by Xen and should never be used by any guest. I have in mind the UART and SMMU interrupts. You will have to find a way to skip them nicely. Note that not all the IRQs used by Xen are properly registered when we build Dom0 (see the SMMU). For the UART, we can get the interrupt information from the SPCR table and hide it from Dom0. Can you clarify your meaning of hide from Dom0? Did you mean avoid routing the SPI to Dom0? Yes. IIUC, currently Xen (as well as Linux) doesn't support using the SMMU when booting with ACPI.
When it does, it could read the interrupt information from the IORT table and hide it from Dom0. Well, for Xen we don't even have ACPI supported upstream ;). For Linux there is some on-going work. Anyway, this is not important right now. Yeah, that could be done after this patchset is upstream. Thanks, -- Shannon
Re: [Xen-devel] [PATCH V6 1/7] libxl: export some functions for pvusb use
On Mon, Aug 10, 2015 at 06:35:22PM +0800, Chunyan Liu wrote: Signed-off-by: Chunyan Liu cy...@suse.com Signed-off-by: Simon Cao caobosi...@gmail.com Acked-by: Wei Liu wei.l...@citrix.com --- tools/libxl/libxl.c | 4 ++-- tools/libxl/libxl_internal.h | 3 +++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 083f099..006e8da 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -1995,7 +1995,7 @@ out: } /* common function to get next device id */ -static int libxl__device_nextid(libxl__gc *gc, uint32_t domid, char *device) +int libxl__device_nextid(libxl__gc *gc, uint32_t domid, char *device) { char *dompath, **l; unsigned int nb; @@ -2014,7 +2014,7 @@ static int libxl__device_nextid(libxl__gc *gc, uint32_t domid, char *device) return nextid; } -static int libxl__resolve_domid(libxl__gc *gc, const char *name, +int libxl__resolve_domid(libxl__gc *gc, const char *name, uint32_t *domid) Nit: please adjust indentation. { if (!name) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 6ea6c83..6013628 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -1152,6 +1152,9 @@ _hidden int libxl__init_console_from_channel(libxl__gc *gc, libxl__device_console *console, int dev_num, libxl_device_channel *channel); +_hidden int libxl__device_nextid(libxl__gc *gc, uint32_t domid, char *device); +_hidden int libxl__resolve_domid(libxl__gc *gc, const char *name, + uint32_t *domid); /* * For each aggregate type which can be used as an input we provide: -- 2.1.4
[Xen-devel] [linux-next test] 60648: regressions - FAIL
flight 60648 linux-next real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60648/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 60637
 test-amd64-i386-qemut-rhel6hvm-amd       9 redhat-install            fail REGR. vs. 60637

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds                11 guest-start               fail REGR. vs. 60637
 test-amd64-i386-xl                      14 guest-saverestore         fail like 60637
 test-amd64-i386-xl-xsm                  14 guest-saverestore         fail like 60637
 test-amd64-i386-pair                    21 guest-migrate/src_host/dst_host fail like 60637
 test-amd64-i386-xl-qemuu-win7-amd64     17 guest-stop                fail like 60637
 test-amd64-i386-xl-qemut-win7-amd64     17 guest-stop                fail like 60637
 test-amd64-amd64-xl-qemuu-win7-amd64    17 guest-stop                fail like 60637

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel           14 guest-saverestore         fail never pass
 test-amd64-amd64-xl-pvh-amd             11 guest-start               fail never pass
 test-armhf-armhf-libvirt-raw             9 debian-di-install         fail never pass
 test-armhf-armhf-xl-qcow2                9 debian-di-install         fail never pass
 test-armhf-armhf-libvirt-vhd             9 debian-di-install         fail never pass
 test-armhf-armhf-xl-raw                  9 debian-di-install         fail never pass
 test-armhf-armhf-xl-vhd                  9 debian-di-install         fail never pass
 test-armhf-armhf-libvirt-qcow2           9 debian-di-install         fail never pass
 test-amd64-i386-libvirt-xsm             12 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-xsm             14 guest-saverestore         fail never pass
 test-amd64-amd64-libvirt                12 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-pair           21 guest-migrate/src_host/dst_host fail never pass
 test-amd64-amd64-libvirt-xsm            12 migrate-support-check     fail never pass
 test-amd64-i386-libvirt                 14 guest-saverestore         fail never pass
 test-amd64-i386-libvirt                 12 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-pair            21 guest-migrate/src_host/dst_host fail never pass
 test-armhf-armhf-xl-arndale             12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale             13 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2          11 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-xsm            12 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-xsm            14 guest-saverestore         fail never pass
 test-amd64-i386-libvirt-vhd             11 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-raw            11 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-vhd            11 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-xsm                 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-xsm                 12 migrate-support-check     fail never pass
 test-armhf-armhf-xl                     12 migrate-support-check     fail never pass
 test-armhf-armhf-xl                     13 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck          12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck          13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu           13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu           12 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt                14 guest-saverestore         fail never pass
 test-armhf-armhf-libvirt                12 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-raw             11 migrate-support-check     fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64    17 guest-stop                fail never pass
 test-amd64-i386-libvirt-qcow2           11 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2             13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2             12 migrate-support-check     fail never pass

version targeted for testing:
 linux b195df50400676bdeaacdca27051e1a71ccd570f
baseline version:
 linux dd2384a75d1c046faf068a6352732a204814b86d

Last test of basis (not found)
Failing since 0 1970-01-01 00:00:00 Z 16658 days
Testing same since 60648 2015-08-10 09:20:46 Z 1 days 1 attempts

jobs:
 build-amd64-xsm pass
 build-armhf-xsm pass
 build-i386-xsm pass
 build-amd64
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On Tue, 2015-08-11 at 12:13 +0100, Ian Jackson wrote:
 Wei Liu writes ([URGENT RFC] Branching and reopening -unstable):
  Branching should be done at one of the RC tags. There might not be enough time for us to reach consensus before tagging RC1, so I would say let's branch at RC2 if we don't observe blocker bugs. Maintainers should be responsible for both the 4.6 branch and the -unstable branch. As for bug fixes, here are two options.

 I think this conflates the three questions which should be answered:

 Q1: What is the status of the newly branched -unstable? Should we avoid (some or all) big sets of changes?
  (a) Don't branch.
  (b) Branch but don't allow /any/ big changes. Seems to make branching rather pointless.
  (c) Branch but allow /some/ big changes. Tree is `half open', which is not ideal.
  (d) Branch and allow /all/ changes.

 Q2: If we don't avoid such changes, and a bugfix has a conflict with a change in the new unstable, who is responsible for fixing it up? Options include:
  (a) the relevant maintainers (double whammy for maintainers)
  (b) the submitter of the bugfix (very undesirable)
  (c) the submitter of the big set of changes (but what do we do if they don't respond?)
  (d) the stable tree maintainers (already ruled out, so included in this list for completeness; out of the question IMO)

 Q3: What workflow should we use for bugfixes for bugs in 4.6-pre? There are three options, not two:
  (a) Bugfixes go to 4.6 first, cherry-pick to unstable. This keeps our focus on 4.6, which is good.
  (b) Bugfixes go to 4.6 first, merge 4.6 to unstable. Not tenable if we have big changes in unstable.
  (c) Bugfixes go to unstable, cherry-pick to 4.6. Undesirable IMO because it shifts focus to unstable.

FWIW I think historically we have always done (c) here. That's not to say we shouldn't change, but I thought it worth noting.

 Of these, 2(c)/3(a) would be ideal, but we don't have a good answer to the problem posed in Q2(c). I think that leaves us with 2(a): maintainers have to deal with the fallout.

 That makes 1(d) untenable in my view. As a maintainer, I do not want that additional workload. That leaves us with 1(a) or 1(c)/2(a)/3(a).

 With 1(c), who should decide on a particular series? Well, who is taking the risk? The maintainer, who will have to pick up the pieces.

 I therefore conclude we have two options:

 A: 1(a)/-/- Do not branch yet: defer divergence until the risk of bugfixes is much lower.

 B: 1(c)(maintainer)/2(a)/3(a) Branch. Maintainers may choose to defer patch series based on the risk of conflicts with bugfixes required for 4.6. Clear communication with submitters is required. Bugfixes for bugs in 4.6 will be accepted onto the 4.6 branch. Maintainers are required to cherry-pick them onto unstable. Bugfixes will not be accepted for unstable unless it is clear that the bug was introduced in unstable since 4.6 branched.

 I am happy with B because it gives the relevant maintainers the option.

 Ian.
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On Tue, 11 Aug 2015, Wei Liu wrote:
 Hi all

 RC1 is going to be tagged this week (maybe today). We need to figure out when to branch / reopen -unstable for committing and what rules should be applied until 4.6 is out of the door.

 Ian, Ian and I had a conversation IRL. We discussed several things, but figured it is necessary to have more people involved before making any decision. Here is my recollection of the conversation.

 Branching should be done at one of the RC tags. There might not be enough time for us to reach consensus before tagging RC1, so I would say let's branch at RC2 if we don't observe blocker bugs.

 Maintainers should be responsible for both the 4.6 branch and the -unstable branch. As for bug fixes, here are two options.

 Option 1: bug fixes go into -unstable; backport / cherry-pick bug fixes back to 4.6. This seems to leave the tree in half-frozen status, because we need to reject refactoring patches in case they cause backporting failure.

 Option 2: bug fixes go into 4.6, then merge them to -unstable. If the merge has a conflict and maintainers can't deal with it, the authors of those changes in -unstable which cause the conflict are responsible for fixing it up.

 Ian and Ian, anything I miss? Anything to add? Others, thoughts?

I don't see why Option 1 should be different from Option 2 in terms of dealing with conflicts. I think that in both cases we should just ask contributors for help to fix the conflict. So I would go for a revised Option 1:

Option 1b: bug fixes go into -unstable; backport / cherry-pick bug fixes back to 4.6. If the backport has a conflict and maintainers can't deal with it, the authors of those changes in -unstable which cause the conflict are responsible for fixing it up.
Re: [Xen-devel] [PATCH v4 07/11] x86/intel_pstate: the main boby of the intel_pstate driver
On 27.07.15 at 11:30, wei.w.w...@intel.com wrote:
 On 24/07/2015 21:54, Jan Beulich wrote:
  On 25.06.15 at 13:16, wei.w.w...@intel.com wrote:
   +int __init intel_pstate_init(void)
   +{
   +    int cpu, rc = 0;
   +    const struct x86_cpu_id *id;
   +    struct cpu_defaults *cpu_info;
   +
   +    id = x86_match_cpu(intel_pstate_cpu_ids);
   +    if (!id)
   +        return -ENODEV;
   +
   +    cpu_info = (struct cpu_defaults *)id->driver_data;
   +
   +    copy_pid_params(&cpu_info->pid_policy);
   +    copy_cpu_funcs(&cpu_info->funcs);
   +
   +    if (intel_pstate_msrs_not_valid())
   +        return -ENODEV;
   +
   +    all_cpu_data = xzalloc_array(struct cpudata *, NR_CPUS);
   +    if (!all_cpu_data)
   +        return -ENOMEM;
   +
   +    rc = cpufreq_register_driver(&intel_pstate_driver);
   +    if (rc)
   +        goto out;
   +
   +    return rc;
   +out:
   +    for_each_online_cpu(cpu) {
   +        if (all_cpu_data[cpu]) {
   +            kill_timer(&all_cpu_data[cpu]->timer);
   +            xfree(all_cpu_data[cpu]);
   +        }
   +    }

  I have a hard time seeing where in this function the setup happens that is being undone here (keeping in mind that the notifier registration inside cpufreq_register_driver() doesn't actually call the notifier function). And then, looking at the diff between this and what Linux 4.2-rc3 has (which admittedly looks a little newer than what you sent, so I already subtract some of the delta), it is significantly larger than the source file itself. That surely doesn't suggest a clone-with-minimal-delta. Yet as said before - either you do that, or you accept us picking at things you inherited from Linux.

 I think it's better to choose the latter - picking out things that are useful for us from Linux. Can you please take a look at this patch and summarize the comments? Thanks.

I'm sorry, but for a first round I'd rather expect _you_ to go through the code you intend to add and spot possible problems. Only then, on a submission where you state that you did so, would I want to invest time in sanity checking things.

And then I hope you realize that the clone-with-minimal-delta would have benefits on the maintenance side going forward (fewer manual adjustments needed due to non-applying Linux side changes).

Jan
Re: [Xen-devel] RFC: HVM de-privileged mode scheduling considerations
On 04/08/15 14:46, George Dunlap wrote: On Mon, Aug 3, 2015 at 3:34 PM, Ian Campbell ian.campb...@citrix.com wrote: On Mon, 2015-08-03 at 14:54 +0100, Andrew Cooper wrote: On 03/08/15 14:35, Ben Catterall wrote: Hi all, I am working on an x86 proof-of-concept to evaluate if it is feasible to move device models and x86 emulation code for HVM guests into a de-privileged context. I was hoping to get feedback from relevant maintainers on scheduling considerations for this system to mitigate potential DoS attacks. Many thanks in advance, Ben This is intended as a proof-of-concept, with the aim of determining if this idea is feasible within performance constraints. Motivation -- The motivation for moving the device models and x86 emulation code into ring 3 is to mitigate a system compromise due a bug in any of these systems. These systems are currently part of the hypervisor and, consequently, a bug in any of these could allow an attacker to gain control (or perform a DOS) of Xen and/or guests. Migrating between PCPUs --- There is a need to support migration between pcpus so that the scheduler can still perform this operation. However, there is an issue to resolve. Currently, I have a per-vcpu copy of the Xen ring 0 stack up to the point of entering the de-privileged mode. This allows us to restore this stack and then continue from the entry point when we have finished in de-privileged mode. There will be per-pcpu data on these per-vcpu stacks such as saved stack frame pointers for the per-pcpu stack, smp_processor_id() responses etc. Therefore, it will be necessary to lock the vcpu to the current pcpu when it enters this user mode so that it does not wake up on a different pcpu where such pointers and other data are invalid. We can do this by setting a hard affinity to the pcpu that the vcpu is executing on. See common/wait.c which does something similar to what I am doing. 
However, needing to have hard affinity to a pcpu leads to the following problem: an attacker could lock multiple vcpus to a single pcpu, leading to a DoS. This could be achieved by spinning in a loop in Xen de-privileged mode (assuming a bug in this mode) and performing this operation on multiple vcpus at once. The attacker could wait until all of their vcpus were on the same pcpu and then execute this attack. This could cause the pcpu to, effectively, lock up, as it will be under heavy load, and we would be unable to move work elsewhere.

A solution to the DoS would be to force migration to another pcpu if, after, say, 100 quanta have passed, the vcpu has remained in de-privileged mode. This forcing of migration would require us to forcibly complete the de-privileged operation and then, just before returning into the guest, force a cpu change. We could not just force a migration at the schedule call point, as the Xen stack needs to unwind to free up resources. We would reset this count each time we completed a de-privileged mode operation. A legitimate long-running de-privileged operation would trigger this forced migration mechanism. However, it is unlikely that such operations will be needed, and the count can be adjusted appropriately to mitigate this. Any suggestions or feedback would be appreciated!

I don't see why any scheduling support is needed. Currently all operations like this are run synchronously in the vmexit context of the vcpu. Any current DoS is already a real issue.

The point is that this work is supposed to mitigate (or eliminate) such issues, so we would like to remove this existing real issue. IOW while it might be expected that an in-Xen DM can DoS the system, an in-Xen-ring3 DM should not be able to do so.

In any reasonable situation, emulation of a device is a small state mutation and occasionally kicking off a further action to perform. (The far bigger risk from this kind of emulation is following bad pointers/etc, rather than long loops.)

I think it would be entirely reasonable to have a deadline for a single execution of depriv mode, after which the domain is declared malicious and killed.

I think this could make sense; it's essentially a harsher variant of Ben's suggestion to abort an attempt to process the MMIO in order to migrate to another pcpu, but it has the benefit of being easier to implement and easier to reason about in terms of interactions with other aspects of the system (i.e. it seems to remove the need to think of ways an attacker might game that other system).

We already have this for host pcpus - the watchdog defaults to 5 seconds. Having a similar cutoff for depriv mode should be fine.

That's a reasonable analogy. Perhaps we would want the depriv watchdog to be some 1/N fraction of the pcpu watchdog, for a smallish N, to avoid the risk of any slop in the timing allowing the pcpu watchdog to fire. N=3 for example (on the grounds that N=2 is probably sufficient, so N=3 must be awesome).

+1

-George

Thanks all! I'll do
[Xen-devel] [PATCH xen-tip] xen/PMU: pmu_modes[] can be static
Signed-off-by: Fengguang Wu fengguang...@intel.com
---
 sys-hypervisor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index 0907275..b5a7342 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -377,7 +377,7 @@ struct pmu_mode {
 	uint32_t mode;
 };
 
-struct pmu_mode pmu_modes[] = {
+static struct pmu_mode pmu_modes[] = {
 	{"off", XENPMU_MODE_OFF},
 	{"self", XENPMU_MODE_SELF},
 	{"hv", XENPMU_MODE_HV},
[Xen-devel] [xen-tip:linux-next 18/23] drivers/xen/sys-hypervisor.c:380:17: sparse: symbol 'pmu_modes' was not declared. Should it be static?
tree: git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip linux-next
head: 0d26d72cab825a0227c8d8e0e42161125b3116fd
commit: 3ad90fe1671a12522e3360aa4c39094360a10b38 [18/23] xen/PMU: Sysfs interface for setting Xen PMU mode
reproduce:
  # apt-get install sparse
  git checkout 3ad90fe1671a12522e3360aa4c39094360a10b38
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

 drivers/xen/sys-hypervisor.c:380:17: sparse: symbol 'pmu_modes' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure  Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all  Intel Corporation
[Xen-devel] [xen-tip:linux-next 21/23] arch/x86/xen/pmu.c:211:20: sparse: symbol 'xen_amd_read_pmc' was not declared. Should it be static?
tree: git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip linux-next
head: 0d26d72cab825a0227c8d8e0e42161125b3116fd
commit: 80ef65bb2362fd9eedcb4ec1d41d8a6d0b99dfbb [21/23] xen/PMU: Intercept PMU-related MSR and APIC accesses
reproduce:
  # apt-get install sparse
  git checkout 80ef65bb2362fd9eedcb4ec1d41d8a6d0b99dfbb
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

 arch/x86/xen/pmu.c:18:1: sparse: symbol '__pcpu_scope_xenpmu_shared' was not declared. Should it be static?
 arch/x86/xen/pmu.c:211:20: sparse: symbol 'xen_amd_read_pmc' was not declared. Should it be static?
 arch/x86/xen/pmu.c:220:20: sparse: symbol 'xen_intel_read_pmc' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure  Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all  Intel Corporation
[Xen-devel] [PATCH xen-tip] xen/PMU: xen_amd_read_pmc() can be static
Signed-off-by: Fengguang Wu fengguang...@intel.com
---
 pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index cbd68dd..2b81722 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -208,7 +208,7 @@ bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err)
 	return false;
 }
 
-unsigned long long xen_amd_read_pmc(int counter)
+static unsigned long long xen_amd_read_pmc(int counter)
 {
 	uint32_t msr;
 	int err;
@@ -217,7 +217,7 @@ unsigned long long xen_amd_read_pmc(int counter)
 	return native_read_msr_safe(msr, &err);
 }
 
-unsigned long long xen_intel_read_pmc(int counter)
+static unsigned long long xen_intel_read_pmc(int counter)
 {
 	int err;
 	uint32_t msr;
Re: [Xen-devel] [PATCH for-4.6] libxl: fix libxl__build_hvm error code return path
On Tue, 2015-08-11 at 09:57 +0100, Wei Liu wrote:
 On Fri, Aug 07, 2015 at 06:08:25PM +0200, Roger Pau Monne wrote:
  This is a simple fix to make sure libxl__build_hvm returns an error code in case of failure.

  Signed-off-by: Roger Pau Monné roger@citrix.com
  Cc: Ian Jackson ian.jack...@eu.citrix.com
  Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Cc: Ian Campbell ian.campb...@citrix.com
  Cc: Wei Liu wei.l...@citrix.com

 Acked-by: Wei Liu wei.l...@citrix.com

Unfortunately I think this will result in any valid rc's any path happens to have being discarded in favour of a generic ERROR_FAIL.

If we are going to band-aid this for 4.6 then I think setting rc = ERROR_FAIL just after the libxl__domain_device_construct_rdm error handling might be better.

Even better would be to put the rc = ERROR_FAIL into the various if (ret) blocks. I don't think that would be an unacceptably large patch (it's 3-4 sites from what I can see) and it would be closer to heading in the right direction.

Ian.
Re: [Xen-devel] [PATCH for-4.6] libxl: fix libxl__build_hvm error code return path
On Tue, Aug 11, 2015 at 01:44:45PM +0100, Ian Campbell wrote:
 On Tue, 2015-08-11 at 09:57 +0100, Wei Liu wrote:
  On Fri, Aug 07, 2015 at 06:08:25PM +0200, Roger Pau Monne wrote:
   This is a simple fix to make sure libxl__build_hvm returns an error code in case of failure.
   [...]

 Unfortunately I think this will result in any valid rc's any path happens to have being discarded in favour of a generic ERROR_FAIL.

Don't worry, this is the original behaviour.

 If we are going to band-aid this for 4.6 then I think setting rc = ERROR_FAIL just after the libxl__domain_device_construct_rdm error handling might be better.

 Even better would be to put the rc = ERROR_FAIL into the various if (ret) blocks. I don't think that would be an unacceptably large patch (it's 3-4 sites from what I can see) and it would be closer to heading in the right direction.

I can do this as well, since Roger is on vacation at the moment.

Wei.

 Ian.
Re: [Xen-devel] [PATCH for-4.6] tools/libxc: linux: Don't use getpagesize() when unmapping the grants
On Fri, 2015-08-07 at 22:45 +0100, Wei Liu wrote:
 On Fri, Aug 07, 2015 at 07:53:55PM +0100, Julien Grall wrote:
  The grants are based on the Xen granularity (i.e. 4KB). While the function to map grants for Linux (linux_gnttab_grant_map) is using the correct size (XC_PAGE_SIZE), the unmap one (linux_gnttab_munmap) is using getpagesize(). On domains using a page granularity different from Xen's (as is the case for AArch64 guests using 64KB pages), the unmap will be called with the wrong size.

  Signed-off-by: Julien Grall julien.gr...@citrix.com
  ---
  Cc: Ian Jackson ian.jack...@eu.citrix.com
  Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Cc: Ian Campbell ian.campb...@citrix.com
  Cc: Wei Liu wei.l...@citrix.com

 Acked-by: Wei Liu wei.l...@citrix.com

Acked-by: Ian Campbell ian.campb...@citrix.com

 I think this is a bug fix and should be applied for 4.6.

Agreed.

WRT backports for 4.5, I'd appreciate being given a full list of required fixes once everything is in place and working for 4.6/devbranch, rather than my tracking it piecemeal.

Ian.
Re: [Xen-devel] [PATCH] x86/HVM: honor p2m_ram_ro in hvm_map_guest_frame_rw()
On 31.07.15 at 18:06, boris.ostrov...@oracle.com wrote:
 On 07/24/2015 05:41 AM, Jan Beulich wrote:
  @@ -1693,14 +1703,22 @@ int nvmx_handle_vmclear(struct cpu_user_
       else
       {
           /* Even if this VMCS isn't the current one, we must clear it. */
  -        vvmcs = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT, 0);
  +        bool_t writable;
  +
  +        vvmcs = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT, 0, &writable);

 Since you replaced 'gpa >> PAGE_SHIFT' with 'paddr_to_pfn(gpa)' above, perhaps it should be replaced here too.

Yes indeed.

 Other than that,
 Reviewed-by: Boris Ostrovsky boris.ostrov...@oracle.com

Thanks, Jan
Re: [Xen-devel] [PATCH v7 0/7] xen/PMU: PMU support for Xen PV(H) guests
Applied to for-linus-4.3, thanks. David
Re: [Xen-devel] [PATCH v3] xen-apic: Enable on domU as well
On 10/08/15 14:40, Jason A. Donenfeld wrote: It turns out that domU also requires the Xen APIC driver. Otherwise we get stuck in busy loops that never exit, such as in this stack trace: Applied to for-linus-4.2 and tagged for stable, thanks. David
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On 11 Aug 2015, at 12:13, Ian Jackson ian.jack...@eu.citrix.com wrote:
 Wei Liu writes ([URGENT RFC] Branching and reopening -unstable):
  Branching should be done at one of the RC tags. It might not be enough time for us to reach consensus before tagging RC1, so I would say lets branch at RC2 if we don't observe blocker bugs. Maintainers should be responsible for both 4.6 branch and -unstable branch. As for bug fixes, here are two options.

What do other projects that are similar to us do? And how does it work for them? Any reference points?

 [... Ian Jackson's Q1/Q2/Q3 analysis and options A/B, quoted in full earlier in this thread ...]

 I am happy with B because it gives the relevant maintainers the option.

 Ian.

It may be helpful to evaluate this proposal against a couple of the outstanding patch series which were close and didn't make it into 4.6. In other words, change sets which we reasonably expect to turn up in the next 4-8 weeks or so.

Regards
Lars
Re: [Xen-devel] [PATCH for-4.6 v2 1/4] cxenstored: fix systemd socket activation
On Mon, 2015-08-10 at 09:00 +0100, Wei Liu wrote: There were two problems with original code: 1. sd_booted() was used to determined if the process was started by systemd, which was wrong. 2. Exit with error if pidfile was specified, which was too harsh. These two combined made cxenstored unable to start by hand if it ran on a system which had systemd. Fix issues with following changes: 1. Use sd_listen_fds to determine if the process is started by systemd. 2. Don't exit if pidfile is specified. Rename function and restructure code to make things clearer. A side effect of this patch is that gcc 4.8 with -Wmaybe-uninitialized in non-debug build spits out spurious warning about sock and ro_sock might be uninitialized. Since CentOS 7 ships gcc 4.8, we need to work around that by setting sock and ro_sock to NULL at the beginning of main. Signed-off-by: Wei Liu wei.l...@citrix.com Tested-by: George Dunlap george.dun...@eu.citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On Tue, 2015-08-11 at 13:55 +0100, Andrew Cooper wrote: On 11/08/15 12:13, Ian Jackson wrote: Wei Liu writes ([URGENT RFC] Branching and reopening -unstable): Branching should be done at one of the RC tags. It might not be enough time for us to reach consensus before tagging RC1, so I would say lets branch at RC2 if we don't observe blocker bugs. Maintainers should be responsible for both 4.6 branch and -unstable branch. As for bug fixes, here are two options. I think this conflates the three questions which should be answered: Q1: What is the status of the newly branched -unstable ? Should we avoid (some or all) big sets of changes ? (a) Don't branch (b) Branch but don't allow /any/ big changes. Seems to make branching rather pointless. (c) Branch but allow /some/ big changes. Tree is `half open', which is not ideal. (d) Branch and allow /all/ changes. Q2: If we don't avoid such changes, and a bugfix has a conflict with a change in the new unstable, who is responsible for fixing it up ? Options include: (a) the relevant maintainers (double whammy for maintainers) (b) the submitter of the bugfix (very undesirable) (c) the submitter of the big set of changes (but what do we do if they don't respond?) (d) the stable tree maintainers (already ruled out, so included in this list for completeness; out of the question IMO) Q3: What workflow should we use, for bugfixes for bugs in 4.6-pre ? There are three options, not two: (a) Bugfixes go to 4.6 first, cherry pick to unstable This keeps our focus on 4.6, which is good. (b) Bugfixes go to 4.6 first, merge 4.6 to unstable. Not tenable if we have big changes in unstable. (c) Bugfixes to to unstable, cherry pick to 4.6. Undesirable IMO because it shifts focus to unstable. Of these 2(c)/3(a) would be ideal but we don't have a good answer to the problem posted in Q2(c). I think that leaves us with 2(a): maintainers have to deal with the fallout. That makes 1(d) untenable in my view. 
As a maintainer, I do not want that additional workload. That leaves us with 1(a) or 1(c)/2(a)/3(a). With 1(c), who should decide on a particular series ? Well, who is taking the risk ? The maintainer, who will have to pick up the pieces. I therefore conclude, we have two options: A 1(a)/-/- Do not branch yet: defer divergence until the risk of bugfixes is much lower. B 1(c)(maintainer)/2(a)/3(a) Branch. Maintainers may choose to defer patch series based on risk of conflicts with bugfixes required for 4.6. Clear communication with submitters is required. Bugfixes for bugs in 4.6 will be accepted onto the 4.6 branch. Maintainers are required to cherry pick them onto unstable. Bugfixes will not be accepted for unstable unless it is clear that the bug was introduced in unstable since 4.6 branched. I am happy with B because it gives the relevant maintainers the option. Very much A. By definition, 1(c) will destabilise the tree and generate artificial work for the maintainers and committers. The most important action at this point is to stabilise 4.6 for release, and peoples efforts are far better spent pursuing that, rather than continuing work on unstable. While I agree that people who have things to do for the release should prioritise the release not all contributors have a stake in the stable releases and even those that do may not have anything which they are able to help with etc (or e.g. have other pressures which prevent them dropping all development work to dedicate full time to the release). Realistically even those with 4.6-ish tasks and responsibilities aren't going to have enough such things to do to fill their time 100% between now and the release. For the sake of a couple of weeks, contributors can keep their patches for a little while longer. A full freeze cycle is more like 6-8 weeks not a couple, which is where the tension arises between the stable release and other developers. 
What seems to have been missed (or got a bit mislaid) in the current analysis is _when_ to branch: the analysis assumes at rc1, while the status quo for the last few releases has been to branch just before release (or very late in the rc cycle at least), which are two opposite ends of the spectrum. There is of course plenty of middle ground between those two points. In your use of "a couple of weeks", are you making a counter-proposal to branch at (say) rc3, or are you arguing to keep the development branch closed until 9 October? Depending on where in the rc cycle we branch, different options may have different weights of up or down side. Ian. ~Andrew ___ Xen-devel mailing
Re: [Xen-devel] [PATCH for-4.6] libxl: fix libxl__build_hvm error code return path
On Tue, 2015-08-11 at 14:48 +0100, Wei Liu wrote: In 25652f23 ("tools/libxl: detect and avoid conflicts with RDM"), new code was added to use rc to store libxl function call return values, which complies with the libxl coding style. That patch, however, didn't change other locations where the return value was stored in ret. In the end libxl__build_hvm could return 0 when it failed. Explicitly set rc to ERROR_FAIL in all error paths to fix this. Signed-off-by: Wei Liu wei.l...@citrix.com You missed the path from libxl__domain_firmware, which incorrectly relies on rc being already initialised by the declaration (which per CODING_STYLE ought to be removed too). However, perhaps you prefer to leave those other two hunks until 4.7, and this patch is at least an improvement of sorts, so: Acked-by: Ian Campbell ian.campb...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] tools: Update sonames for 4.6 RCs
On Tue, Aug 11, 2015 at 03:27:46PM +0100, Ian Jackson wrote: Update libxc to 4.6. Update libxl to 4.6. Update libxlu to 4.6. I did git-grep 'MAJOR.*=' and also, to check I had everything, git-grep 'SONAME_LDFLAG' | egrep -v 'MAJOR' | less. The other, un-updated, libraries are:

  blktap2 (control, libvhd)  1.0      in-tree users only, no ABI changes
  libfsimage                 1.0      no ABI changes
  libvchan                   1.0      no ABI changes
  libxenstat                 0.0 (!)  no ABI changes
  libxenstore                3.0      no ABI changes

My assertions "no ABI changes" are based on the output of git-diff origin/stable-4.5..staging . and similar runes, sometimes limited to .h files. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com --- v2: Bump libxlu too. [ Reported by Wei Liu. ] [ not resending the remaining patches ] ---
 tools/libxc/Makefile | 2 +-
 tools/libxl/Makefile | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 8ae0ea0..a0f899b 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -1,7 +1,7 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
-MAJOR = 4.5
+MAJOR = 4.6
 MINOR = 0
 ifeq ($(CONFIG_LIBXC_MINIOS),y)
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 9036076..c5ecec1 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -5,10 +5,10 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
-MAJOR = 4.5
+MAJOR = 4.6
 MINOR = 0
-XLUMAJOR = 4.3
+XLUMAJOR = 4.6
 XLUMINOR = 0
 CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \
-- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 1/4] tools: Update sonames for 4.6 RCs
Update libxc to 4.6. Update libxl to 4.6. Update libxlu to 4.6. I did git-grep 'MAJOR.*=' and also, to check I had everything, git-grep 'SONAME_LDFLAG' | egrep -v 'MAJOR' | less. The other, un-updated, libraries are:

  blktap2 (control, libvhd)  1.0      in-tree users only, no ABI changes
  libfsimage                 1.0      no ABI changes
  libvchan                   1.0      no ABI changes
  libxenstat                 0.0 (!)  no ABI changes
  libxenstore                3.0      no ABI changes

My assertions "no ABI changes" are based on the output of git-diff origin/stable-4.5..staging . and similar runes, sometimes limited to .h files. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com --- v2: Bump libxlu too. [ Reported by Wei Liu. ] [ not resending the remaining patches ] ---
 tools/libxc/Makefile | 2 +-
 tools/libxl/Makefile | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 8ae0ea0..a0f899b 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -1,7 +1,7 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
-MAJOR = 4.5
+MAJOR = 4.6
 MINOR = 0
 ifeq ($(CONFIG_LIBXC_MINIOS),y)
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 9036076..c5ecec1 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -5,10 +5,10 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
-MAJOR = 4.5
+MAJOR = 4.6
 MINOR = 0
-XLUMAJOR = 4.3
+XLUMAJOR = 4.6
 XLUMINOR = 0
 CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \
-- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/HVM: honor p2m_ram_ro in hvm_map_guest_frame_rw()
At 07:51 -0600 on 11 Aug (1439279513), Jan Beulich wrote: On 27.07.15 at 13:09, t...@xen.org wrote: At 13:02 +0100 on 24 Jul (1437742964), Andrew Cooper wrote: On 24/07/15 10:41, Jan Beulich wrote: Beyond that log-dirty handling in _hvm_map_guest_frame() looks bogus too: What if a XEN_DOMCTL_SHADOW_OP_* gets issued and acted upon between the setting of the dirty flag and the actual write happening? I.e. shouldn't the flag instead be set in hvm_unmap_guest_frame()? It does indeed. (Ideally the dirty bit should probably be held high for the duration that a mapping exists, but that is absolutely infeasible to do). IMO that would not be very useful -- a well-behaved toolstack will have to make sure that relevant mappings are torn down before stop-and-copy. Forcing the dirty bit high in the meantime just makes every intermediate pass send a wasted copy of the page, without actually closing the race window if the tools are buggy. Making sure such mappings got torn down in time doesn't help when the most recent write happened _after_ the most recent clearing of the dirty flag in a pass prior to stop-and-copy. This is why e.g. __gnttab_unmap_common sets the dirty bit again as it unmaps. Cheers, Tim. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Commit moratorium for 4.6rc1
Please avoid committing anything just now. We need the push gate clear for a patch to update the tools library sonames, which is needed for rc1. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/4] Update version to Xen 4.6 RC
On Tue, Aug 11, 2015 at 03:09:18PM +0100, Ian Jackson wrote: * Change README to say `Xen 4.6-rc' * Change XEN_EXTRAVERSION so that we are `4.6.0-rc' Note that the RC number (eg, 1 for rc1) is not in the version string, so that we do not need to update this again when we cut the next RC. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Jan Beulich jbeul...@suse.com CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com ---
 README       | 12 ++--
 xen/Makefile |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/README b/README
index 0e456b8..522f1a2 100644
--- a/README
+++ b/README
@@ -1,10 +1,10 @@
 #
-__ ___ ___ __ _
-\ \/ /___ _ __ | || | / /__ _ _ __ ___| |_ __ _| |__ | | ___
- \ // _ \ '_ \ | || |_| '_ \ _| | | | '_ \/ __| __/ _` | '_ \| |/ _ \
- / \ __/ | | | |__ _| (_) |_| |_| | | | \__ \ || (_| | |_) | | __/
-/_/\_\___|_| |_||_|(_)___/ \__,_|_| |_|___/\__\__,_|_.__/|_|\___|
-
+__ ___ ___
+\ \/ /___ _ __ | || | / /__ __ ___
+ \ // _ \ '_ \ | || |_| '_ \ _| '__/ __|
+ / \ __/ | | | |__ _| (_) |_| | | (__
+/_/\_\___|_| |_||_|(_)___/ |_| \___|
+
 #
 http://www.xen.org/

(The ASCII-art banner above spells "Xen 4.6-unstable" before and "Xen 4.6-rc" after; its column alignment did not survive archiving.)

diff --git a/xen/Makefile b/xen/Makefile
index 6305880..6088c9d 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -2,7 +2,7 @@
 # All other places this is stored (eg. compile.h) should be autogenerated.
 export XEN_VERSION = 4
 export XEN_SUBVERSION = 6
-export XEN_EXTRAVERSION ?= -unstable$(XEN_VENDORVERSION)
+export XEN_EXTRAVERSION ?= .0-rc$(XEN_VENDORVERSION)
 export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
 -include xen-version
-- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
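The effect of the XEN_EXTRAVERSION hunk can be checked by re-deriving the full version string the Makefile assembles (a plain-shell restatement of the Make logic, not the Makefile itself; XEN_VENDORVERSION is assumed empty):

```shell
# Mirror of xen/Makefile's version assembly with the new EXTRAVERSION.
XEN_VERSION=4
XEN_SUBVERSION=6
XEN_EXTRAVERSION=".0-rc"
XEN_FULLVERSION="${XEN_VERSION}.${XEN_SUBVERSION}${XEN_EXTRAVERSION}"
echo "$XEN_FULLVERSION"    # prints 4.6.0-rc
```

This is why the rc number itself need not appear: cutting rc2 requires no further change to XEN_EXTRAVERSION.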
Re: [Xen-devel] [PATCH 4/4] Update QEMU_UPSTREAM_REVISION for 4.6 RC1
On Tue, Aug 11, 2015 at 03:09:20PM +0100, Ian Jackson wrote: When we make RC1 we arrange to get a specific version of qemu-xen-upstream. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com Acked-by: Wei Liu wei.l...@citrix.com ---
 Config.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Config.mk b/Config.mk
index d8b650e..75b49a3 100644
--- a/Config.mk
+++ b/Config.mk
@@ -254,7 +254,7 @@
 SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git
 MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
 endif
 OVMF_UPSTREAM_REVISION ?= cb9a7ebabcd6b8a49dc0854b2f9592d732b5afbd
-QEMU_UPSTREAM_REVISION ?= master
+QEMU_UPSTREAM_REVISION ?= qemu-xen-4.6.0-rc1
 MINIOS_UPSTREAM_REVISION ?= b36bcb370d611ad7f41e8c21d061e6291e088c58
 # Fri Jun 26 11:58:40 2015 +0100
 # Correct printf formatting for tpm_tis message.
-- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/4] tools: Update sonames for 4.6 RCs
On Tue, Aug 11, 2015 at 03:09:17PM +0100, Ian Jackson wrote: Update libxc to 4.6. Update libxl to 4.6. I did git-grep 'MAJOR.*=' and also, to check I had everything, git-grep 'SONAME_LDFLAG' | egrep -v 'MAJOR' | less. The other, un-updated, libraries are:

  blktap2 (control, libvhd)  1.0      in-tree users only, no ABI changes
  libfsimage                 1.0      no ABI changes
  libvchan                   1.0      no ABI changes
  libxenstat                 0.0 (!)  no ABI changes
  libxenstore                3.0      no ABI changes

My assertions "no ABI changes" are based on the output of git-diff origin/stable-4.5..staging . and similar runes, sometimes limited to .h files. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com ---
 tools/libxc/Makefile | 2 +-
 tools/libxl/Makefile | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 8ae0ea0..a0f899b 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -1,7 +1,7 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
-MAJOR = 4.5
+MAJOR = 4.6
 MINOR = 0
 ifeq ($(CONFIG_LIBXC_MINIOS),y)
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 9036076..a5ffa01 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -5,7 +5,7 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
-MAJOR = 4.5
+MAJOR = 4.6
 MINOR = 0
 XLUMAJOR = 4.3

What about libxlutil? I'm pretty sure its ABI has changed. Wei. -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/4] Update QEMU_TRADITIONAL_REVISION for 4.6 RC1
On Tue, Aug 11, 2015 at 03:09:19PM +0100, Ian Jackson wrote: (We will not necessarily bump this tag number for future RCs, unless something has changed in qemu-xen-traditional.) Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com ---
 Config.mk | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Config.mk b/Config.mk
index e9a7097..d8b650e 100644
--- a/Config.mk
+++ b/Config.mk
@@ -266,7 +266,8 @@
 SEABIOS_UPSTREAM_REVISION ?= rel-1.8.2
 ETHERBOOT_NICS ?= rtl8139 8086100e
-QEMU_TRADITIONAL_REVISION ?= 7f057440b31da38196e3398fd1b618fc36ad97d6
+QEMU_TRADITIONAL_REVISION ?= xen-4.6.0-rc1
+# 7f057440b31da38196e3398fd1b618fc36ad97d6
 # Wed Jun 3 14:41:27 2015 +0200
 # ide: Clear DRQ after handling all expected accesses
-- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
On Fri, 2015-08-07 at 10:11 +0800, Shannon Zhao wrote: This document is going to explain the design details of Xen booting with ACPI on ARM. Some parts of it may not be appropriate. Any comments are welcome. Some small subsets of this seem like they might overlap with what will be required for PVH on x86 (a new x86 guest mode not dissimilar to the sole ARM guest mode). If so then it would be preferable IMHO if PVH x86 could use the same interfaces. I've trimmed the quotes to just those bits and CCd some of the PVH people (Boris and Roger[0]) in case they have any thoughts. Actually, having done the trimming, there is only one such bit: [...]

4. Map MMIO regions
---
Register a bus_notifier for the platform and amba buses in Linux. Add a new XENMAPSPACE, XENMAPSPACE_dev_mmio. Within the notifier, check if the device is newly added; if so, call the hypercall XENMEM_add_to_physmap to map the MMIO regions.

Ian. [0] Roger is away for a week or so, but I expect feedback to be of the "we could use one extra field" type rather than "this needs to be done some totally different way for x86/PVH" (in which case we wouldn't want to share the interface anyway I suppose), so no need to block awaiting that feedback. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
On 11/08/15 15:12, Ian Campbell wrote: On Fri, 2015-08-07 at 10:11 +0800, Shannon Zhao wrote: [...] 3. Dom0 gets grant table and event channel irq information --- As said above, we assign the hypervisor_id be XenVMM to tell Dom0 that it runs on Xen hypervisor. For grant table, add two new HVM_PARAMs: HVM_PARAM_GNTTAB_START_ADDRESS and HVM_PARAM_GNTTAB_SIZE. The reason we expose this range is essentially to allow OS authors to take a short cut by telling them about an IPA range which is unused, so it is available for remapping the grant table into. On x86 there is a BAR on the Xen platform PCI device which serves a similar purpose. IIRC somebody (perhaps David V, CCd) had proposed at some point to make it so that Linux was able to pick such an IPA itself by examining the memory map or by some other scheme. PVH in Linux uses ballooned pages which are vmap()'d into a virtually contiguous region. See xlated_setup_gnttab_pages(). David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
On Tue, 2015-08-11 at 15:51 +0100, David Vrabel wrote: On 11/08/15 15:12, Ian Campbell wrote: On Fri, 2015-08-07 at 10:11 +0800, Shannon Zhao wrote: [...] 3. Dom0 gets grant table and event channel irq information --- As said above, we assign the hypervisor_id be XenVMM to tell Dom0 that it runs on Xen hypervisor. For grant table, add two new HVM_PARAMs: HVM_PARAM_GNTTAB_START_ADDRESS and HVM_PARAM_GNTTAB_SIZE. The reason we expose this range is essentially to allow OS authors to take a short cut by telling them about an IPA range which is unused, so it is available for remapping the grant table into. On x86 there is a BAR on the Xen platform PCI device which serves a similar purpose. IIRC somebody (perhaps David V, CCd) had proposed at some point to make it so that Linux was able to pick such an IPA itself by examining the memory map or by some other scheme. PVH in Linux uses ballooned pages which are vmap()'d into a virtually contiguous region. See xlated_setup_gnttab_pages(). So somewhat more concrete than a proposal then ;-) I don't see anything there which would be a problem on ARM, so we should probably go that route there too (at least for ACPI, if not globally for all ARM guests). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
On 11/08/15 15:59, Ian Campbell wrote: On Tue, 2015-08-11 at 15:51 +0100, David Vrabel wrote: On 11/08/15 15:12, Ian Campbell wrote: On Fri, 2015-08-07 at 10:11 +0800, Shannon Zhao wrote: [...] 3. Dom0 gets grant table and event channel irq information --- As said above, we assign the hypervisor_id be XenVMM to tell Dom0 that it runs on Xen hypervisor. For grant table, add two new HVM_PARAMs: HVM_PARAM_GNTTAB_START_ADDRESS and HVM_PARAM_GNTTAB_SIZE. The reason we expose this range is essentially to allow OS authors to take a short cut by telling them about an IPA range which is unused, so it is available for remapping the grant table into. On x86 there is a BAR on the Xen platform PCI device which serves a similar purpose. IIRC somebody (perhaps David V, CCd) had proposed at some point to make it so that Linux was able to pick such an IPA itself by examining the memory map or by some other scheme. PVH in Linux uses ballooned pages which are vmap()'d into a virtually contiguous region. See xlated_setup_gnttab_pages(). So somewhat more concrete than a proposal then ;-) I don't see anything there which would be a problem on ARM, so we should probably go that route there too (at least for ACPI, if not globally for all ARM guests). If someone does this please move xlated_setup_gnttab_pages() into drivers/xen/xlate_mmu.c, and not copy it into an arm specific file. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH v3.1 2/2] xsplice: Add hook for build_id
On 27.07.15 at 21:20, kon...@kernel.org wrote:
--- a/xen/include/xen/compile.h.in
+++ b/xen/include/xen/compile.h.in
@@ -10,4 +10,5 @@
 #define XEN_EXTRAVERSION @@extraversion@@
 #define XEN_CHANGESET @@changeset@@
+#define XEN_BUILD_ID @@changeset@@

How can the changeset be a valid / sufficient build ID (even if maybe this is intended to only be a default / fallback)? Wasn't this meant specifically to account for rebuilds (with, say, a compiler slightly updated from the original one, and hence possibly producing slightly different code)? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH for-4.6 URGENT 0/4] Prepare for RC1
This is the result of me going through the relevant (pre-tagging) part of the release checklist. The qemu tags referred to have just been created. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH v3.1 1/2] xsplice: rfc.v3.1
On 31.07.15 at 17:46, konrad.w...@oracle.com wrote: On Thu, Jul 30, 2015 at 09:47:40AM -0700, Johannes Erdfelt wrote: On Mon, Jul 27, 2015, Konrad Rzeszutek Wilk kon...@kernel.org wrote:

+struct xsplice_reloc_howto {
+    uint32_t howto;   /* XSPLICE_HOWTO_* */
+    uint32_t flag;    /* XSPLICE_HOWTO_FLAG_* */
+    uint32_t size;    /* Size, in bytes, of the item to be relocated. */
+    uint32_t r_shift; /* The value the final relocation is shifted right by;
+                         used to drop unwanted data from the relocation. */
+    uint64_t mask;    /* Bitmask for which parts of the instruction or data
+                         are replaced with the relocated value. */
+    uint8_t  pad[8];  /* Must be zero. */
+};

I'm curious how r_shift and mask are used. I'm familiar with x86 and x86_64 and I'm not sure how these fit in. Is this to support other architectures? It is to patch up data. We can specify the exact mask for an unsigned int - so we only patch specific bits. Ditto if we want to remove certain values. Still I don't see a practical use: what relocated item would (on x86) be stored starting at other than bit 0 of a byte/word? Also, wouldn't a shift count be redundant with the mask value anyway? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel