Re: [Xen-devel] [PATCH V6 2/7] libxl_read_file_contents: add new entry to read sysfs file
On 8/11/2015 at 07:26 PM, in message 2015082655.ge7...@zion.uk.xensource.com, Wei Liu wei.l...@citrix.com wrote:
On Mon, Aug 10, 2015 at 06:35:23PM +0800, Chunyan Liu wrote:

Sysfs files have size=4096 but the actual file content is less than that. The current libxl_read_file_contents treats it as an error when the file size and actual file content differ, so reading sysfs file content with this function always fails. Add a new entry, libxl_read_sysfs_file_contents, to handle sysfs files specially. It will be used in later pvusb work.

Signed-off-by: Chunyan Liu cy...@suse.com
---
Changes:
  - read one more byte to check bigger size problem.

 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_utils.c    | 51 ++--
 2 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 6013628..f98f089 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -4001,6 +4001,8 @@ void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
 int libxl__count_physical_sockets(libxl__gc *gc, int *sockets);
 #endif
+_hidden int libxl_read_sysfs_file_contents(libxl_ctx *ctx, const char *filename,
+                               void **data_r, int *datalen_r);

Indentation looks wrong.

/*
 * Local variables:

diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index bfc9699..9234efb 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -322,8 +322,10 @@ out:
     return rc;
 }

-int libxl_read_file_contents(libxl_ctx *ctx, const char *filename,
-                             void **data_r, int *datalen_r) {
+static int libxl_read_file_contents_core(libxl_ctx *ctx, const char *filename,
+                                         void **data_r, int *datalen_r,
+                                         bool tolerate_shrinking_file)
+{
     GC_INIT(ctx);
     FILE *f = 0;
     uint8_t *data = 0;
@@ -359,20 +361,34 @@ int libxl_read_file_contents(libxl_ctx *ctx, const char *filename,
     datalen = stab.st_size;

     if (stab.st_size && data_r) {
-        data = malloc(datalen);
+        data = malloc(datalen + 1);
         if (!data) goto xe;

-        rs = fread(data, 1, datalen, f);
-        if (rs != datalen) {
-            if (ferror(f))
+        rs = fread(data, 1, datalen + 1, f);
+        if (rs > datalen) {
+            LOG(ERROR, "%s increased size while we were reading it",
+                filename);
+            goto xe;
+        }
+
+        if (rs < datalen) {
+            if (ferror(f)) {
                 LOGE(ERROR, "failed to read %s", filename);
-            else if (feof(f))
-                LOG(ERROR, "%s changed size while we were reading it",
-                    filename);
-            else
+                goto xe;
+            } else if (feof(f)) {
+                if (tolerate_shrinking_file) {
+                    datalen = rs;
+                } else {
+                    LOG(ERROR, "%s shrunk size while we were reading it",
+                        filename);
+                    goto xe;
+                }
+            } else {
                 abort();
-            goto xe;
+            }

This is a bit bikeshedding, but you can leave goto xe out of the two `if's to reduce patch size.

I guess you mean if (ferror(f)) and if (feof(f))? We can't leave 'goto xe' outside, since in the if (feof(f)) / if (tolerate_shrinking_file) case it's not an error but an expected result for sysfs.

+        }
+
+        data = realloc(data, datalen);

Should check return value of realloc.

Will add a check: if (!data) goto xe;

Thanks, Chunyan

The logic of this function reflects what has been discussed so far.

Wei.

     }

     if (fclose(f)) {
@@ -396,6 +412,19 @@ int libxl_read_file_contents(libxl_ctx *ctx, const char *filename,
     return e;
 }

+int libxl_read_file_contents(libxl_ctx *ctx, const char *filename,
+                             void **data_r, int *datalen_r)
+{
+    return libxl_read_file_contents_core(ctx, filename, data_r, datalen_r, 0);
+}
+
+int libxl_read_sysfs_file_contents(libxl_ctx *ctx, const char *filename,
+                                   void **data_r, int *datalen_r)
+{
+    return libxl_read_file_contents_core(ctx, filename, data_r, datalen_r, 1);
+}
+
+
 #define READ_WRITE_EXACTLY(rw, zero_is_eof, constdata) \
 \
 int libxl_##rw##_exactly(libxl_ctx *ctx, int fd, \
-- 
2.1.4
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen
On 2015/8/12 4:43, Konrad Rzeszutek Wilk wrote: On Wed, Aug 05, 2015 at 09:03:06PM +0800, Shannon Zhao wrote: On 2015/8/5 20:48, Julien Grall wrote: On 05/08/15 12:49, Shannon Zhao wrote: That's great! Keep in mind that many ARM platforms have non-PCI busses, so I think we'll need an amba and a platform bus_notifier too, in addition to the existing pci bus notifier. Thanks for your reminding. I thought about amba. Since ACPI of current linux kernel doesn't support probe amba bus devices, so this bus_notifier will not be used at the moment. But there are some voice that we need to make ACPI support amba on the linux arm kernel mail list. And to me it doesn't matter to add the amba bus_notifier. This comment raised one question. What happen if the hardware has MMIO region not described in the ACPI? This sounds weird. If a device is described in ACPI table, it will not describe the MMIO region which the driver will use? Does this situation exist? If the hardware has mmio region not described in the ACPI, how does the driver know the region and use it? On the x86 world we would query the PCI configuration registers and read the device BAR registers. Those would contain the MMIO regions the device uses. But x86 is funny and you do say 'many .. ARM .. have non-PCI buses' - which would imply you have not hit this yet. Are PCI devices interrogated differently on ARM? No configuration registers? For PCI devices, on ARM it will reuse the existing bus_notifier xen_pci_notifier to call hypercall to map mmio regions. And other operates are same with X86. Thanks, -- Shannon ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
Hi Julien, On 2015/8/12 0:19, Julien Grall wrote: Hi Shannon, On 07/08/15 03:11, Shannon Zhao wrote: 2. Create minimal DT to pass required information to Dom0 -- The minimal DT mainly passes Dom0 bootargs, address and size of initrd (if available), address and size of uefi system table, address and size of uefi memory table, uefi-mmap-desc-size and uefi-mmap-desc-ver. An example of the minimal DT: / { #address-cells = 2; #size-cells = 1; chosen { bootargs = kernel=Image console=hvc0 earlycon=pl011,0x1c09 root=/dev/vda2 rw rootfstype=ext4 init=/bin/sh acpi=force; linux,initrd-start = 0x; linux,initrd-end = 0x; linux,uefi-system-table = 0x; linux,uefi-mmap-start = 0x; linux,uefi-mmap-size = 0x; linux,uefi-mmap-desc-size = 0x; linux,uefi-mmap-desc-ver = 0x; }; }; For details loook at https://github.com/torvalds/linux/blob/master/Documentation/arm/uefi.txt AFAICT, the device tree properties in this documentation are only used in order to communicate between the UEFI stub and Linux. This means that those properties are not standardize and can change at any time by Linux folks. They don't even live in Documentation/devicetree/ I would also expect to see the same needs for FreeBSD running as DOM0 with ACPI. I'm not very clear about how FreeBSD communicates with UEFI. And when booting with DT, how does FreeBSD communicate with UEFI? Not through these properties? So it looks like to me that a generic name would be better for all those properties. If we change these name, it needs change some functions in Linux. Will it impact the use of Linux with UEFI not on Xen? Thanks, -- Shannon ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.6] libxl: fix libxl__build_hvm error code return path
On Fri, Aug 07, 2015 at 06:08:25PM +0200, Roger Pau Monne wrote: This is a simple fix to make sure libxl__build_hvm returns an error code in case of failure. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com Though I would like to make commit message clearer. In 25652f23 (tools/libxl: detect and avoid conflicts with RDM), new code was added to use rc to store libxl function call return value, which complied to libxl coding style. That patch, however, didn't change other locations where return value was stored in ret. In the end libxl__build_hvm could return 0 when it failed. Explicitly set rc to ERROR_FAIL in error path to fix this. A more comprehensive fix would be changing all ret to rc, which should be done when next development window opens. --- I would rather prefer to have it fixed in a proper way like it's done in my libxl: fix libxl__build_hvm error handling as part of the HVMlite series, but I understand that given the current status of the tree and the willingness to backport this to stable branches the other approach is going to be much harder. --- tools/libxl/libxl_dom.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index e1f11a3..668ce11 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -1019,6 +1019,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, return 0; out: +rc = ERROR_FAIL; return rc; } -- 1.9.5 (Apple Git-50.3) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [distros-debian-squeeze test] 37818: all pass
flight 37818 distros-debian-squeeze real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/37818/

Perfect :-)
All tests in this flight passed

baseline version:
 flight   37776

jobs:
 build-amd64                                    pass
 build-armhf                                    pass
 build-i386                                     pass
 build-amd64-pvops                              pass
 build-armhf-pvops                              pass
 build-i386-pvops                               pass
 test-amd64-amd64-amd64-squeeze-netboot-pygrub  pass
 test-amd64-i386-amd64-squeeze-netboot-pygrub   pass
 test-amd64-amd64-i386-squeeze-netboot-pygrub   pass
 test-amd64-i386-i386-squeeze-netboot-pygrub    pass

sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at http://osstest.xs.citrite.net/~osstest/testlogs/logs
Test harness code can be found at http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary

Push not applicable.
Re: [Xen-devel] [PATCH V3 5/6] x86/xsaves: support compact format for hvm save/restore
On 11/08/15 09:01, Shuai Ruan wrote:

+
+/*
+ * The FP xstates and SSE xstates are legacy states. They are always
+ * in the fixed offsets in the xsave area in either compacted form
+ * or standard form.
+ */
+xstate_comp_offsets[0] = 0;
+xstate_comp_offsets[1] = XSAVE_SSE_OFFSET;
+
+xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+
+for (i = 2; i < xstate_features; i++)

This loop will run off the end of xstate_comp_sizes[] for any processor supporting AVX512 or greater.

Since the length of xstate_comp_sizes is 64, I think the case you mentioned above will not happen.

xstate_features is a bitmap. The comparison i < xstate_features is bogus, and loops many more times than you intend.

~Andrew
Re: [Xen-devel] [PATCH RFC v2 0/5] Multi-queue support for xen-blkfront and xen-blkback
On 08/10/2015 11:52 PM, Jens Axboe wrote: On 08/10/2015 05:03 AM, Rafal Mielniczuk wrote: On 01/07/15 04:03, Jens Axboe wrote: On 06/30/2015 08:21 AM, Marcus Granado wrote:

Hi,

Our measurements for the multiqueue patch indicate a clear improvement in iops when more queues are used. The measurements were obtained under the following conditions:

- using blkback as the dom0 backend with the multiqueue patch applied to a dom0 kernel 4.0 on 8 vcpus.
- using a recent Ubuntu 15.04 kernel 3.19 with multiqueue frontend applied to be used as a guest on 4 vcpus
- using a micron RealSSD P320h as the underlying local storage on a Dell PowerEdge R720 with 2 Xeon E5-2643 v2 cpus.
- fio 2.2.7-22-g36870 as the generator of synthetic loads in the guest. We used direct_io to skip caching in the guest and ran fio for 60s reading a number of block sizes ranging from 512 bytes to 4MiB. Queue depth of 32 for each queue was used to saturate individual vcpus in the guest.

We were interested in observing storage iops for different values of block sizes. Our expectation was that iops would improve when increasing the number of queues, because both the guest and dom0 would be able to make use of more vcpus to handle these requests.

These are the results (as aggregate iops for all the fio threads) that we got for the conditions above with sequential reads:

fio_threads  io_depth  block_size  1-queue_iops  8-queue_iops
     8          32        512         158K          264K
     8          32         1K         157K          260K
     8          32         2K         157K          258K
     8          32         4K         148K          257K
     8          32         8K         124K          207K
     8          32        16K          84K          105K
     8          32        32K          50K           54K
     8          32        64K          24K           27K
     8          32       128K          11K           13K

8-queue iops was better than single queue iops for all the block sizes. There were very good improvements as well for sequential writes with block size 4K (from 80K iops with single queue to 230K iops with 8 queues), and no regressions were visible in any measurement performed.

Great results! And I don't know why this code has lingered for so long, so thanks for helping get some attention to this again.
Personally I'd be really interested in the results for the same set of tests, but without the blk-mq patches. Do you have them, or could you potentially run them?

Hello,

We re-ran the tests for sequential reads with identical settings but with Bob Liu's multiqueue patches reverted from the dom0 and guest kernels. The results we obtained were *better* than the results we got with the multiqueue patches applied:

fio_threads  io_depth  block_size  1-queue_iops  8-queue_iops  no-mq-patches_iops
     8          32        512         158K          264K           321K
     8          32         1K         157K          260K           328K
     8          32         2K         157K          258K           336K
     8          32         4K         148K          257K           308K
     8          32         8K         124K          207K           188K
     8          32        16K          84K          105K            82K
     8          32        32K          50K           54K            36K
     8          32        64K          24K           27K            16K
     8          32       128K          11K           13K            11K

We noticed that requests are not merged by the guest when the multiqueue patches are applied, which results in a regression for small block sizes (the RealSSD P320h's optimal block size is around 32-64KB). We observed a similar regression for the Dell MZ-5EA1000-0D3 100 GB 2.5" internal SSD.

As I understand it, the blk-mq layer bypasses the I/O scheduler, which also effectively disables merges. Could you explain why it is difficult to enable merging in the blk-mq layer? That could help close the performance gap we observed. Otherwise, the tests show that the multiqueue patches do not improve performance, at least when it comes to sequential read/write operations.

blk-mq still provides merging, there should be no difference there. Do the xen patches set BLK_MQ_F_SHOULD_MERGE?

Yes. Is it possible that the xen-blkfront driver dequeues requests too fast once we have multiple hardware queues? Because new requests don't get the chance to merge with old requests which were already dequeued and issued.

-- 
Regards,
-Bob
Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server
On 8/10/2015 6:57 PM, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 10 August 2015 11:56 To: Paul Durrant; Wei Liu; Yu Zhang Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 10/08/15 09:33, Paul Durrant wrote: -Original Message- From: Wei Liu [mailto:wei.l...@citrix.com] Sent: 10 August 2015 09:26 To: Yu Zhang Cc: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote: Currently in ioreq server, guest write-protected ram pages are tracked in the same rangeset with device mmio resources. Yet unlike device mmio, which can be in big chunks, the guest write- protected pages may be discrete ranges with 4K bytes each. This patch uses a seperate rangeset for the guest ram pages. And a new ioreq type, IOREQ_TYPE_MEM, is defined. Note: Previously, a new hypercall or subop was suggested to map write-protected pages into ioreq server. However, it turned out handler of this new hypercall would be almost the same with the existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a type parameter in this hypercall. So no new hypercall defined, only a new type is introduced. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- tools/libxc/include/xenctrl.h| 39 +++--- tools/libxc/xc_domain.c | 59 ++-- FWIW the hypercall wrappers look correct to me. 
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h index 014546a..9106cb9 100644 --- a/xen/include/public/hvm/hvm_op.h +++ b/xen/include/public/hvm/hvm_op.h @@ -329,8 +329,9 @@ struct xen_hvm_io_range { ioservid_t id; /* IN - server id */ uint32_t type; /* IN - type of range */ # define HVMOP_IO_RANGE_PORT 0 /* I/O port range */ -# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */ +# define HVMOP_IO_RANGE_MMIO 1 /* MMIO range */ # define HVMOP_IO_RANGE_PCI2 /* PCI segment/bus/dev/func range */ +# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */ This looks problematic. Maybe you can get away with this because this is a toolstack-only interface? Indeed, the old name is a bit problematic. Presumably re-use like this would require an interface version change and some if-defery. I assume it is an interface used by qemu, so this patch in its currently state will break things. If QEMU were re-built against the updated header, yes. Thank you, Andrew Paul. :) Are you referring to the xen_map/unmap_memory_section routines in QEMU? I noticed they are called by xen_region_add/del in QEMU. And I wonder, are these 2 routines used to track a memory region or to track a MMIO region? If the region to be added is a MMIO, I guess the new interface should be fine, but if it is memory region to be added into ioreq server, maybe a patch in QEMU is necessary(e.g. use some if-defery for this new interface version you suggested)? Thanks Yu Paul ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
On 10/08/15 19:10, Julien Grall wrote: The commit ccc9d90a9a8b5c4ad7e9708ec41f75ff9e98d61d xenbus_client: Extend interface to support multi-page ring removes the call to free_xenballooned_pages in xenbus_unmap_ring_vfree_hvm. This results in the pages never being given back to Linux, losing them forever. It only happens when the backends are running in HVM domains.

Applied to for-linus-4.2 and tagged for stable, thanks.

David
Re: [Xen-devel] [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
On Fri, Aug 07, 2015 at 01:44:41PM +0100, Andrew Cooper wrote: On 07/08/15 09:00, Shuai Ruan wrote:

+goto skip;
+}
+
+if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )

What does edi have to do with xsaves? Only edx:eax are special according to the manual.

regs->edi is the guest_linear_address.

Whyso? xsaves takes an unconditional memory parameter, not a pointer in %rdi. (regs->edi is only correct for ins/outs because the pointer is architecturally required to be in %rdi.)

You are right. The linear address should be decoded from the instruction.

There is nothing currently in emulate_privileged_op() which does ModRM decoding for memory references, nor SIB decoding. xsaves/xrstors would be the first such operations. I am also not sure that adding arbitrary memory decode here is sensible.

In an ideal world, we would have what is currently x86_emulate() split in 3 stages. Stage 1 does straight instruction decode to some internal representation. Stage 2 does an audit to see whether the decoded instruction is plausible for the reason why an emulation was needed. We have had a number of security issues with emulation in the past where guests cause one instruction to trap for emulation, then rewrite the instruction to be something else, and exploit a bug in the emulator. Stage 3 performs the actions required for emulation.

Currently, x86_emulate() is limited to instructions which might legitimately fault for emulation, but with the advent of VM introspection, this is proving to be insufficient. With my x86 maintainer's hat on, I would like to avoid the current situation we have with multiple bits of code doing x86 instruction decode and emulation (which are all different). I think the 3-step approach above caters suitably to all usecases, but it is a large project itself.

It allows the introspection people to have a full and complete x86 emulation infrastructure, while also preventing areas like the shadow paging from being opened up to potential vulnerabilities in unrelated areas of the x86 architecture. I would even go so far as to say that it is probably ok not to support xsaves/xrstors in PV guests until something along the above lines is sorted. The first feature in XSS is processor trace, which a PV guest couldn't use anyway. I suspect the same applies to most of the other XSS features, or they wouldn't need to be privileged in the first place.

Why couldn't a PV guest use processor trace?

Thanks for your detailed suggestions. xsaves/xrstors would also bring other benefits for PV guests, such as saving memory in the XSAVE area. If we do not support xsaves/xrstors in PV, PV guests would lose these benefits. What's your opinion on this?

+
+if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4]
+                          & X86_CR4_OSXSAVE) )
+{
+    do_guest_trap(TRAP_invalid_op, regs, 0);
+    goto skip;
+}
+
+if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
+{
+    do_guest_trap(TRAP_nmi, regs, 0);
+    goto skip;
+}
+
+if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
+    goto fail;
+
+if ( (rc = copy_from_user(guest_xsave_area, (void *) regs->edi,
+                          sizeof(struct xsave_struct))) != 0 )
+{
+    propagate_page_fault(regs->edi +
+                         sizeof(struct xsave_struct) - rc, 0);
+    goto skip;

Surely you just need the xstate_bv and xcomp_bv?

I will dig into the SDM to see whether I am missing some checks.

What I mean by this is that xstate_bv and xcomp_bv are all that you are checking, so you just need two uint64_t's, rather than a full xsave_struct.

Sorry to misunderstand your meaning.

default:

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 98310f3..de94ac1 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -48,6 +48,58 @@ l2_pgentry_t __section(.bss.page_aligned) l2_bootmap[L2_PAGETABLE_ENTRIES];
 l2_pgentry_t *compat_idle_pg_table_l2;

+unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr)

What is this function? Why is it useful? Something like this belongs in its own patch along with a description of why it is being introduced.

The function is used for getting the mfn related to a guest linear address. Is there another existing function I can use that can do the same thing? Can you give me a suggestion?

do_page_walk() and use virt_to_mfn() on the result? (I am just guessing, but

+{
+asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
+: "=m
Re: [Xen-devel] Second regression due to libxl: Remove linux udev rules (2ba368d13893402b2f1fb3c283ddcc714659dd9b)
On Fri, 2015-08-07 at 10:54 -0400, Konrad Rzeszutek Wilk wrote: I've looked into this, and AFAICT you were probably using the udev rules (you have run_hotplug_scripts=0 in xl.conf?) before 2ba368, and Correct. I think I needed that for driver domains and had left it in there. The intention was that xl devd would be run in the driver domain too, I added an initscript for that purpose last week (or was it two weeks ago?) but you could also just arrange for it to happen in /etc/rc or something. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server
-Original Message- From: Yu, Zhang [mailto:yu.c.zh...@linux.intel.com] Sent: 11 August 2015 09:41 To: Paul Durrant; Andrew Cooper; Wei Liu Cc: Kevin Tian; Keir (Xen.org); Ian Campbell; xen-devel@lists.xen.org; Stefano Stabellini; zhiyuan...@intel.com; jbeul...@suse.com; Ian Jackson Subject: Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 8/11/2015 4:25 PM, Paul Durrant wrote: -Original Message- From: Yu, Zhang [mailto:yu.c.zh...@linux.intel.com] Sent: 11 August 2015 08:57 To: Paul Durrant; Andrew Cooper; Wei Liu Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 8/10/2015 6:57 PM, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 10 August 2015 11:56 To: Paul Durrant; Wei Liu; Yu Zhang Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 10/08/15 09:33, Paul Durrant wrote: -Original Message- From: Wei Liu [mailto:wei.l...@citrix.com] Sent: 10 August 2015 09:26 To: Yu Zhang Cc: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote: Currently in ioreq server, guest write-protected ram pages are tracked in the same rangeset with device mmio resources. Yet unlike device mmio, which can be in big chunks, the guest write- protected pages may be discrete ranges with 4K bytes each. This patch uses a seperate rangeset for the guest ram pages. 
And a new ioreq type, IOREQ_TYPE_MEM, is defined. Note: Previously, a new hypercall or subop was suggested to map write-protected pages into ioreq server. However, it turned out handler of this new hypercall would be almost the same with the existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a type parameter in this hypercall. So no new hypercall defined, only a new type is introduced. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- tools/libxc/include/xenctrl.h| 39 +++- -- tools/libxc/xc_domain.c | 59 ++-- FWIW the hypercall wrappers look correct to me. diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h index 014546a..9106cb9 100644 --- a/xen/include/public/hvm/hvm_op.h +++ b/xen/include/public/hvm/hvm_op.h @@ -329,8 +329,9 @@ struct xen_hvm_io_range { ioservid_t id; /* IN - server id */ uint32_t type; /* IN - type of range */ # define HVMOP_IO_RANGE_PORT 0 /* I/O port range */ -# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */ +# define HVMOP_IO_RANGE_MMIO 1 /* MMIO range */ # define HVMOP_IO_RANGE_PCI2 /* PCI segment/bus/dev/func range */ +# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */ This looks problematic. Maybe you can get away with this because this is a toolstack-only interface? Indeed, the old name is a bit problematic. Presumably re-use like this would require an interface version change and some if-defery. I assume it is an interface used by qemu, so this patch in its currently state will break things. If QEMU were re-built against the updated header, yes. Thank you, Andrew Paul. :) Are you referring to the xen_map/unmap_memory_section routines in QEMU? I noticed they are called by xen_region_add/del in QEMU. And I wonder, are these 2 routines used to track a memory region or to track a MMIO region? If the region to be added is a MMIO, I guess the new interface should be fine, but if it is memory region to be added into ioreq server, maybe a patch in QEMU is necessary(e.g. 
use some if-defery for this new interface version you suggested)? I was forgetting that QEMU uses libxenctrl so your change to xc_hvm_map_io_range_to_ioreq_server() means everything will continue to work as before. There is still the (admittedly academic) problem of some unknown emulator out there that rolls its own hypercalls and blindly updates to the new version of hvm_op.h suddenly starting to register memory ranges rather than mmio ranges though. I would leave the existing definitions as-is and come up with a new name. So, how about we keep the HVMOP_IO_RANGE_MEMORY name for MMIO, and use a new one, say HVMOP_IO_RANGE_WP_MEM, for
Re: [Xen-devel] About Xen bridged pci devices and suspend/resume for the X10SAE motherboard
On Mon, 2015-08-10 at 10:47 -0400, Konrad Rzeszutek Wilk wrote: On Mon, Aug 10, 2015 at 05:14:28PM +0300, M. Ivanov wrote: On Mon, 2015-08-10 at 09:58 -0400, Konrad Rzeszutek Wilk wrote: On Mon, Aug 10, 2015 at 02:11:38AM +0300, M. Ivanov wrote: Hello, excuse me for bothering you, but I've read an old thread on a mailing list about X10SAE compatibility. http://lists.xen.org/archives/html/xen-devel/2014-02/msg02111.html CC-ing Xen devel. Currently I own this board and am trying to use it with Xen and be able to suspend and resume. But I am getting errors from the USB 3 Renesas controller about parity in my bios event log, and my system hangs on resume, so I was wondering if that is connected to the bridge(tundra) you've mentioned. Did you update the BIOS to the latest version? Will updating to version 3 solve my issue? Can you do a suspend/resume on your X10SAE? It did work at some point. I will find out when I am at home later today. Looking forward to your reply and am really thankful for your time, so far I've tried changing many of the settings in the bios, fiddling with Xen's kernel params, blacklisting the xhci driver, doing a xl detach. The only thing I haven't done yet is updating the bios, but Supermicro's support couldn't give me a changelog: The primary objective for ver3.0 BIOS release is to support Intel Broadwell CPUs We do not know if BIOS update will fix the issue you are seeing as we never tested it with Xen. I will be very glad if you could share any information regarding this matter. Best regards, M. Ivanov
Re: [Xen-devel] [PATCH v3 3/4] x86/pvh: Handle hypercalls for 32b PVH guests
On 24.07.15 at 20:35, boris.ostrov...@oracle.com wrote: On 07/23/2015 10:21 AM, Jan Beulich wrote: On 11.07.15 at 00:20, boris.ostrov...@oracle.com wrote: Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com --- Changes in v3: * Defined compat_mmuext_op(). (XEN_GUEST_HANDLE_PARAM(mmuext_op_compat_t) is not defined in header files so I used 'void' type. How is it not? It's in compat/xen.h (which is a generated header). compat/xen.h has DEFINE_COMPAT_HANDLE(mmuext_op_compat_t) (which is __compat_handle_mmuext_op_compat_t). We need XEN_GUEST_HANDLE(mmuext_op_compat_t), which is __guest_handle_mmuext_op_compat_t. And I wasn't sure it's worth explicitly adding it to a header file (like I think what we do for vcpu_runstate_info_compat_t in sched.h); Hmm, indeed all other compat_..._op()-s use void handles (albeit in most if not all of the cases their native counterparts do too). So I guess using void here is fine then, or using COMPAT_HANDLE() instead. It's not really relevant anyway since COMPAT_CALL() casts the function pointer to the intended type anyway. @@ -4981,7 +5003,7 @@ int hvm_do_hypercall(struct cpu_user_regs *regs) return viridian_hypercall(regs); if ( (eax = NR_hypercalls) || - (is_pvh_domain(currd) ? !pvh_hypercall64_table[eax] + (is_pvh_domain(currd) ? !pvh_hypercall32_table[eax] : !hvm_hypercall32_table[eax]) ) ... this will break (as we're assuming 32- and 64-bit tables to be fully in sync here; there's still the pending work item of constructing these tables so that this has a better chance of not getting broken). So you prefer to have full check --- explicitly for both 32- and 64-bit, right? No. Just adding the missing operation to the table will deal with it. I wouldn't like to see more conditionals to be added to this code path when we can avoid doing so. 
What we could do is add a respective ASSERT() to the 64-bit path, albeit the NULL deref would be observable as a fault without the ASSERT() too (and adding one wouldn't help release builds [and their security]). Jan
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
On 11/08/15 03:09, Shannon Zhao wrote: Hi Julien, Hi Shannon, On 2015/8/7 18:33, Julien Grall wrote: Hi Shannon, Just some clarification questions. On 07/08/15 03:11, Shannon Zhao wrote: 3. Dom0 gets grant table and event channel irq information --- As said above, we set the hypervisor_id to XenVMM to tell Dom0 that it runs on the Xen hypervisor. For grant table, add two new HVM_PARAMs: HVM_PARAM_GNTTAB_START_ADDRESS and HVM_PARAM_GNTTAB_SIZE. For event channel irq, reuse HVM_PARAM_CALLBACK_IRQ and add a new delivery type: val[63:56] == 3: val[15:8] is flag: val[7:0] is a PPI (ARM and ARM64 only) Can you describe the content of flag? This needs definition as well. I think it could use the definition from the XENV table. Bit 0 stands for the interrupt mode and bit 1 for the interrupt polarity. And explain it in the comment of HVM_PARAM_CALLBACK_IRQ. That would be fine for me. When constructing Dom0 in Xen, save these values. Then Dom0 could get them through hypercall HVMOP_get_param. 4. Map MMIO regions --- Register a bus_notifier for platform and amba bus in Linux. Add a new XENMAPSPACE XENMAPSPACE_dev_mmio. Within the register, check if the device is newly added, then call hypercall XENMEM_add_to_physmap to map the mmio regions. 5. Route device interrupts to Dom0 -- Route all the SPI interrupts to Dom0 before Dom0 booting. Not all the SPIs will be routed to DOM0. Some are used by Xen and should never be used by any guest. I have in mind the UART and SMMU interrupts. You will have to find a way to skip them nicely. Note that not all the IRQs used by Xen are properly registered when we build DOM0 (see the SMMU). For the UART, we can get the interrupt information from the SPCR table and hide it from Dom0. Can you clarify your meaning of hide from DOM0? Did you mean avoiding routing the SPI to DOM0? IIUC, currently Xen (as well as Linux) doesn't support using the SMMU when booting with ACPI. When it does, it could read the interrupt information from the IORT table and hide it from Dom0. 
Well for Xen we don't even have ACPI supported upstream ;). For Linux there is some on-going work. Anyway, this is not important right now. -- Julien Grall
Re: [Xen-devel] [PATCH v3 2/4] x86/compat: Test both PV and PVH guests for compat mode
On 24.07.15 at 19:54, boris.ostrov...@oracle.com wrote: On 07/23/2015 10:07 AM, Jan Beulich wrote: Plus - is this in line with what the tools are doing? Aren't they assuming !PV => native format context? I.e. don't you need to treat differently v->domain == current->domain and its opposite? Roger btw. raised a similar question on IRC earlier today... Not sure I understand this. You mean for copying 64-bit guest's info into 32-bit dom0? Basically yes - tool stack and guest invocations may need to behave differently. Jan
Re: [Xen-devel] [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode
At 11:14 +0100 on 10 Aug (1439205273), Andrew Cooper wrote: On 10/08/15 10:49, Tim Deegan wrote: Hi, At 17:45 +0100 on 06 Aug (1438883118), Ben Catterall wrote: The process to switch into and out of deprivileged mode can be likened to setjmp/longjmp. To enter deprivileged mode, we take a copy of the stack from the guest's registers up to the current stack pointer. This copy is pretty unfortunate, but I can see that avoiding it will be a bit complex. Could we do something with more stacks? AFAICS there have to be three stacks anyway: - one to hold the depriv execution context; - one to hold the privileged execution context; and - one to take interrupts on. So maybe we could do some fiddling to make Xen take interrupts on a different stack while we're depriv'd? That should happen naturally by virtue of the privilege level change involved in taking the interrupt. Right, and this is why we need a third stack - so interrupts don't trash the existing priv state on the 'normal' Xen stack. And so we either need to copy the priv stack out (and maybe copy it back), or tell the CPU to use a different stack. If we had enough headroom, we could try to be clever and tell the CPU to take interrupts on the priv stack _below_ the existing state. That would avoid the first of your problems below. * Under this model, PV exception handlers should copy themselves onto the privileged execution stack. * Currently, the IST handlers copy themselves onto the primary stack if they interrupt guest context. * AMD Task Register on vmexit. (this old gem) Gah, this thing. :( Tim.
Re: [Xen-devel] [PATCH v4 02/11] x86/intel_pstate: add some calculation related support
On 27.07.15 at 07:48, wei.w.w...@intel.com wrote: +/* + * clamp_t - return a value clamped to a given range using a given type + * @type: the type of variable to use + * @val: current value + * @lo: minimum allowable value + * @hi: maximum allowable value + * + * This macro does no typechecking and uses temporary variables of type + * 'type' to make all the comparisons. + */ +#define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi) Shouldn't you also add a type-checking variant then (which ought to be used instead of the one above wherever possible)? Hi Jan, I think max_t() and min_t() have handled the typechecking, so maybe we do not need to do it again here. If you have a different opinion, how should we do the typechecking here? Is the following what you expected? #define clamp_t(type, val, lo, hi) ({ type _val = (val); type _lo = (lo); type _hi = (hi); min_t(type, max_t(type, _val, _lo), _hi); }) I don't think you understood: I asked for a clamp() to accompany clamp_t(), just like e.g. max_t() is a less preferred sibling of max(). Jan
Re: [Xen-devel] Clarification regarding xen toolstack for booting a pv guest
On Tue, Aug 11, 2015 at 01:51:10AM +0100, Wei Liu wrote: On Mon, Aug 10, 2015 at 05:00:51PM -0700, sainath grandhi wrote: Hello all, I was measuring the amount of time taken on the host by the Xen toolstack while launching a PV guest. I notice that there is around 2-3 seconds of time spent in dom0 by the toolstack before the guest starts executing. A significant amount of time is taken in the function xc_dom_boot_mem_init, around 2 seconds for a guest memory of 2 GB. This code does allocate guest memory and mappings, and the amount of time this function takes increases proportionally with the requested guest memory in the guest config file. Has anyone noticed a similar thing? Is ~2 seconds of wall clock time reasonable for guest memory mapping? I guess it is because Dom0 has to balloon down to free up memory for the guest. What is your Xen command line? Have you tried putting dom0_mem=512M,max:512M there? And I forgot to mention there is a patch to make PV guest creation faster by using superpages and batching. See 415b58c1 and 826ca36fa3 in the xen-unstable tree. Wei. Thanks
Re: [Xen-devel] [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
On Fri, Aug 07, 2015 at 02:04:51PM +0100, Andrew Cooper wrote: On 07/08/15 09:22, Shuai Ruan wrote: void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { @@ -4456,6 +4460,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, *ebx = _eax + _ebx; } } +if ( count == 1 ) +{ +if ( cpu_has_xsaves ) +{ +*ebx = XSTATE_AREA_MIN_SIZE; +if ( v->arch.xcr0 | v->arch.msr_ia32_xss ) +for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ ) +{ +if ( !((v->arch.xcr0 | v->arch.msr_ia32_xss) & + (1ULL << sub_leaf)) ) +continue; +domain_cpuid(d, input, sub_leaf, _eax, _ebx, _ecx, + _edx); +*ebx = *ebx + _eax; +} +} +else +{ +*eax &= ~XSAVES; +*ebx = *ecx = *edx = 0; +} +if ( !cpu_has_xgetbv1 ) +*eax &= ~XGETBV1; +if ( !cpu_has_xsavec ) +*eax &= ~XSAVEC; +if ( !cpu_has_xsaveopt ) +*eax &= ~XSAVEOPT; +} Urgh - I really need to get domain cpuid fixed in Xen. This is currently making a very bad situation a little worse. In patch 4, I expose xsaves/xsavec/xsaveopt and need to check whether the hardware supports it. What's your suggestion about this? Calling into domain_cpuid() in the loop is not useful as nothing will set the subleaves up. As a first pass, reading from xstate_{offsets,sizes} will be better than nothing, as it will at least What do you mean by xstate_{offsets,sizes}? match reality until the domain is migrated. For CPUID(eax=0dh) with subleaf 1, the value of ebx will change according to v->arch.xcr0 | v->arch.msr_ia32_xss. So adding code in the hvm_cpuid function is the best way I can think of. Your suggestions :)? Longterm, I plan to overhaul the cpuid infrastructure to allow it to properly represent per-core and per-package data, as well as move it into the Xen architectural migration state, to avoid any host-specific values leaking into guest state. This however is also a lot of work, which you don't want to be dependent on. 
static int construct_vmcs(struct vcpu *v) { struct domain *d = v->domain; @@ -1204,6 +1206,9 @@ static int construct_vmcs(struct vcpu *v) __vmwrite(GUEST_PAT, guest_pat); } +if ( cpu_has_vmx_xsaves ) +__vmwrite(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP); + vmx_vmcs_exit(v); /* PVH: paging mode is updated by arch_set_info_guest(). */ diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index d3183a8..64ff63b 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2708,6 +2708,16 @@ static int vmx_handle_apic_write(void) return vlapic_apicv_write(current, exit_qualification & 0xfff); } +static void vmx_handle_xsaves(void) +{ +WARN(); +} + +static void vmx_handle_xrstors(void) +{ +WARN(); +} + What are these supposed to do? They are not appropriate handlers. These two handlers do nothing here. Performing xsaves in an HVM guest will not trap into the hypervisor in this patch (by setting XSS_EXIT_BITMAP to zero). However it may trap in the future. See SDM Volume 3 Section 25.1.3 for detailed information. in which case use domain_crash(). WARN() here will allow a guest to DoS Xen. I will change this in the next version. ~Andrew Thanks for your review, Andrew.
Re: [Xen-devel] [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
On 11/08/15 08:59, Shuai Ruan wrote: On Fri, Aug 07, 2015 at 02:04:51PM +0100, Andrew Cooper wrote: On 07/08/15 09:22, Shuai Ruan wrote: void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { @@ -4456,6 +4460,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, *ebx = _eax + _ebx; } } +if ( count == 1 ) +{ +if ( cpu_has_xsaves ) +{ +*ebx = XSTATE_AREA_MIN_SIZE; +if ( v->arch.xcr0 | v->arch.msr_ia32_xss ) +for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ ) +{ +if ( !((v->arch.xcr0 | v->arch.msr_ia32_xss) & + (1ULL << sub_leaf)) ) +continue; +domain_cpuid(d, input, sub_leaf, _eax, _ebx, _ecx, + _edx); +*ebx = *ebx + _eax; +} +} +else +{ +*eax &= ~XSAVES; +*ebx = *ecx = *edx = 0; +} +if ( !cpu_has_xgetbv1 ) +*eax &= ~XGETBV1; +if ( !cpu_has_xsavec ) +*eax &= ~XSAVEC; +if ( !cpu_has_xsaveopt ) +*eax &= ~XSAVEOPT; +} Urgh - I really need to get domain cpuid fixed in Xen. This is currently making a very bad situation a little worse. In patch 4, I expose xsaves/xsavec/xsaveopt and need to check whether the hardware supports it. What's your suggestion about this? Calling into domain_cpuid() in the loop is not useful as nothing will set the subleaves up. As a first pass, reading from xstate_{offsets,sizes} will be better than nothing, as it will at least What do you mean by xstate_{offsets,sizes}? Shorthand for xstate_offsets and xstate_sizes, per the standard shell expansion. match reality until the domain is migrated. For CPUID(eax=0dh) with subleaf 1, the value of ebx will change according to v->arch.xcr0 | v->arch.msr_ia32_xss. So adding code in the hvm_cpuid function is the best way I can think of. Your suggestions :)? Which is liable to change on different hardware. Once a VM has migrated, Xen may not legitimately execute another cpuid instruction as part of emulating guest cpuid, as it is not necessarily accurate. 
Xen does not currently have proper cpuid encapsulation, which causes host-specific details to leak into guests across migrate. I have a long-term plan to fix it, but it is not simple or quick to do. In this case, reading from xstate_{offsets,sizes} is better than nothing, but will need fixing in the long term. ~Andrew
Re: [Xen-devel] [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode
On Thu, 2015-08-06 at 21:55 +0100, Andrew Cooper wrote: On 06/08/15 17:45, Ben Catterall wrote: The process to switch into and out of deprivileged mode can be likened to setjmp/longjmp. To enter deprivileged mode, we take a copy of the stack from the guest's registers up to the current stack pointer. This allows us to restore the stack when we have finished the deprivileged mode operation, meaning we can continue execution from that point. This is similar to what happens on a context switch. To exit deprivileged mode, we copy the stack back, replacing the current stack. We can then continue execution from where we left off, which will unwind the stack and free up resources. This method means that we do not need to change any other code paths and its invocation will be transparent to callers. This should allow the feature to be more easily deployed to different parts of Xen. Note that this copy of the stack is per-vcpu, but it will contain per-pcpu data. Extra work is needed to properly migrate vcpus between pcpus. Under what circumstances do you see there being persistent state in the depriv area between calls, given that the calls are synchronous from VM actions? Would we not want to keep (some of) the device model's state in a depriv area? e.g. anything which is purely internal to the DM which is therefore only accessed from depriv-land? Ian.
Re: [Xen-devel] [PATCH v3 07/32] xen/x86: fix arch_set_info_guest for HVM guests
On 04.08.15 at 20:08, andrew.coop...@citrix.com wrote: On 03/08/15 18:31, Roger Pau Monné wrote: struct vcpu_hvm_x86_16 { uint16_t ax; uint16_t cx; uint16_t dx; uint16_t bx; uint16_t sp; uint16_t bp; uint16_t si; uint16_t di; uint16_t ip; uint32_t cr[8]; Definitely no need for anything other than cr0 and 4 in 16-bit mode. uint32_t cs_base; uint32_t ds_base; uint32_t ss_base; uint32_t cs_limit; uint32_t ds_limit; uint32_t ss_limit; uint16_t cs_ar; uint16_t ds_ar; uint16_t ss_ar; }; struct vcpu_hvm_x86_32 { uint32_t eax; uint32_t ecx; uint32_t edx; uint32_t ebx; uint32_t esp; uint32_t ebp; uint32_t esi; uint32_t edi; uint32_t eip; uint32_t cr[8]; Don't need cr's 5-8. I disagree with a number of things discussed so far (like the statement above), but I guess I'd better comment on v4 than continue this thread. Jan
Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server
On 8/10/2015 4:26 PM, Wei Liu wrote: On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote: Currently in ioreq server, guest write-protected ram pages are tracked in the same rangeset with device mmio resources. Yet unlike device mmio, which can be in big chunks, the guest write-protected pages may be discrete ranges with 4K bytes each. This patch uses a separate rangeset for the guest ram pages. And a new ioreq type, IOREQ_TYPE_MEM, is defined. Note: Previously, a new hypercall or subop was suggested to map write-protected pages into ioreq server. However, it turned out the handler of this new hypercall would be almost the same as the existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a type parameter in this hypercall. So no new hypercall is defined, only a new type is introduced. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- tools/libxc/include/xenctrl.h| 39 +++--- tools/libxc/xc_domain.c | 59 ++-- FWIW the hypercall wrappers look correct to me. diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h index 014546a..9106cb9 100644 --- a/xen/include/public/hvm/hvm_op.h +++ b/xen/include/public/hvm/hvm_op.h @@ -329,8 +329,9 @@ struct xen_hvm_io_range { ioservid_t id; /* IN - server id */ uint32_t type; /* IN - type of range */ # define HVMOP_IO_RANGE_PORT 0 /* I/O port range */ -# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */ +# define HVMOP_IO_RANGE_MMIO 1 /* MMIO range */ # define HVMOP_IO_RANGE_PCI 2 /* PCI segment/bus/dev/func range */ +# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */ This looks problematic. Maybe you can get away with this because this is a toolstack-only interface? Thanks Wei. Well, I believe this interface could be used both by the backend device driver and qemu as well (which I neglected). :-) Yu
Re: [Xen-devel] how can I find hypercall page address?
On 11/08/15 03:44, big strong wrote: My goal is to intercept hypercalls to detect malicious calls. So I first need to find where the hypercalls are. As I have said before, a guest may have an arbitrary number of hypercall pages. Furthermore, the hypercall page is merely a convenience; nothing prevents a guest manually issuing hypercalls. My plan is to locate the hypercall page first, then walk through the hypercall page to get the addresses of hypercalls. If there are any other solutions, please let me know. Thanks very much. It sounds like you want VM introspection, but it doesn't work like this. Try http://libvmi.com/ as a starting point. ~Andrew 2015-08-10 23:04 GMT+08:00 Dario Faggioli dario.faggi...@citrix.com: On Sat, 2015-08-08 at 08:02 +0800, big strong wrote: I think I've stated clearly what I want to do. Well... I want to locate the hypercall page address when creating a new domU, so as to locate hypercalls. Ok. What for? Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[Xen-devel] [xen-unstable test] 60647: tolerable FAIL
flight 60647 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/60647/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 13 guest-localmigrate fail pass in 60639 test-amd64-amd64-pygrub 6 xen-bootfail pass in 60639 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 16 guest-localmigrate/x10 fail in 60639 like 60624 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail like 60639 test-armhf-armhf-xl-rtds 11 guest-start fail like 60639 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 60639 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 60639 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-armhf-armhf-xl-vhd 9 debian-di-installfail never pass test-armhf-armhf-libvirt-raw 9 debian-di-installfail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-armhf-armhf-xl-qcow2 9 debian-di-installfail never pass test-armhf-armhf-libvirt-qcow2 9 debian-di-installfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 11 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-xsm 12 
migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail never pass test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-vhd 11 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-libvirt-vhd 9 debian-di-installfail never pass test-armhf-armhf-xl-raw 9 debian-di-installfail never pass test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 guest-saverestorefail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass version targeted for testing: xen 201eac83831d94ba2e9a63a7eed4c128633fafb1 baseline version: xen 201eac83831d94ba2e9a63a7eed4c128633fafb1 Last test of basis60647 2015-08-10 08:59:04 Z0 days Testing same since0 1970-01-01 00:00:00 Z 16658 days0 attempts jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt
Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server
On 8/11/2015 4:25 PM, Paul Durrant wrote: -Original Message- From: Yu, Zhang [mailto:yu.c.zh...@linux.intel.com] Sent: 11 August 2015 08:57 To: Paul Durrant; Andrew Cooper; Wei Liu Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 8/10/2015 6:57 PM, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 10 August 2015 11:56 To: Paul Durrant; Wei Liu; Yu Zhang Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 10/08/15 09:33, Paul Durrant wrote: -Original Message- From: Wei Liu [mailto:wei.l...@citrix.com] Sent: 10 August 2015 09:26 To: Yu Zhang Cc: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote: Currently in ioreq server, guest write-protected ram pages are tracked in the same rangeset with device mmio resources. Yet unlike device mmio, which can be in big chunks, the guest write- protected pages may be discrete ranges with 4K bytes each. This patch uses a seperate rangeset for the guest ram pages. And a new ioreq type, IOREQ_TYPE_MEM, is defined. Note: Previously, a new hypercall or subop was suggested to map write-protected pages into ioreq server. However, it turned out handler of this new hypercall would be almost the same with the existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a type parameter in this hypercall. 
So no new hypercall is defined, only a new type is introduced. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- tools/libxc/include/xenctrl.h| 39 +++--- tools/libxc/xc_domain.c | 59 ++-- FWIW the hypercall wrappers look correct to me. diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h index 014546a..9106cb9 100644 --- a/xen/include/public/hvm/hvm_op.h +++ b/xen/include/public/hvm/hvm_op.h @@ -329,8 +329,9 @@ struct xen_hvm_io_range { ioservid_t id; /* IN - server id */ uint32_t type; /* IN - type of range */ # define HVMOP_IO_RANGE_PORT 0 /* I/O port range */ -# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */ +# define HVMOP_IO_RANGE_MMIO 1 /* MMIO range */ # define HVMOP_IO_RANGE_PCI 2 /* PCI segment/bus/dev/func range */ +# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */ This looks problematic. Maybe you can get away with this because this is a toolstack-only interface? Indeed, the old name is a bit problematic. Presumably re-use like this would require an interface version change and some if-defery. I assume it is an interface used by qemu, so this patch in its current state will break things. If QEMU were re-built against the updated header, yes. Thank you, Andrew Paul. :) Are you referring to the xen_map/unmap_memory_section routines in QEMU? I noticed they are called by xen_region_add/del in QEMU. And I wonder, are these 2 routines used to track a memory region or to track an MMIO region? If the region to be added is an MMIO region, I guess the new interface should be fine, but if it is a memory region to be added into the ioreq server, maybe a patch in QEMU is necessary (e.g. use some if-defery for this new interface version you suggested)? I was forgetting that QEMU uses libxenctrl so your change to xc_hvm_map_io_range_to_ioreq_server() means everything will continue to work as before. 
There is still the (admittedly academic) problem of some unknown emulator out there that rolls its own hypercalls and blindly updates to the new version of hvm_op.h, suddenly starting to register memory ranges rather than MMIO ranges. I would leave the existing definitions as-is and come up with a new name. So, how about we keep the HVMOP_IO_RANGE_MEMORY name for MMIO, and use a new one, say HVMOP_IO_RANGE_WP_MEM, for write-protected RAM only? :) Thanks, Yu
Re: [Xen-devel] [PATCH v4 for Xen 4.6 1/4] xen: enable per-VCPU parameter settings for RTDS scheduler
On 09.08.15 at 17:45, lichong...@gmail.com wrote: On Mon, Jul 13, 2015 at 3:37 AM, Jan Beulich jbeul...@suse.com wrote: On 11.07.15 at 06:52, lichong...@gmail.com wrote: @@ -1162,8 +1176,82 @@ rt_dom_cntl( } spin_unlock_irqrestore(prv->lock, flags); break; +case XEN_DOMCTL_SCHEDOP_getvcpuinfo: +spin_lock_irqsave(prv->lock, flags); +for ( index = 0; index < op->u.v.nr_vcpus; index++ ) +{ +if ( copy_from_guest_offset(local_sched, + op->u.v.vcpus, index, 1) ) +{ +rc = -EFAULT; +break; +} +if ( local_sched.vcpuid >= d->max_vcpus || + d->vcpu[local_sched.vcpuid] == NULL ) +{ +rc = -EINVAL; +break; +} +svc = rt_vcpu(d->vcpu[local_sched.vcpuid]); + +local_sched.s.rtds.budget = svc->budget / MICROSECS(1); +local_sched.s.rtds.period = svc->period / MICROSECS(1); + +if ( __copy_to_guest_offset(op->u.v.vcpus, index, +local_sched, 1) ) +{ +rc = -EFAULT; +break; +} +if ( hypercall_preempt_check() ) +{ +rc = -ERESTART; +break; +} I still don't see how this is supposed to work. I return -ERESTART here, and the upper layer function (do_domctl) will handle this error code by calling hypercall_create_continuation. I have no idea where you found the upper layer (i.e. the XEN_DOMCTL_scheduler_op case of do_domctl() to take care of this). +} xen_domctl_schedparam_vcpu_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_schedparam_vcpu_t); + /* Set or get info? 
*/ #define XEN_DOMCTL_SCHEDOP_putinfo 0 #define XEN_DOMCTL_SCHEDOP_getinfo 1 +#define XEN_DOMCTL_SCHEDOP_putvcpuinfo 2 +#define XEN_DOMCTL_SCHEDOP_getvcpuinfo 3 struct xen_domctl_scheduler_op { uint32_t sched_id; /* XEN_SCHEDULER_* */ uint32_t cmd; /* XEN_DOMCTL_SCHEDOP_* */ union { -struct xen_domctl_sched_sedf { -uint64_aligned_t period; -uint64_aligned_t slice; -uint64_aligned_t latency; -uint32_t extratime; -uint32_t weight; -} sedf; -struct xen_domctl_sched_credit { -uint16_t weight; -uint16_t cap; -} credit; -struct xen_domctl_sched_credit2 { -uint16_t weight; -} credit2; -struct xen_domctl_sched_rtds { -uint32_t period; -uint32_t budget; -} rtds; +xen_domctl_sched_sedf_t sedf; +xen_domctl_sched_credit_t credit; +xen_domctl_sched_credit2_t credit2; +xen_domctl_sched_rtds_t rtds; +struct { +XEN_GUEST_HANDLE_64(xen_domctl_schedparam_vcpu_t) vcpus; +uint16_t nr_vcpus; +} v; And there's still no explicit padding here at all (nor am I convinced that uint16_t is really a good choice for nr_vcpus - uint32_t would seem more natural without causing any problems or structure growth). I think the size of union u is equal to the size of xen_domctl_sched_sedf_t, which is 64*4 bits (if vcpus in struct v is just a pointer). Which doesn't in any way address the complaint about missing explicit padding - I'm not asking you to pad to the size of the union, but to the size of the unnamed structure you add. Jan nr_vcpus is indeed better as uint32_t. I'll change it in the next version. Chong
Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server
-Original Message- From: Yu, Zhang [mailto:yu.c.zh...@linux.intel.com] Sent: 11 August 2015 08:57 To: Paul Durrant; Andrew Cooper; Wei Liu Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 8/10/2015 6:57 PM, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 10 August 2015 11:56 To: Paul Durrant; Wei Liu; Yu Zhang Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On 10/08/15 09:33, Paul Durrant wrote: -Original Message- From: Wei Liu [mailto:wei.l...@citrix.com] Sent: 10 August 2015 09:26 To: Yu Zhang Cc: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper; Kevin Tian; zhiyuan...@intel.com Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote: Currently in ioreq server, guest write-protected ram pages are tracked in the same rangeset with device mmio resources. Yet unlike device mmio, which can be in big chunks, the guest write- protected pages may be discrete ranges with 4K bytes each. This patch uses a seperate rangeset for the guest ram pages. And a new ioreq type, IOREQ_TYPE_MEM, is defined. Note: Previously, a new hypercall or subop was suggested to map write-protected pages into ioreq server. However, it turned out handler of this new hypercall would be almost the same with the existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a type parameter in this hypercall. So no new hypercall defined, only a new type is introduced. 
Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com
---
 tools/libxc/include/xenctrl.h | 39 +++---
 tools/libxc/xc_domain.c       | 59 ++--

FWIW the hypercall wrappers look correct to me.

diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 014546a..9106cb9 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -329,8 +329,9 @@ struct xen_hvm_io_range {
     ioservid_t id;               /* IN - server id */
     uint32_t type;               /* IN - type of range */
 # define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
-# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
+# define HVMOP_IO_RANGE_MMIO   1 /* MMIO range */
 # define HVMOP_IO_RANGE_PCI    2 /* PCI segment/bus/dev/func range */
+# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */

This looks problematic. Maybe you can get away with this because this is a toolstack-only interface?

Indeed, the old name is a bit problematic. Presumably re-use like this would require an interface version change and some ifdef-ery.

I assume it is an interface used by qemu, so this patch in its current state will break things.

If QEMU were re-built against the updated header, yes.

Thank you, Andrew Paul. :) Are you referring to the xen_map/unmap_memory_section routines in QEMU? I noticed they are called by xen_region_add/del in QEMU. And I wonder, are these two routines used to track a memory region or an MMIO region? If the region to be added is MMIO, I guess the new interface should be fine; but if it is a memory region to be added into the ioreq server, maybe a patch in QEMU is necessary (e.g. using some ifdef-ery for the new interface version you suggested)?

I was forgetting that QEMU uses libxenctrl, so your change to xc_hvm_map_io_range_to_ioreq_server() means everything will continue to work as before.
There is still the (admittedly academic) problem of some unknown emulator out there that rolls its own hypercalls and blindly updates to the new version of hvm_op.h suddenly starting to register memory ranges rather than mmio ranges, though. I would leave the existing definitions as-is and come up with a new name.

Paul

Thanks
Yu

Paul
~Andrew
___
Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC v2 0/5] Multi-queue support for xen-blkfront and xen-blkback
On 11/08/15 07:08, Bob Liu wrote: On 08/10/2015 11:52 PM, Jens Axboe wrote: On 08/10/2015 05:03 AM, Rafal Mielniczuk wrote: On 01/07/15 04:03, Jens Axboe wrote: On 06/30/2015 08:21 AM, Marcus Granado wrote:

Hi,

Our measurements for the multiqueue patch indicate a clear improvement in iops when more queues are used. The measurements were obtained under the following conditions:

- using blkback as the dom0 backend with the multiqueue patch applied to a dom0 kernel 4.0 on 8 vcpus.
- using a recent Ubuntu 15.04 kernel 3.19 with the multiqueue frontend patch applied, used as a guest on 4 vcpus.
- using a Micron RealSSD P320h as the underlying local storage on a Dell PowerEdge R720 with 2 Xeon E5-2643 v2 cpus.
- fio 2.2.7-22-g36870 as the generator of synthetic loads in the guest. We used direct IO to skip caching in the guest and ran fio for 60s, reading a number of block sizes ranging from 512 bytes to 4MiB. A queue depth of 32 for each queue was used to saturate individual vcpus in the guest.

We were interested in observing storage iops for different values of block sizes. Our expectation was that iops would improve when increasing the number of queues, because both the guest and dom0 would be able to make use of more vcpus to handle these requests.

These are the results (as aggregate iops for all the fio threads) that we got for the conditions above with sequential reads:

fio_threads  io_depth  block_size  1-queue_iops  8-queue_iops
8            32        512         158K          264K
8            32        1K          157K          260K
8            32        2K          157K          258K
8            32        4K          148K          257K
8            32        8K          124K          207K
8            32        16K         84K           105K
8            32        32K         50K           54K
8            32        64K         24K           27K
8            32        128K        11K           13K

8-queue iops was better than single-queue iops for all the block sizes. There were very good improvements as well for sequential writes with block size 4K (from 80K iops with a single queue to 230K iops with 8 queues), and no regressions were visible in any measurement performed.

Great results!
And I don't know why this code has lingered for so long, so thanks for helping get some attention to this again. Personally I'd be really interested in the results for the same set of tests, but without the blk-mq patches. Do you have them, or could you potentially run them?

Hello,

We reran the tests for sequential reads with identical settings, but with Bob Liu's multiqueue patches reverted from the dom0 and guest kernels. The results we obtained were *better* than the results we got with the multiqueue patches applied:

fio_threads  io_depth  block_size  1-queue_iops  8-queue_iops  *no-mq-patches_iops*
8            32        512         158K          264K          321K
8            32        1K          157K          260K          328K
8            32        2K          157K          258K          336K
8            32        4K          148K          257K          308K
8            32        8K          124K          207K          188K
8            32        16K         84K           105K          82K
8            32        32K         50K           54K           36K
8            32        64K         24K           27K           16K
8            32        128K        11K           13K           11K

We noticed that requests are not merged by the guest when the multiqueue patches are applied, which results in a regression for small block sizes (the RealSSD P320h's optimal block size is around 32-64KB). We observed a similar regression for the Dell MZ-5EA1000-0D3 100 GB 2.5" internal SSD.

As I understand it, the blk-mq layer bypasses the I/O scheduler, which also effectively disables merges. Could you explain why it is difficult to enable merging in the blk-mq layer? That could help close the performance gap we observed. Otherwise, the tests show that the multiqueue patches do not improve performance, at least when it comes to sequential read/write operations.

blk-mq still provides merging; there should be no difference there. Do the xen patches set BLK_MQ_F_SHOULD_MERGE?

Yes. Is it possible that the xen-blkfront driver dequeues requests too fast once we have multiple hardware queues? Then new requests don't get the chance to merge with old requests which were already dequeued and issued.

For some reason we don't see merges even when we set multiqueue to 1.
Below are some stats from the guest system when doing sequential 4KB reads:

$ fio --name=test --ioengine=libaio --direct=1 --rw=read --numjobs=8 \
      --iodepth=32 --time_based=1 --runtime=300 --bs=4KB --filename=/dev/xvdb

$ iostat -xt 5 /dev/xvdb
avg-cpu: %user %nice %system %iowait
Re: [Xen-devel] Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3
On Mon, 2015-08-03 at 11:47 +0100, Ian Campbell wrote: After the initial expected logging the file is simply full of:

2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1004:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3] from domid=0 nentries=1 rc=1
2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1015:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3] got=domaininfos[0] got->domain=0
2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1015:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3] got=domaininfos[1] got->domain=-1
2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1023:domain_death_xswatch_callback: got==gotend

Repeated at around 51KHz.

This sounds a lot like 4783c99aab8 (see below for the full log message), which perhaps ought to be backported to the affected branches, i.e. 4.2 and 4.3. Looks like it was backported to 4.5 (as 0b19348f3cd1) and 4.4 (as 13623d5d8e85) already. Ian?

Ian.

commit 4783c99aab866f470bd59368cfbf5ad5f677b0ec
Author: Ian Jackson ian.jack...@eu.citrix.com
Date: Tue Mar 17 09:30:57 2015 -0600

    libxl: In domain death search, start search at first domid we want

    From: Ian Jackson ian.jack...@eu.citrix.com

    When domain_death_xswatch_callback needed a further call to xc_domain_getinfolist it would restart it with the last domain it found rather than the first one it wants. If it only wants one it will also only ask for one domain. The result would then be that it gets the previous domain again (ie, the previous one to the one it wants), which still doesn't reveal the answer to the question, and it would therefore loop again.

    It's completely unclear to me why I thought it was a good idea to start the xc_domain_getinfolist with the last domain previously found rather than the first one left unconfirmed. The code has been that way since it was introduced.

    Instead, start each xc_domain_getinfolist at the next domain whose status we need to check. We also need to move the test for !evg into the loop, as we now need evg to compute the arguments to getinfolist.
    Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com
    Reported-by: Jim Fehlig jfeh...@suse.com
    Reviewed-by: Jim Fehlig jfeh...@suse.com
    Tested-by: Jim Fehlig jfeh...@suse.com
    Acked-by: Wei Liu wei.l...@citrix.com
    Acked-by: Ian Campbell ian.campb...@citrix.com
[Xen-devel] [URGENT RFC] Branching and reopening -unstable
Hi all

RC1 is going to be tagged this week (maybe today). We need to figure out when to branch / reopen -unstable for committing, and what rules should be applied until 4.6 is out of the door.

Ian, Ian and I had a conversation IRL. We discussed several things, but figured it is necessary to have more people involved before making any decision. Here is my recollection of the conversation.

Branching should be done at one of the RC tags. There might not be enough time for us to reach consensus before tagging RC1, so I would say let's branch at RC2 if we don't observe blocker bugs.

Maintainers should be responsible for both the 4.6 branch and the -unstable branch.

As for bug fixes, here are two options.

Option 1: bug fixes go into -unstable; backport / cherry-pick bug fixes back to 4.6. This seems to leave the tree in a half-frozen state, because we need to reject refactoring patches in case they cause backporting failures.

Option 2: bug fixes go into 4.6 and are merged into -unstable. If a merge has conflicts and the maintainers can't deal with them, the authors of those changes in -unstable which cause the conflict are responsible for fixing it up.

Ian and Ian, anything I missed? Anything to add? Others, thoughts?

Wei.
[Xen-devel] [RFC PATCH 0/7] domain snapshot implementation
Add vm snapshot implementation, supporting snapshot-create and snapshot-revert.

Current Limitations:

About disk snapshot create, there are many cases:
- qdisk, internal: should call a qmp command to do the work.
- qdisk, external: should call a qmp command to do the work; qemu will replace the disk backend file after creating the external snapshot.
- non-qdisk, internal: should call 'qemu-img snapshot' to do the work.
- non-qdisk, external: should call 'qemu-img create' to create a new file with the original disk file as backing file. And libxl should switch the domain disk from the original disk to the new file.

In the last case, during a domain snapshot, between domain suspend and resume, how do we replace the disk backend file from libxl? Especially if the disk file format changes (original disk backend file is 'raw', new file is 'qcow2')? Considering this, I currently exclude the non-qdisk cases and let the API support qdisk only. About the non-qdisk external case, any suggestion?

About disk snapshot revert: reverting from an external disk snapshot is actually starting the domain from a specified backing file; since the backing file should be kept read-only, that will involve a block copy operation. Currently this case is not supported; only reverting from an internal disk snapshot is supported.

Design document: the latest design document has just been posted.
Chunyan Liu (7):
  libxl_types.idl: add definitions for vm snapshot
  qmp: add qmp handlers to create disk snapshots
  libxl: save disk format to xenstore
  libxl: add snapshot APIs
  xl: add domain snapshot commands
  qmp: add qmp handlers to delete internal/external disk snapshot
  libxl: add APIs to delete internal/external disk snapshots

 Config.mk                            |   2 +-
 config/Paths.mk.in                   |   1 +
 configure                            |   3 +
 docs/man/xl.snapshot.conf.pod.5      |  59 +++
 m4/paths.m4                          |   3 +
 tools/configure                      |   3 +
 tools/examples/snapshot.cfg.external |   4 +
 tools/examples/snapshot.cfg.internal |   4 +
 tools/libxl/Makefile                 |   2 +
 tools/libxl/libxl.c                  |  10 +-
 tools/libxl/libxl.h                  |  51 +++
 tools/libxl/libxl_internal.h         |  38 ++
 tools/libxl/libxl_qmp.c              | 224 ++++
 tools/libxl/libxl_snapshot.c         | 321 +
 tools/libxl/libxl_types.idl          |  31 ++
 tools/libxl/libxl_types_internal.idl |   8 +
 tools/libxl/libxl_utils.c            |  16 +
 tools/libxl/libxl_utils.h            |   1 +
 tools/libxl/xl.h                     |   2 +
 tools/libxl/xl_cmdimpl.c             | 677 +++
 tools/libxl/xl_cmdtable.c            |  16 +
 21 files changed, 1474 insertions(+), 2 deletions(-)
 create mode 100644 docs/man/xl.snapshot.conf.pod.5
 create mode 100644 tools/examples/snapshot.cfg.external
 create mode 100644 tools/examples/snapshot.cfg.internal
 create mode 100644 tools/libxl/libxl_snapshot.c

--
2.1.4
[Xen-devel] [RFC Doc V11 0/4] domain snapshot document
Changes to V10:
- several updates to the xl design and libxl design to address comments on V10.
- a few updates to keep consistent with the code implementation.

V10: http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg03071.html

The code implementation is posted right after.
[Xen-devel] [RFC PATCH 5/7] xl: add domain snapshot commands
Add domain snapshot create/revert commands implementation.

Since xl is expected not to maintain domain snapshot information itself, it has no idea how many snapshots there are or which files and metadata relate to them, so xl won't supply a snapshot delete command. It depends on users to delete things.

Signed-off-by: Chunyan Liu cy...@suse.com
---
 Config.mk                            |   2 +-
 config/Paths.mk.in                   |   1 +
 configure                            |   3 +
 docs/man/xl.snapshot.conf.pod.5      |  59 +++
 m4/paths.m4                          |   3 +
 tools/configure                      |   3 +
 tools/examples/snapshot.cfg.external |   4 +
 tools/examples/snapshot.cfg.internal |   4 +
 tools/libxl/Makefile                 |   1 +
 tools/libxl/xl.h                     |   2 +
 tools/libxl/xl_cmdimpl.c             | 677 +++
 tools/libxl/xl_cmdtable.c            |  16 +
 12 files changed, 774 insertions(+), 1 deletion(-)
 create mode 100644 docs/man/xl.snapshot.conf.pod.5
 create mode 100644 tools/examples/snapshot.cfg.external
 create mode 100644 tools/examples/snapshot.cfg.internal

diff --git a/Config.mk b/Config.mk
index e9a7097..aa4884f 100644
--- a/Config.mk
+++ b/Config.mk
@@ -159,7 +159,7 @@ endef
 BUILD_MAKE_VARS := sbindir bindir LIBEXEC LIBEXEC_BIN libdir SHAREDIR \
                    XENFIRMWAREDIR XEN_CONFIG_DIR XEN_SCRIPT_DIR XEN_LOCK_DIR \
-                   XEN_RUN_DIR XEN_PAGING_DIR XEN_DUMP_DIR
+                   XEN_RUN_DIR XEN_PAGING_DIR XEN_DUMP_DIR XEN_SNAPSHOT_DIR

 buildmakevars2file = $(eval $(call buildmakevars2file-closure,$(1)))
 define buildmakevars2file-closure

diff --git a/config/Paths.mk.in b/config/Paths.mk.in
index d36504f..8e7d2a8 100644
--- a/config/Paths.mk.in
+++ b/config/Paths.mk.in
@@ -49,6 +49,7 @@ BASH_COMPLETION_DIR := $(CONFIG_DIR)/bash_completion.d
 XEN_LOCK_DIR     := @XEN_LOCK_DIR@
 XEN_PAGING_DIR   := @XEN_PAGING_DIR@
 XEN_DUMP_DIR     := @XEN_DUMP_DIR@
+XEN_SNAPSHOT_DIR := @XEN_SNAPSHOT_DIR@

 XENFIRMWAREDIR   := @XENFIRMWAREDIR@

diff --git a/configure b/configure
index 80b27d6..e283d17 100755
--- a/configure
+++ b/configure
@@ -595,6 +595,7 @@ tools
 xen
 subdirs
 XEN_DUMP_DIR
+XEN_SNAPSHOT_DIR
 XEN_PAGING_DIR
 XEN_LOCK_DIR
 XEN_SCRIPT_DIR
@@ -1984,6 +1985,8 @@ XEN_PAGING_DIR=$localstatedir/lib/xen/xenpaging
 XEN_DUMP_DIR=$xen_dumpdir_path

+XEN_SNAPSHOT_DIR=$localstatedir/lib/xen/snapshot
+
 case $host_cpu in
 i[3456]86|x86_64)

diff --git a/docs/man/xl.snapshot.conf.pod.5 b/docs/man/xl.snapshot.conf.pod.5
new file mode 100644
index 000..28c2196
--- /dev/null
+++ b/docs/man/xl.snapshot.conf.pod.5
@@ -0,0 +1,59 @@
+=head1 NAME
+
+xl.snapshot.cfg - XL Domain Snapshot Configuration File Syntax
+
+=head1 DESCRIPTION
+
+Snapshot configuration file will be used in xl snapshot-create
+and xl snapshot-revert.
+
+Without a snapshot configuration file, xl snapshot-create could create
+a domain snapshot with default values. To create a user-defined domain
+snapshot, xl requires a domain snapshot config file.
+
+For snapshot-revert, it's mandatory; each item should be specified.
+
+Two examples for internal domain snapshot and external domain snapshot
+could be found in:
+/etc/xen/examples/snapshot.cfg.internal
+/etc/xen/examples/snapshot.cfg.external
+
+=head1 SYNTAX
+
+A domain snapshot config file consists of a series of C<KEY=VALUE> pairs. It
+shares the same rules with xl.cfg.
+
+=head1 OPTIONS
+
+=over 4
+
+=item B<name=NAME>
+
+Specifies the name of the domain snapshot. If omitted, it will be the
+epoch seconds since 1 Jan 1970. It will be used for taking the internal
+disk snapshot, generating the memory state file name and generating the
+external disk snapshot file name.
+
+=item B<memory=0|1>
+
+Indicates whether to save a memory state file. If not, it will take a
+disk-only snapshot. Currently xl doesn't support disk-only snapshots,
+so it can only be '1'.
+
+=item B<memory_path=PATHNAME>
+
+Location of the memory state file. This state file is the same as the file
+in xl save. The value is the full directory of the location of the memory
+state file. If omitted, it will be generated by default:
+<snapshot path>/<snapshot name>.save
+
+=item B<disks=[ DISK_SPEC_STRING, DISK_SPEC_STRING, ...]>
+
+Disk snapshot description.
+DISK_SPEC_STRING syntax is:
+'external path, external format, target device'
+If taking an internal disk snapshot, keep 'external path' and
+'external format' as '', e.g. [',,xvda',].
+
+=back
+

diff --git a/m4/paths.m4 b/m4/paths.m4
index 63e0f6b..abd89d2 100644
--- a/m4/paths.m4
+++ b/m4/paths.m4
@@ -122,4 +122,7 @@ AC_SUBST(XEN_PAGING_DIR)
 XEN_DUMP_DIR=$xen_dumpdir_path
 AC_SUBST(XEN_DUMP_DIR)
+
+XEN_SNAPSHOT_DIR=$localstatedir/lib/xen/snapshot
+AC_SUBST(XEN_SNAPSHOT_DIR)
 ])

diff --git a/tools/configure b/tools/configure
index 1098f1f..cd604bb 100755
--- a/tools/configure
+++ b/tools/configure
@@ -716,6 +716,7 @@ monitors
 githttp
 rpath
 XEN_DUMP_DIR
+XEN_SNAPSHOT_DIR
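Going by the option descriptions in the pod file above, an external-snapshot config might look like the following (illustrative only; the shipped examples are snapshot.cfg.internal and snapshot.cfg.external under /etc/xen/examples, and all names and paths here are made up):

```
# snapshot.cfg -- external domain snapshot (illustrative, per the
# xl.snapshot.conf syntax described above)
name = "snap-2015-08-11"

# save the memory state file as well (disk-only is not supported by xl yet)
memory = 1
memory_path = "/var/lib/xen/snapshot/snap-2015-08-11.save"

# DISK_SPEC_STRING: 'external path, external format, target device'
disks = [ '/var/lib/xen/snapshot/snap-2015-08-11.qcow2,qcow2,xvda' ]
```

For an internal snapshot, the first two fields of the DISK_SPEC_STRING would be left empty, e.g. disks = [ ',,xvda' ].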
[Xen-devel] [RFC PATCH 1/7] add definitions for vm snapshot
Define libxl_disk_snapshot_type and libxl_disk_snapshot for VM snapshot usage.

Signed-off-by: Chunyan Liu cy...@suse.com
---
 tools/libxl/libxl_types.idl          | 31 +++
 tools/libxl/libxl_types_internal.idl |  8 
 2 files changed, 39 insertions(+)

diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index ef346e7..f7a4c3e 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -793,3 +793,34 @@ libxl_psr_cat_info = Struct("psr_cat_info", [
     ("cos_max", uint32),
     ("cbm_len", uint32),
     ])
+
+libxl_disk_snapshot_type = Enumeration("disk_snapshot_type", [
+    (0, "INVALID"),
+    (1, "INTERNAL"),
+    (2, "EXTERNAL"),
+    ])
+
+libxl_disk_snapshot = Struct("disk_snapshot", [
+    # target disk
+    ("disk", libxl_device_disk),
+
+    # disk snapshot name
+    ("name", string),
+
+    ("u", KeyedUnion(None, libxl_disk_snapshot_type, "type",
+          [("external", Struct(None, [
+               # disk format for external files. Since external disk snapshot is
+               # implemented with the backing file mechanism, the external file
+               # disk format must support backing files. This field can be NULL;
+               # then a proper disk format will be used by default according to
+               # the original disk format.
+               ("external_format", libxl_disk_format),
+
+               # external file path. This field should be non-NULL and a new path.
+               ("external_path", string),
+           ])),
+           ("internal", None),
+           ("invalid", None),
+          ])),
+    ])

diff --git a/tools/libxl/libxl_types_internal.idl b/tools/libxl/libxl_types_internal.idl
index 5e55685..60dce1d 100644
--- a/tools/libxl/libxl_types_internal.idl
+++ b/tools/libxl/libxl_types_internal.idl
@@ -45,3 +45,11 @@ libxl__device_action = Enumeration("device_action", [
     (1, "ADD"),
     (2, "REMOVE"),
     ])
+
+libxl_disk_snapshot_op = Enumeration("disk_snapshot_op", [
+    (1, "CREATE"),
+    (2, "DELETE"),
+    (3, "REVERT"),
+    (4, "LIST"),
+    ])
+
--
2.1.4
[Xen-devel] [RFC Doc V11 2/5] domain snapshot introduction
1. Introduction

There are several types of snapshots:

disk snapshot
    Contents of disks are saved at a given point in time, and can be restored back to that state. On a running guest, a disk snapshot is likely to be only crash-consistent rather than clean (that is, it represents the state of the disk on a sudden power outage, and may need fsck or journal replays to be made consistent). On a paused guest, with a mechanism for quiescing disks (that is, all cached data written to disk), a disk snapshot is clean. On an inactive guest, a disk snapshot is clean if the disks were clean when the guest was last shut down.

    Disk snapshots exist in two forms: internal (file formats such as qcow2 track both the snapshot and the changes since the snapshot in a single file) and external (the snapshot is one file, and the changes since the snapshot are in another file).

memory state (or VM state)
    Tracks only the state of RAM and all other resources in use by the VM. If the disks are unmodified between the time a VM state snapshot is taken and restored, then the guest will resume in a consistent state; but if the disks are modified externally in the meantime, this is likely to lead to data corruption.

system checkpoint (domain snapshot)
    A combination of disk snapshots for all disks as well as VM memory state, which can be used to resume the guest from where it left off with symptoms similar to hibernation (that is, TCP connections in the guest may have timed out, but no files or processes are lost). A system checkpoint can contain disk snapshots + VM state; or it can contain disk snapshots only, without VM state, in which case it should quiesce all disks before taking the disk snapshots. The latter case is also referred to as a 'disk-only domain snapshot'.

VM state (memory) snapshots are created by 'domain save' and restored via 'domain restore'. Disk snapshots can be created by many external tools, like qemu-img, vhd-util and lvm, etc.
Domain snapshots (including disk-only domain snapshots) will be handled by the 'domain snapshot' functionality. A domain snapshot with memory state (as VM state) includes live and non-live modes, differing in VM downtime. Live mode will try its best to reduce downtime of the guest, but as a result will increase the size of the memory dump file.

2. Domain Snapshot Use Cases

Domain snapshots can be used in the following cases:

* A domain snapshot can be used as a domain backup. It can preserve the VM status at a certain point and roll back to it.

* Domain snapshots can support 'gold image' type deployments, i.e. where you create one baseline single disk image and then clone it multiple times to deploy lots of guests. When you create a domain snapshot as a gold domain snapshot (duplicated multiple times), one can restore from the gold domain snapshot multiple times for different reasons.

* A disk-only domain snapshot can be used for backup out of the domain, i.e. taking a disk-only domain snapshot and then running your usual backup software on the disk snapshots (which are now unchanging, which is handy); one can back up that static version of the disk out of band from the domain itself (e.g. it can be attached to a separate backup VM).

3. Domain Snapshot Operations

Generally, domain snapshot includes 4 kinds of operations:

* create a domain snapshot

  Create a domain snapshot under different conditions:
  - domain is live, save vm state (live), disk snapshot
  - domain is live, save vm state (non-live), disk snapshot
  - domain is live, disk-only snapshot (needs quiescing disks)
  - domain is offline, disk-only snapshot

  (Under each condition above, the disk snapshot can be internal/external.)
* revert (roll back to) a domain snapshot

  Revert a domain snapshot under different conditions:
  - domain is live, has vm state, all internal disk snapshots
  - domain is live, has vm state, has external disk snapshots
  - domain is live, no vm state, all internal disk snapshots
  - domain is live, no vm state, has external disk snapshots
  - domain is offline, has vm state, all internal disk snapshots
  - domain is offline, has vm state, has external disk snapshots
  - domain is offline, no vm state, all internal disk snapshots
  - domain is offline, no vm state, has external disk snapshots

* delete a domain snapshot

  Delete a domain snapshot under the following conditions:
  - domain is live, not in a snapshot chain
  - domain is live, in a snapshot chain
  - domain is offline, not in a snapshot chain
  - domain is offline, in a snapshot chain

* list domain snapshot(s)

  Listing domain snapshot(s) covers:
  - list a single domain snapshot
  - list all domain snapshots
  - list snapshot(s) in detail

4. Disk Snapshot Operations

Also 4 kinds:

* Create disk snapshot
* Delete disk snapshot
* Revert (apply) disk snapshot
* List disk
[Xen-devel] [RFC PATCH 4/7] libxl: add snapshot APIs
Add snapshot related APIs for xl, including: create disk snapshots, revert disk snapshots. Together with the existing memory save/restore APIs, xl can create a domain snapshot and revert from a domain snapshot.

Limitations:

About disk snapshot create, there are many cases:
- qdisk, internal: should call a qmp command to do the work.
- qdisk, external: should call a qmp command to do the work; qemu will replace the disk backend file after creating the external snapshot.
- non-qdisk, internal: should call 'qemu-img snapshot' to do the work.
- non-qdisk, external: should call 'qemu-img create' to create a new file with the original disk file as backing file. And libxl should switch the domain disk from the original disk to the new file.

The problem is: in the last case, during a domain snapshot, between domain suspend and resume, how do we replace the disk backend file from libxl? Especially if the disk file format changes (original disk backend file is 'raw', new file is 'qcow2')? Considering this, currently the API only supports qdisk; non-qdisk cases are not included.

About disk snapshot revert: reverting from an external disk snapshot is actually starting the domain from a specified backing file; since the backing file should be kept read-only, that will involve a block copy operation. Currently this case is not supported; only reverting from an internal disk snapshot is supported.
Signed-off-by: Chunyan Liu cy...@suse.com
---
 tools/libxl/Makefile         |   1 +
 tools/libxl/libxl.h          |   6 ++
 tools/libxl/libxl_internal.h |   6 ++
 tools/libxl/libxl_snapshot.c | 219 +++
 4 files changed, 232 insertions(+)
 create mode 100644 tools/libxl/libxl_snapshot.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 9036076..0917326 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -105,6 +105,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_qmp.o libxl_event.o libxl_fork.o \
 			libxl_dom_suspend.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
+LIBXL_OBJS += libxl_snapshot.o
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o

 LIBXL_TESTS += timedereg

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 5f9047c..d60f139 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1729,6 +1729,12 @@ int libxl_psr_cat_get_l3_info(libxl_ctx *ctx, libxl_psr_cat_info **info,
 void libxl_psr_cat_info_list_free(libxl_psr_cat_info *list, int nr);
 #endif

+/* Domain snapshot related APIs */
+int libxl_disk_snapshot_create(libxl_ctx *ctx, uint32_t domid,
+                               libxl_disk_snapshot *snapshot, int nb);
+int libxl_disk_snapshot_revert(libxl_ctx *ctx, uint32_t domid,
+                               libxl_disk_snapshot *snapshot, int nb);
+
 /* misc */

 /* Each of these sets or clears the flag according to whether the

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c3dec85..f24e0af 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1749,6 +1749,12 @@ _hidden void libxl__qmp_cleanup(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
                                        const libxl_domain_config *guest_config);

+typedef struct libxl__ao_snapshot libxl__ao_snapshot;
+struct libxl__ao_snapshot {
+    libxl__ao *ao;
+    libxl__ev_child child;
+};
+
 /* on failure, logs */
 int libxl__sendmsg_fds(libxl__gc *gc, int carrier,
                        const void *data, size_t datalen,

diff --git a/tools/libxl/libxl_snapshot.c b/tools/libxl/libxl_snapshot.c
new file mode 100644
index 000..34d36ef
--- /dev/null
+++ b/tools/libxl/libxl_snapshot.c
@@ -0,0 +1,219 @@
+/*
+ * libxl_snapshot.c: code domain snapshot related APIs
+ *
+ * Copyright (C) 2015 SUSE LINUX Products GmbH, Nuernberg, Germany.
+ * Author Chunyan Liu cy...@suse.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/* Replace the domain disk with the external path after taking an external
+ * disk snapshot, since the original disk becomes a backing file. It will
+ * need to update xenstore information as well as the domain config.
+ */
+static int libxl__update_disk_configuration(libxl__gc *gc, uint32_t domid,
+                                            libxl_disk_snapshot snapshot)
+{
+    char *backend_path, *path, *value;
[Xen-devel] [RFC PATCH 7/7] libxl: add APIs to delete internal/external disk snapshots
Currently this group of APIs is not used by the xl toolstack, since xl doesn't maintain domain snapshot info and so depends on the user to delete things. But for libvirt they are very useful, since libvirt maintains domain snapshot info itself and needs these APIs to delete internal/external disk snapshots.

Signed-off-by: Chunyan Liu cy...@suse.com
---
 tools/libxl/libxl.h          |  28 
 tools/libxl/libxl_snapshot.c | 102 +++
 2 files changed, 130 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 412a42f..1383b92 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1752,6 +1752,34 @@ int libxl_disk_snapshot_create(libxl_ctx *ctx, uint32_t domid,
 int libxl_disk_snapshot_revert(libxl_ctx *ctx, uint32_t domid,
                                libxl_disk_snapshot *snapshot, int nb);

+/* delete internal disk snapshot */
+int libxl_disk_snapshot_delete(libxl_ctx *ctx, uint32_t domid,
+                               libxl_disk_snapshot *snapshot, int nb);
+
+/* next 4 functions deal with external disk snapshots */
+
+/* shorten backing file chain. Merge from top to base */
+int libxl_domain_block_rebase(libxl_ctx *ctx, uint32_t domid,
+                              libxl_device_disk *disk,
+                              const char *base,
+                              const char *backing_file,
+                              unsigned long long bandwidth);
+
+/* shorten backing file chain. Merge from base to top */
+int libxl_domain_block_commit(libxl_ctx *ctx, uint32_t domid,
+                              libxl_device_disk *disk,
+                              const char *top,
+                              const char *base,
+                              const char *backing_file,
+                              unsigned long long bandwidth);
+
+/* query a block job status; can get job type, speed, progress status */
+int libxl_domain_block_job_query(libxl_ctx *ctx, uint32_t domid,
+                                 libxl_device_disk *disk,
+                                 libxl_block_job_info *info);
+
+/* abort a block job. If the job is finished, complete it;
+ * otherwise, cancel it.
+ */
+int libxl_domain_block_job_abort(libxl_ctx *ctx, uint32_t domid,
+                                 libxl_device_disk *disk,
+                                 bool force);
+
 /* misc */

 /* Each of these sets or clears the flag according to whether the

diff --git a/tools/libxl/libxl_snapshot.c b/tools/libxl/libxl_snapshot.c
index 34d36ef..9b139e6 100644
--- a/tools/libxl/libxl_snapshot.c
+++ b/tools/libxl/libxl_snapshot.c
@@ -217,3 +217,105 @@ int libxl_disk_snapshot_revert(libxl_ctx *ctx, uint32_t domid,
     }
     return rc;
 }
+
+int libxl_disk_snapshot_delete(libxl_ctx *ctx, uint32_t domid,
+                               libxl_disk_snapshot *snapshot, int nb)
+{
+    int rc = 0;
+    int i;
+
+    GC_INIT(ctx);
+    for (i = 0; i < nb; i++) {
+        if (snapshot[i].type == LIBXL_DISK_SNAPSHOT_TYPE_EXTERNAL) {
+            LOG(WARN, "libxl_disk_snapshot_delete: external disk snapshot "
+                "cannot be deleted. Please use libxl_domain_block_commit and "
+                "libxl_domain_block_rebase to handle that.");
+            continue;
+        }
+
+        rc = libxl__qmp_disk_snapshot_delete(gc, domid, snapshot[i]);
+        if (rc)
+            goto err;
+    }
+err:
+    if (rc)
+        LOG(ERROR, "domain disk snapshot delete fail\n");
+
+    GC_FREE;
+    return rc;
+}
+
+int libxl_domain_block_rebase(libxl_ctx *ctx, uint32_t domid,
+                              libxl_device_disk *disk,
+                              const char *base,
+                              const char *backing_file,
+                              unsigned long long bandwidth)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = libxl__qmp_block_stream(gc, domid, disk, base,
+                                 backing_file, bandwidth, NULL);
+    GC_FREE;
+    return rc;
+}
+
+int libxl_domain_block_commit(libxl_ctx *ctx, uint32_t domid,
+                              libxl_device_disk *disk,
+                              const char *top,
+                              const char *base,
+                              const char *backing_file,
+                              unsigned long long bandwidth)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = libxl__qmp_block_commit(gc, domid, disk, top, base,
+                                 backing_file, bandwidth);
+
+    GC_FREE;
+    return rc;
+}
+
+/* query block job status */
+int libxl_domain_block_job_query(libxl_ctx *ctx, uint32_t domid,
+                                 libxl_device_disk *disk,
+                                 libxl_block_job_info *info)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = libxl__qmp_query_block_job(gc, domid, disk, info);
+
+    GC_FREE;
+    return rc;
+}
+
+/* Abort block job:
+ * If the block job is already finished, call the block_job_complete qmp;
+ * otherwise, call
[Xen-devel] [RFC Doc V11 4/5] domain snapshot libxl design
libxl Design

1. New Structures

libxl_disk_snapshot_type = Enumeration("disk_snapshot_type", [
    (0, "invalid"),
    (1, "internal"),
    (2, "external"),
    ])

libxl_disk_snapshot = Struct("disk_snapshot", [
    # target disk
    ("disk", libxl_device_disk),
    # disk snapshot name
    ("name", string),
    ("u", KeyedUnion(None, libxl_disk_snapshot_type, "type",
        [("external", Struct(None, [
            # disk format for external files. Since external disk snapshot is
            # implemented with the backing file mechanism, the external file
            # disk format must support backing files. This field can be NULL;
            # then a proper disk format will be used by default according to
            # the original disk format.
            ("external_format", libxl_disk_format),
            # external file path. This field should be non-NULL and a new path.
            ("external_path", string),
            ])),
         ("internal", None),
         ("invalid", None),
        ])),
    ])

2. New Functions

Since there are already APIs for saving memory (libxl_domain_suspend) and
restoring a domain from saved memory (libxl_domain_create_restore), for xl
domain snapshot tasks the missing part is disk snapshot functionality. The
disk snapshot functionality would be used by libvirt too.

## disk snapshot create

/**
 * libxl_disk_snapshot_create:
 * @ctx: libxl context
 * @domid: domain id
 * @snapshot: array of disk snapshot configuration. Has nb members.
 *   - libxl_device_disk:
 *       structure to represent which disk.
 *   - name:
 *       snapshot name.
 *   - type:
 *       disk snapshot type: internal or external.
 *   - u.external.external_format:
 *       Format of external file.
 *       After disk snapshot, the original file will become a backing
 *       file, while the external file will keep the delta, so
 *       external_format should support backing files, like: cow,
 *       qcow, qcow2, etc.
 *       If it is NULL, then a proper format will be used by default
 *       according to the original disk format.
 *   - u.external.external_path:
 *       path to external file. non-NULL.
 * @nb: number of disks that need to take disk snapshot.
 *
 * creating internal/external disk snapshot
 *
 * Takes disk snapshots of a group of domain disks according to
 * configuration. Supports both internal disk snapshots and external
 * disk snapshots. For the qdisk backend type, it will call the qmp
 * transaction command to do the work. For other disk backend types,
 * it might call other external commands.
 *
 * Returns 0 on success, <0 on failure.
 */
int libxl_disk_snapshot_create(libxl_ctx *ctx, uint32_t domid,
                               libxl_disk_snapshot *snapshot, int nb);

## disk snapshot revert

/**
 * libxl_disk_snapshot_revert:
 * @snapshot: array of disk snapshot configuration. Has nb members.
 * @nb: number of disks.
 *
 * Reverts disks to the specified snapshot according to configuration.
 * For different disk backend types, different external commands are
 * called to do the work.
 *
 * Returns 0 on success, <0 on failure.
 */
int libxl_disk_snapshot_revert(libxl_disk_snapshot *snapshot, int nb);

For disk snapshot revert: since domain snapshot revert is essentially
"destroy, revert disks, and restore from RAM", there is no qemu process to
speak to while reverting disks. So it always calls external commands to
finish the work.

## disk snapshot delete

Since xl won't supply domain snapshot delete functionality, this group of
functions won't be used by xl, but will be used by libvirt.

/**
 * libxl_disk_snapshot_delete:
 * @ctx: libxl context
 * @domid: domain id
 * @snapshot: array of disk snapshot configuration. Has nb members.
 * @nb: number of disks.
 *
 * Deletes disk snapshots of a group of domain disks according to
 * configuration. Can only handle internal disk snapshots. Currently
 * only valid for 'qcow2' disks, by calling a qmp command if it is the
 * qdisk backend or by calling qemu-img for other backend types.
 *
 * Deleting external disk snapshots means shortening the backing file
 * chain and merging snapshot data, which requires knowing the snapshot
 * chain. The functions libxl_domain_block_rebase and
 * libxl_domain_block_commit would help.
 *
 * Returns 0 on success, <0 on failure.
 */
int libxl_disk_snapshot_delete(libxl_ctx *ctx, uint32_t domid,
                               libxl_disk_snapshot *snapshot, int nb);

The following functions would help to delete external disk snapshots. They
are actually two directions of shortening the backing file chain: one
merges from base to top, the other from top to base. Both need the caller
to know the backing file chain information.

/**
 * libxl_domain_block_rebase:
 * @ctx: libxl context
 * @domid: domain id
 * @disk: path to the block device
 * @base: path to backing file to keep, or NULL for no backing file
 * @bandwidth: (optional) bandwidth limit in B/s, 0 for no limit.
 *
 * Merge data from base to top.
 *
 * Populate a disk image with data from its backing
[Xen-devel] [RFC PATCH 6/7] qmp: add qmp handlers to delete internal/external disk snapshot
Xl doesn't maintain domain snapshot info and has no idea of snapshot info and related files after creation, so it doesn't supply a domain snapshot delete command. These qmp handlers won't be used by xl. But libvirt maintains domain snapshot info itself and supplies snapshot delete commands, and so needs APIs from libxl to delete internal/external disk snapshots. To libvirt, these qmp handlers are useful. Signed-off-by: Chunyan Liu cy...@suse.com --- tools/libxl/libxl.h | 17 + tools/libxl/libxl_internal.h | 28 tools/libxl/libxl_qmp.c | 158 +++ 3 files changed, 203 insertions(+) diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index d60f139..412a42f 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -1730,6 +1730,23 @@ void libxl_psr_cat_info_list_free(libxl_psr_cat_info *list, int nr); #endif /* Domain snapshot related APIs */ + +/* structure to retrieve qmp block job status */ +typedef struct libxl_block_job_info +{ + char *disk_vdev; + const char *type; + unsigned long speed; + /* The following fields provide an indication of block job progress. + * @current indicates the current position and will be between 0 and @end. + * @end is the final cursor position for this operation and represents + * completion. + * To approximate progress, divide @current by @end.
+ */ + unsigned long long current; + unsigned long long end; +} libxl_block_job_info; + int libxl_disk_snapshot_create(libxl_ctx *ctx, uint32_t domid, libxl_disk_snapshot *snapshot, int nb); int libxl_disk_snapshot_revert(libxl_ctx *ctx, uint32_t domid, diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index f24e0af..a6456e8 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -1739,6 +1739,34 @@ _hidden int libxl__qmp_cpu_add(libxl__gc *gc, int domid, int index); _hidden int libxl__qmp_disk_snapshot_transaction(libxl__gc *gc, int domid, libxl_disk_snapshot *snapshot, int nb); +/* Delete a disk snapshot */ +_hidden int libxl__qmp_disk_snapshot_delete(libxl__gc *gc, int domid, + libxl_disk_snapshot *snapshot); +/* shorten backing file chain. Merge base to top */ +_hidden int libxl__qmp_block_commit(libxl__gc *gc, uint32_t domid, +libxl_device_disk *disk, +const char *base, const char *top, +const char *backing_file, +unsigned long bandwidth); +/* shorten backing file chain. 
Merge top to base */ +_hidden int libxl__qmp_block_stream(libxl__gc *gc, uint32_t domid, +libxl_device_disk *disk, +const char *base, +const char *backing_file, +unsigned long long bandwidth, +const char *error); +/* query qmp block job status */ +_hidden int libxl__qmp_query_block_job(libxl__gc *gc, uint32_t domid, + libxl_device_disk *disk, + libxl_block_job_info *info); +/* cancel a qmp block job */ +_hidden int libxl__qmp_block_job_cancel(libxl__gc *gc, uint32_t domid, +libxl_device_disk *disk, +bool force); + +/* complete a qmp block job */ +_hidden int libxl__qmp_block_job_complete(libxl__gc *gc, uint32_t domid, + libxl_device_disk *disk); /* close and free the QMP handler */ _hidden void libxl__qmp_close(libxl__qmp_handler *qmp); /* remove the socket file, if the file has already been removed, diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index 2216511..09cb628 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -1034,6 +1034,164 @@ int libxl__qmp_disk_snapshot_transaction(libxl__gc *gc, int domid, return qmp_run_command(gc, domid, "transaction", args, NULL, NULL); } +int libxl__qmp_disk_snapshot_delete(libxl__gc *gc, int domid, + libxl_disk_snapshot *snapshot) +{ +libxl__json_object *args = NULL; + +if (snapshot->type == LIBXL_DISK_SNAPSHOT_TYPE_EXTERNAL) { +LOG(ERROR, "QMP doesn't support deleting external disk snapshots"); +return -1; +} + +if (snapshot->type != LIBXL_DISK_SNAPSHOT_TYPE_INTERNAL) { +LOG(ERROR, "Invalid disk snapshot type"); +return -1; +} + +qmp_parameters_add_string(gc, &args, "device", snapshot->disk.vdev); +qmp_parameters_add_string(gc, &args, "name", snapshot->name); + +return qmp_run_command(gc, domid, "blockdev-snapshot-delete-internal-sync",
[Xen-devel] [RFC Doc V11 3/5] domain snapshot xl design
XL Design

1. User Interface

xl snapshot-create: Create a snapshot (disk and RAM) of a domain.

SYNOPSIS:
  snapshot-create [--live] [--internal|--external] [--path=path] Domain [ConfigFile]

OPTIONS:
  -l, --live      take a live snapshot
  -i, --internal  take internal disk snapshots of all disks
  -e, --external  take external disk snapshots of all disks
  -p, --path      path to store snapshot data

If no options and no @ConfigFile are specified, e.g.:
  # xl snapshot-create domain
by default it will create a domain snapshot with a default name generated
from the creation time. This name will be used to generate the default RAM
snapshot name and disk snapshot names, and to generate the default
directory to store all the snapshot data (RAM snapshot file, external disk
snapshot files, etc.)

e.g. the result of the above command would be:
  default snapshot root directory: /var/lib/xen/snapshots/
  default snapshot name generated: 20150122xxx
  default subdirectory to save data of this snapshot:
    /var/lib/xen/snapshots/domain_uuid/20150122xxx/
  RAM snapshot file: by default, it will save memory. Location:
    /var/lib/xen/snapshots/domain_uuid/20150122xxx/20150122xxx.save
  disk snapshots: by default, for each domain disk, take an internal disk
  snapshot if that disk supports it; otherwise take an external disk
  snapshot.
    Internal disk snapshot: take disk snapshot with name 20150122xxx
    External disk snapshot: external files are:
      /var/lib/xen/snapshots/domain_uuid/20150122xxx/vda_20150122xxx.qcow2
      /var/lib/xen/snapshots/domain_uuid/20150122xxx/vdb_20150122xxx.qcow2

If the options include --live, then the domain is not paused while
creating the snapshot, like live migration. This increases the size of the
memory dump file, but reduces downtime of the guest.

If the options include --path, all snapshot data will be saved in this
@path.

If no @ConfigFile:name is specified, then the default name (generated by
time) is used. The user can specify snapshot information in detail through
@ConfigFile; see the following ConfigFile syntax.
If configuration in @ConfigFile conflicts with options, the options win.

xl snapshot-revert: Revert domain to the status of a snapshot.

SYNOPSIS:
  snapshot-revert [--pause] [--force] Domain ConfigFile

OPTIONS:
  -p, --pause  keep domain paused after the revert
  -f, --force  try harder on risky revert

About domain snapshot delete: xl doesn't have snapshot chain information,
so it couldn't do the full work. If we supplied:
  xl snapshot-delete domain cfgfile
then for internal disk snapshots, deleting a disk snapshot doesn't need
snapshot chain info, and this command could finish the work. But for
external disk snapshots, deleting a disk snapshot requires merging the
backing file chain, which needs the backing file chain information, so
this command cannot finish that. So deleting domain snapshots is left to
the user, who can delete RAM snapshots and disk snapshots themselves:
  RAM snapshot file: the user can remove it directly.
  Disk snapshots:
  - Internal disk snapshot: issue 'qemu-img snapshot -d'
  - External disk snapshot: basically it is implemented as a backing file
    chain. Use 'qemu-img commit' to remove one file from the chain and
    merge its data forward.

2. cfgfile syntax

# snapshot name. If the user doesn't provide a VM snapshot name, xl will
# generate a name automatically from the creation time or the @path
# basename.
name=

# save memory or disk-only.
# If memory is '0', memory is not saved; take a disk-only domain snapshot.
# If memory is '1', domain memory is saved.
# Default is 1.
memory=1

# memory location. This field is valid when memory=1.
# If it is set to "", xl will generate a path from the creation time or
# the @path basename.
memory_path=

# disk snapshot specification
#
# Syntax: 'external path, external format, target device'
#
# By default, if no disks are specified here, it will take a disk snapshot
# of all disks: an internal disk snapshot if the disk supports internal
# disk snapshots, and an external disk snapshot for other disks.
#disks=['/tmp/hda_snapshot.qcow2,qcow2,hda', ',,hdb',]

3. xl snapshot-xxx implementation

xl snapshot-create:
1) parse args or user configuration file.
2) if saving memory: save domain (store saved memory to memory_path)
   if taking a disk-only snapshot: pause domain, quiesce disks. (not
   supported now, maybe in future.)
3) create disk snapshots according to disk snapshot configuration
4) unpause domain

xl snapshot-revert:
1) parse user configuration file
2) destroy current domain
3) revert disk snapshots according to disk snapshot configuration
4) restore domain from saved memory.

4. Notes

* user should take care of snapshot data: saved memory file, disk
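Putting the section 2 syntax together, a complete cfgfile for a two-disk guest might look like the following (the name, paths, and device values here are illustrative examples, not defaults that xl defines):

```
# external qcow2 snapshot of hda; let xl pick the snapshot style for hdb
name="before-upgrade"
memory=1
memory_path="/var/lib/xen/snapshots/domain_uuid/before-upgrade/before-upgrade.save"
disks=['/var/lib/xen/snapshots/domain_uuid/before-upgrade/hda_before-upgrade.qcow2,qcow2,hda', ',,hdb']
```

The empty first two fields for hdb mean no external path or format is given, so by the defaulting rules above hdb gets an internal snapshot if its format supports one, and an external snapshot otherwise.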
[Xen-devel] [RFC PATCH 2/7] qmp: add qmp handlers to create disk snapshots
Add qmp handlers to take disk snapshots. This will be used when creating a domain snapshot. Signed-off-by: Chunyan Liu cy...@suse.com --- tools/libxl/libxl_internal.h | 4 +++ tools/libxl/libxl_qmp.c | 66 2 files changed, 70 insertions(+) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 6ea6c83..c3dec85 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -1735,6 +1735,10 @@ _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enabl _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk); /* Add a virtual CPU */ _hidden int libxl__qmp_cpu_add(libxl__gc *gc, int domid, int index); +/* Create disk snapshots for a group of disks in a transaction */ +_hidden int libxl__qmp_disk_snapshot_transaction(libxl__gc *gc, int domid, +libxl_disk_snapshot *snapshot, +int nb); /* close and free the QMP handler */ _hidden void libxl__qmp_close(libxl__qmp_handler *qmp); /* remove the socket file, if the file has already been removed, diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index 965c507..2216511 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -968,6 +968,72 @@ int libxl__qmp_cpu_add(libxl__gc *gc, int domid, int idx) return qmp_run_command(gc, domid, "cpu-add", args, NULL, NULL); } +/* + * requires QEMU version >= 1.1 + * qmp command example: + * -> { "execute": "transaction", + * "arguments": { "actions": [ + * { "type": "blockdev-snapshot-sync", "data": { "device": "ide-hd0", + * "snapshot-file": "/some/place/my-image", + * "format": "qcow2" } }, + * { "type": "blockdev-snapshot-internal-sync", "data": { + * "device": "ide-hd1", + * "name": "snapshot0" } }, + * { "type": "blockdev-snapshot-internal-sync", "data": { + * "device": "ide-hd2", + * "name": "snapshot0" } } ] } } + * <- { "return": {} } + */ +int libxl__qmp_disk_snapshot_transaction(libxl__gc *gc, int domid, + libxl_disk_snapshot *snapshot, + int nb) +{ +libxl__json_object *args = NULL;
+libxl__json_object *actions = NULL; +libxl__json_object **type = NULL; +libxl__json_object **data = NULL; +int i; + +type = (libxl__json_object**)calloc(nb, sizeof(libxl__json_object*)); +data = (libxl__json_object**)calloc(nb, sizeof(libxl__json_object*)); +actions = libxl__json_object_alloc(gc, JSON_ARRAY); + +for (i = 0; i < nb; i++) { +switch (snapshot[i].type) { +case LIBXL_DISK_SNAPSHOT_TYPE_INTERNAL: +/* internal disk snapshot */ +qmp_parameters_add_string(gc, &type[i], "type", + "blockdev-snapshot-internal-sync"); +qmp_parameters_add_string(gc, &data[i], "name", + snapshot[i].name); +qmp_parameters_add_string(gc, &data[i], "device", + snapshot[i].disk.vdev); +qmp_parameters_common_add(gc, &type[i], "data", data[i]); +flexarray_append(actions->u.array, (void*)type[i]); +break; +case LIBXL_DISK_SNAPSHOT_TYPE_EXTERNAL: +/* external disk snapshot */ +qmp_parameters_add_string(gc, &type[i], "type", + "blockdev-snapshot-sync"); +qmp_parameters_add_string(gc, &data[i], "device", + snapshot[i].disk.vdev); +qmp_parameters_add_string(gc, &data[i], "snapshot-file", + snapshot[i].u.external.external_path); +qmp_parameters_add_string(gc, &data[i], "format", + libxl_disk_format_to_string(snapshot[i].u.external.external_format)); +qmp_parameters_common_add(gc, &type[i], "data", data[i]); +flexarray_append(actions->u.array, (void*)type[i]); +break; +default: +LOG(ERROR, "Invalid disk snapshot type"); +return -1; +} +} + +qmp_parameters_common_add(gc, &args, "actions", actions); +return qmp_run_command(gc, domid, "transaction", args, NULL, NULL); +} + int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid, const libxl_domain_config *guest_config) { -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [RFC PATCH 3/7] libxl: save disk format to xenstore
Disk snapshot handling depends on the disk format. Currently, since the disk format is not saved to xenstore, disk->format is LIBXL_DISK_FORMAT_UNKNOWN when getting the device disk list. Disk snapshot cannot continue without correct disk format information, so add code to save the disk format to xenstore so that disk->format contains correct information when getting the device disk list. Signed-off-by: Chunyan Liu cy...@suse.com --- tools/libxl/libxl.c | 10 +- tools/libxl/libxl_utils.c | 16 tools/libxl/libxl_utils.h | 1 + 3 files changed, 26 insertions(+), 1 deletion(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 083f099..dce43d6 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -2524,6 +2524,8 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid, goto out; } +flexarray_append(back, "format"); +flexarray_append(back, libxl__device_disk_string_of_format(disk->format)); flexarray_append(back, "frontend-id"); flexarray_append(back, libxl__sprintf(gc, "%d", domid)); flexarray_append(back, "online"); @@ -2682,7 +2684,13 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc, } disk->is_cdrom = !strcmp(tmp, "cdrom"); -disk->format = LIBXL_DISK_FORMAT_UNKNOWN; +tmp = libxl__xs_read(gc, XBT_NULL, + libxl__sprintf(gc, "%s/format", be_path)); +if (!tmp) { +LOG(ERROR, "Missing xenstore node %s/format", be_path); +goto cleanup; +} +libxl_string_to_format(ctx, tmp, &(disk->format)); return 0; cleanup: diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c index bfc9699..067a9fc 100644 --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -322,6 +322,22 @@ out: return rc; } +int libxl_string_to_format(libxl_ctx *ctx, char *s, libxl_disk_format *format) +{ +int rc = 0; + +if (!strcmp(s, "aio")) { +*format = LIBXL_DISK_FORMAT_RAW; +} else if (!strcmp(s, "vhd")) { +*format = LIBXL_DISK_FORMAT_VHD; +} else if (!strcmp(s, "qcow")) { +*format = LIBXL_DISK_FORMAT_QCOW; +} else if (!strcmp(s, "qcow2")) { +*format = LIBXL_DISK_FORMAT_QCOW2; +} +return rc; +} + int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, void **data_r, int *datalen_r) { GC_INIT(ctx); diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h index 1e5ca8a..0897069 100644 --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h @@ -37,6 +37,7 @@ int libxl_get_stubdom_id(libxl_ctx *ctx, int guest_domid); int libxl_is_stubdom(libxl_ctx *ctx, uint32_t domid, uint32_t *target_domid); int libxl_create_logfile(libxl_ctx *ctx, const char *name, char **full_name); int libxl_string_to_backend(libxl_ctx *ctx, char *s, libxl_disk_backend *backend); +int libxl_string_to_format(libxl_ctx *ctx, char *s, libxl_disk_format *format); int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, void **data_r, int *datalen_r); -- 2.1.4
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
Wei Liu writes ("[URGENT RFC] Branching and reopening -unstable"):
 Branching should be done at one of the RC tags. There might not be
 enough time for us to reach consensus before tagging RC1, so I would
 say let's branch at RC2 if we don't observe blocker bugs.
 Maintainers should be responsible for both the 4.6 branch and the
 -unstable branch. As for bug fixes, here are two options.

I think this conflates the three questions which should be answered:

Q1: What is the status of the newly branched -unstable? Should we avoid
(some or all) big sets of changes?
 (a) Don't branch
 (b) Branch but don't allow /any/ big changes. Seems to make branching
     rather pointless.
 (c) Branch but allow /some/ big changes. Tree is `half open', which is
     not ideal.
 (d) Branch and allow /all/ changes.

Q2: If we don't avoid such changes, and a bugfix has a conflict with a
change in the new unstable, who is responsible for fixing it up?
Options include:
 (a) the relevant maintainers (double whammy for maintainers)
 (b) the submitter of the bugfix (very undesirable)
 (c) the submitter of the big set of changes (but what do we do if they
     don't respond?)
 (d) the stable tree maintainers (already ruled out, so included in this
     list for completeness; out of the question IMO)

Q3: What workflow should we use for bugfixes for bugs in 4.6-pre?
There are three options, not two:
 (a) Bugfixes go to 4.6 first, cherry pick to unstable.
     This keeps our focus on 4.6, which is good.
 (b) Bugfixes go to 4.6 first, merge 4.6 to unstable.
     Not tenable if we have big changes in unstable.
 (c) Bugfixes go to unstable, cherry pick to 4.6.
     Undesirable IMO because it shifts focus to unstable.

Of these 2(c)/3(a) would be ideal but we don't have a good answer to the
problem posed in Q2(c). I think that leaves us with 2(a): maintainers
have to deal with the fallout. That makes 1(d) untenable in my view. As a
maintainer, I do not want that additional workload. That leaves us with
1(a) or 1(c)/2(a)/3(a).
With 1(c), who should decide on a particular series? Well, who is taking
the risk? The maintainer, who will have to pick up the pieces.

I therefore conclude, we have two options:

A 1(a)/-/- Do not branch yet: defer divergence until the risk of
  bugfixes is much lower.

B 1(c)(maintainer)/2(a)/3(a) Branch. Maintainers may choose to defer
  patch series based on risk of conflicts with bugfixes required for
  4.6. Clear communication with submitters is required. Bugfixes for
  bugs in 4.6 will be accepted onto the 4.6 branch. Maintainers are
  required to cherry pick them onto unstable. Bugfixes will not be
  accepted for unstable unless it is clear that the bug was introduced
  in unstable since 4.6 branched.

I am happy with B because it gives the relevant maintainers the option.

Ian.
Re: [Xen-devel] [PATCH for-4.6] tools: Don't try to update the firmware directory on ARM
On Tue, Aug 11, 2015 at 01:22:24PM +0100, Ian Campbell wrote: On Sun, 2015-08-09 at 14:49 +0100, Julien Grall wrote: Hi Wei, On 08/08/2015 16:16, Wei Liu wrote: On Fri, Aug 07, 2015 at 06:27:18PM +0100, Julien Grall wrote: The firmware directory is not built at all on ARM. Attempting to update it using the target subtree-force-update will fail when trying to update seabios. Signed-off-by: Julien Grall julien.gr...@citrix.com --- Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com I've noticed it while trying to update the QEMU tree used by Xen on a platform where iasl is not present (required by seabios in order to update it). I think this should go in Xen 4.6 and possibly be backported to Xen 4.5 --- tools/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/Makefile b/tools/Makefile index 45cb4b2..2618559 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -305,7 +305,9 @@ endif ifeq ($(CONFIG_QEMU_TRAD),y) $(MAKE) qemu-xen-traditional-dir-force-update endif +ifeq ($(CONFIG_X86),y) $(MAKE) -C firmware subtree-force-update +endif This is not optimal. What if you want to build OVMF on arm in the future? Slight aside, but I already looked at doing this but concluded that the right answer was to add this to raisin not xen.git. As it happens on ARM we would boot the UEFI binary directly, so we don't need to compile it into hvmloader or jump through other hoops, so it is a bit easier than on x86. Right. Makes sense. You also can't rule out having other firmwares that need to be built on ARM in the future. I think a proper way of doing this is to make CONFIG_SEABIOS=n when you're building on ARM. See tools/configure.ac. tools/Makefile only builds the firmware directory for x86, see: SUBDIRS-$(CONFIG_X86) += firmware Hence why I wrote the patch in the current way.
I think having the update rule match (in spirit at least) the SUBDIRS rules makes sense as a patch for now, so I'm in favour of taking this patch as it is. Fine by me then. Acked-by: Wei Liu wei.l...@citrix.com Building the firmware directory for ARM would require more work than replacing SUBDIRS-$(CONFIG_X86) with SUBDIRS-y. In general, I do agree that we should enable this with configure.ac but, IMHO, this is not Xen 4.6 material... Although I would be happy to fix it for Xen 4.7. Regards,
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On 11.08.15 at 12:44, wei.l...@citrix.com wrote: As for bug fixes, here are two options. Option 1: bug fixes go into -unstable, backport / cherry-pick bug fixes back to 4.6. This seems to leave the tree in half frozen status because we need to reject refactoring patches in case they cause backporting failure. Option 2: bug fixes go into 4.6, merge them to -unstable. If a merge has a conflict and maintainers can't deal with that, the authors of those changes in -unstable which cause the conflict are responsible for fixing it up. I don't see why even on #2 bug fixes shouldn't go into -unstable first - as usual backports should carry a reference to the master commit. And personally I'd favor the revised #2 over #1 or unrevised #2. Jan
Re: [Xen-devel] [v4 16/17] vmx: Add some scheduler hooks for VT-d posted interrupts
On 30.07.15 at 20:26, dario.faggi...@citrix.com wrote: On Thu, 2015-07-30 at 02:04 +, Wu, Feng wrote: -Original Message- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] Since this is one of the differences between the two, was it the cause of the issues you were seeing? If yes, can you elaborate on how and why? In the end, I'm not too opposed to the hook being at the beginning rather than at the end, but there has to be a reason, which may well be better stated in a comment... Here is the reason I put arch_vcpu_wake() ahead of vcpu_wake(): arch_vcpu_wake() does some prerequisites for a vCPU which is about to run, such as setting SN again and changing the NV field back to 'posted_intr_vector', which should be finished before the vCPU is actually scheduled to run. However, if we put arch_vcpu_wake() later in vcpu_wake(), right before 'vcpu_schedule_unlock_irqrestore', then after the 'wake' hook gets finished the vcpu can run at any time (maybe on another pCPU, since the current pCPU is protected by the lock); if this can happen, it is incorrect. Does my understanding make sense? It's safe in any case. In fact, the spinlock will prevent both the vcpu's processor from scheduling and any other processor from stealing the waking vcpu from the runqueue to run it. That's actually why I wanted to double check you changing the position of the hook (wrt the draft), as it felt weird that the issue was in there. :-) So, now that we know that safety is not an issue, where should we put the hook? Having it before SCHED_OP(wake) may make people think that arch specific code is (or can, at some point) somehow influencing the scheduler specific wakeup code, which is not (and should not become, if possible) the case. However, I kind of like the fact that the spinlock is released as soon as possible, after the call to SCHED_OP(wake). That will make it more likely, for the processors we may have sent IPIs to, during the scheduler specific wakeup code, to find the spinlock free.
So, looking at things from this angle, it would be better to avoid putting stuff in between SCHED_OP(wake) and vcpu_schedule_unlock(). So, all in all, I'd say leave it on top, where it is in this patch. Of course, if others have opinions, I'm all ears. :-) If it is kept at the beginning, the hook should be renamed to something like arch_vcpu_wake_prepare(). Jan
Re: [Xen-devel] [RFC 4/4] HVM x86 deprivileged mode: Trap handlers for deprivileged mode
On 10/08/15 11:07, Tim Deegan wrote: Hi, @@ -685,8 +685,17 @@ static int hap_page_fault(struct vcpu *v, unsigned long va, { struct domain *d = v->domain; +/* If we get a page fault whilst in HVM security user mode */ +if ( v->user_mode == 1 ) +{ +printk("HVM: #PF (%u:%u) whilst in user mode\n", + d->domain_id, v->vcpu_id); +domain_crash_synchronous(); +} + This should happen in paging_fault() so it can guard the shadow-pagetable paths too. Once it's there, it'll need a check for is_hvm_vcpu() as well as for user_mode. Maybe have a helper function 'is_hvm_deprivileged_vcpu()' to do both checks, also used in hvm_deprivileged_check_trap() &c. Ok, I'll make this change. HAP_ERROR("Intercepted a guest #PF (%u:%u) with HAP enabled.\n", d->domain_id, v->vcpu_id); + domain_crash(d); return 0; } diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 9f5a6c6..19d465f 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -74,6 +74,7 @@ #include <asm/vpmu.h> #include <public/arch-x86/cpuid.h> #include <xsm/xsm.h> +#include <xen/hvm/deprivileged.h> /* * opt_nmi: one of 'ignore', 'dom0', or 'fatal'. */ @@ -500,6 +501,11 @@ static void do_guest_trap( struct trap_bounce *tb; const struct trap_info *ti; +/* If we take the trap whilst in HVM deprivileged mode + * then we should crash the domain. + */ +hvm_deprivileged_check_trap(__FUNCTION__); I wonder whether it would be better to switch to an IDT with all unacceptable traps stubbed out, rather than have to blacklist them all separately. Probably not - this check is cheap, and maintaining the parallel tables would be a pain. Or maybe there's some single point upstream of here, in the asm handlers, that would catch all the cases where this check is needed? Yep, I think this can be done. In any case, the check needs to return an error code so the caller knows to return without running the rest of the handler (and likewise elsewhere). Understood. Cheers, Tim.
Re: [Xen-devel] OSSTEST -- nested test case development, RFC: ts-guest-destroy doesn't call guest_await_dhcp_tcp() if guest has fixed IP
Ian Campbell writes (Re: OSSTEST -- nested test case development, RFC: ts-guest-destroy doesn't call guest_await_dhcp_tcp() if guest has fixed IP): However by reconfiguring things to be static the L1 host will no longer be generating DHCP RENEW requests when the lease times out, so the DHCP server is at liberty to release the lease when it times out or, worse, reuse the IP address for something else. Indeed. This is wrong. So I think we do actually need to start supporting a dynamic mode for at least L1 hosts (and that may well easily extend to L0 hosts too). Although it is not 100% accurate I think we can assume that DHCP renewal will always work, i.e. once a host is given a particular IP address, so long as it keeps renewing the lease it will keep the same address. It isn't clear to me that we need to make this assumption, in the general case. We probably need to assume that the DHCP-assigned IP addresses don't change unexpectedly during the execution of a particular ts-* script (where `unexpectedly' means `other than as an obvious consequence of actions such as rebooting'). So I think we can still discover the DHCP address assigned to the L1 guest, and propagate it into $r{${l1ident}_ip} when we convert it to an L1 host, but we then also need to modify the Xen installation runs to use dhcp mode for such cases and not switch to static as we do for an L0 host. This would be the right approach, but ... I'm not quite sure how this should be recorded in the runvars. I think we may want to wait for Ian to return from vacation next week. ... having looked at it like this, I think recording the L1 IP address in the runvars is wrong. It should be looked up each time (by something called by selecthost). The alternative would be that selecthost needs to query the DHCP leases file for these kinds of hosts; that would have the benefit of handling potential lease expiry over a reboot. Exactly. Ian.
___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
On 11/08/15 08:50, Shuai Ruan wrote: On Fri, Aug 07, 2015 at 01:44:41PM +0100, Andrew Cooper wrote: On 07/08/15 09:00, Shuai Ruan wrote: +goto skip; +} + +if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) ) What does edi have to do with xsaves? only edx:eax are special according to the manual. regs->edi is the guest_linear_address Why so? xsaves takes an unconditional memory parameter, not a pointer in %rdi. (regs->edi is only correct for ins/outs because the pointer is architecturally required to be in %rdi.) You are right. The linear_address should be decoded from the instruction. There is nothing currently in emulate_privileged_op() which does ModRM decoding for memory references, nor SIB decoding. xsaves/xrstors would be the first such operations. I am also not sure that adding arbitrary memory decode here is sensible. In an ideal world, we would have what is currently x86_emulate() split in 3 stages. Stage 1 does straight instruction decode to some internal representation. Stage 2 does an audit to see whether the decoded instruction is plausible for the reason why an emulation was needed. We have had a number of security issues with emulation in the past where guests cause one instruction to trap for emulation, then rewrite the instruction to be something else, and exploit a bug in the emulator. Stage 3 performs the actions required for emulation. Currently, x86_emulate() is limited to instructions which might legitimately fault for emulation, but with the advent of VM introspection, this is proving to be insufficient. With my x86 maintainer's hat on, I would like to avoid the current situation we have with multiple bits of code doing x86 instruction decode and emulation (which are all different). I think the 3-step approach above caters suitably to all usecases, but it is a large project itself.
It allows the introspection people to have a full and complete x86 emulation infrastructure, while also preventing areas like the shadow paging from being opened up to potential vulnerabilities in unrelated areas of the x86 architecture. I would even go so far as to say that it is probably ok not to support xsaves/xrstors in PV guests until something along the above lines is sorted. The first feature in XSS is processor trace which a PV guest couldn't use anyway. I suspect the same applies to most of the other Why couldn't a PV guest use processor trace? After more consideration, Xen should not expose xsaves/xrstors to PV guests at all. XSS features, or they wouldn't need to be privileged in the first place. Thanks for your detailed suggestions. xsaves/xrstors would also bring other benefits for PV guests, such as saving memory in the XSAVE area. If we do not support xsaves/xrstors in PV, PV guests would lose these benefits. What is your opinion on this? PV guests running under Xen are exactly the same as regular user processes running under Linux. There is a reason everything covered by xsaves/xrstors is restricted to ring0; it would be a security hole to allow guests to configure the features themselves. Features such as Processor Trace would need a hypercall interface for guests to use. ~Andrew
Re: [Xen-devel] [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode
On 10/08/15 10:49, Tim Deegan wrote: Hi, At 17:45 +0100 on 06 Aug (1438883118), Ben Catterall wrote: The process to switch into and out of deprivileged mode can be likened to setjmp/longjmp. To enter deprivileged mode, we take a copy of the stack from the guest's registers up to the current stack pointer. This copy is pretty unfortunate, but I can see that avoiding it will be a bit complex. Could we do something with more stacks? AFAICS there have to be three stacks anyway: - one to hold the depriv execution context; - one to hold the privileged execution context; and - one to take interrupts on. So maybe we could do some fiddling to make Xen take interrupts on a different stack while we're depriv'd? If we do have to copy, we could track whether the original stack has been clobbered by an interrupt, and so avoid (at least some of) the copy back afterwards? One nit in the assembler - if I've followed correctly, this saved IP: +/* Perform a near call to push rip onto the stack */ +call 1f is returned to (with adjustments) here: +/* Go to user mode return code */ +jmp *(%rsi) It would be good to make this a matched pair of call/ret if we can; the CPU has special branch prediction tracking for function calls that gets confused by a call that's not returned to. sure, will do. Cheers, Tim.
Re: [Xen-devel] [PATCH V6 3/7] libxl: add pvusb API
On Mon, Aug 10, 2015 at 06:35:24PM +0800, Chunyan Liu wrote: Add pvusb APIs, including: - attach/detach (create/destroy) virtual usb controller. - attach/detach usb device - list usb controller and usb devices - some other helper functions Signed-off-by: Chunyan Liu cy...@suse.com Signed-off-by: Simon Cao caobosi...@gmail.com --- changes: - Address George's comments: * Update libxl_device_usb_getinfo to read ctrl/port only and get other information. * Update backend path according to xenstore frontend 'xxx/backend' entry instead of using TOOLSTACK_DOMID. * Use 'type' to indicate qemu/pv instead of previous naming 'protocol'. * Add USB 'devtype' union, currently only includes hostdev I will leave this to Ian and George since they had strong opinions on this. I only skimmed this patch. Some comments below. [...] + +int libxl_device_usb_getinfo(libxl_ctx *ctx, uint32_t domid, + libxl_device_usb *usb, + libxl_usbinfo *usbinfo); + /* Network Interfaces */ int libxl_device_nic_add(libxl_ctx *ctx, uint32_t domid, libxl_device_nic *nic, const libxl_asyncop_how *ao_how) diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c index bee5ed5..935f25b 100644 --- a/tools/libxl/libxl_device.c +++ b/tools/libxl/libxl_device.c @@ -676,6 +676,10 @@ void libxl__devices_destroy(libxl__egc *egc, libxl__devices_remove_state *drs) aodev->action = LIBXL__DEVICE_ACTION_REMOVE; aodev->dev = dev; aodev->force = drs->force; +if (dev->backend_kind == LIBXL__DEVICE_KIND_VUSB) { +libxl__initiate_device_usbctrl_remove(egc, aodev); +continue; +} Is there a risk that this races with individual device removal? I think you get away with it because removal of individual device is idempotent?
libxl__initiate_device_remove(egc, aodev); } } diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index f98f089..5be3b3a 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2553,6 +2553,14 @@ _hidden void libxl__device_vtpm_add(libxl__egc *egc, uint32_t domid, libxl_device_vtpm *vtpm, libxl__ao_device *aodev); +_hidden void libxl__device_usbctrl_add(libxl__egc *egc, uint32_t domid, + libxl_device_usbctrl *usbctrl, + libxl__ao_device *aodev); + +_hidden void libxl__device_usb_add(libxl__egc *egc, uint32_t domid, + libxl_device_usb *usb, + libxl__ao_device *aodev); + /* Internal function to connect a vkb device */ _hidden int libxl__device_vkb_add(libxl__gc *gc, uint32_t domid, libxl_device_vkb *vkb); @@ -2585,6 +2593,13 @@ _hidden void libxl__wait_device_connection(libxl__egc*, _hidden void libxl__initiate_device_remove(libxl__egc *egc, libxl__ao_device *aodev); +_hidden int libxl__device_from_usbctrl(libxl__gc *gc, uint32_t domid, [...] +void libxl__device_usb_add(libxl__egc *egc, uint32_t domid, + libxl_device_usb *usb, + libxl__ao_device *aodev) +{ +STATE_AO_GC(aodev->ao); +int rc = -1; +char *busid = NULL; + +assert(usb->u.hostdev.hostbus > 0 && usb->u.hostdev.hostaddr > 0); + +busid = usb_busaddr_to_busid(gc, usb->u.hostdev.hostbus, + usb->u.hostdev.hostaddr); +if (!busid) { +LOG(ERROR, "USB device doesn't exist in sysfs"); +goto out; +} + +if (!is_usb_assignable(gc, usb)) { +LOG(ERROR, "USB device is not assignable."); +goto out; +} + +/* check usb device is already assigned */ +if (is_usb_assigned(gc, usb)) { +LOG(ERROR, "USB device is already attached to a domain."); +goto out; +} + +rc = libxl__device_usb_setdefault(gc, domid, usb, aodev->update_json); +if (rc) goto out; + +rc = libxl__device_usb_add_xenstore(gc, domid, usb, aodev->update_json); +if (rc) goto out; + +rc = usbback_dev_assign(gc, usb); +if (rc) { +libxl__device_usb_remove_xenstore(gc, domid, usb); +goto out; +} + +libxl__ao_complete(egc, ao, 0); +rc = 0; + +out:
You forgot to complete the ao in the failure path. But I'm not very familiar with the AO machinery, so I will let Ian comment on this. Wei.
Re: [Xen-devel] [PATCH V6 7/7] domcreate: support pvusb in configuration file
On Mon, Aug 10, 2015 at 06:35:28PM +0800, Chunyan Liu wrote: Add code to support pvusb in domain config file. One could specify usbctrl and usb in domain's configuration file and create domain, then usb controllers will be created and usb device would be attached to guest automatically. One could specify usb controllers and usb devices in config file like this: usbctrl=['version=2,ports=4', 'version=1, ports=4', ] usbdev=['2.1,controller=0,port=1', ] Signed-off-by: Chunyan Liu cy...@suse.com Signed-off-by: Simon Cao caobosi...@gmail.com --- [...] } +if (!xlu_cfg_get_list(config, "usbctrl", &usbctrls, 0, 0)) { +d_config->num_usbctrls = 0; +d_config->usbctrls = NULL; +while ((buf = xlu_cfg_get_listitem(usbctrls, d_config->num_usbctrls)) + != NULL) { +libxl_device_usbctrl *usbctrl; + +d_config->usbctrls = +(libxl_device_usbctrl *)realloc(d_config->usbctrls, +sizeof(libxl_device_usbctrl) * (d_config->num_usbctrls + 1)); +usbctrl = d_config->usbctrls + d_config->num_usbctrls; +libxl_device_usbctrl_init(usbctrl); + Use ARRAY_EXTEND_INIT macro. +parse_usbctrl_config(usbctrl, buf); + +d_config->num_usbctrls++; +} +} + +if (!xlu_cfg_get_list(config, "usbdev", &usbs, 0, 0)) { +d_config->num_usbs = 0; +d_config->usbs = NULL; +while ((buf = xlu_cfg_get_listitem(usbs, d_config->num_usbs)) != NULL) { +libxl_device_usb *usb; + +d_config->usbs = (libxl_device_usb *)realloc(d_config->usbs, +sizeof(libxl_device_usb) * (d_config->num_usbs + 1)); +usb = d_config->usbs + d_config->num_usbs; +libxl_device_usb_init(usb); + Ditto. Wei. +parse_usb_config(usb, buf); + +d_config->num_usbs++; +} +} + switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) { case 0: { -- 2.1.4
Re: [Xen-devel] [PATCH V6 2/7] libxl_read_file_contents: add new entry to read sysfs file
On Mon, Aug 10, 2015 at 06:35:23PM +0800, Chunyan Liu wrote: Sysfs file has size=4096 but actual file content is less than that. Current libxl_read_file_contents will treat it as error when file size and actual file content differs, so reading sysfs file content with this function always fails. Add a new entry libxl_read_sysfs_file_contents to handle sysfs file specially. It would be used in later pvusb work. Signed-off-by: Chunyan Liu cy...@suse.com --- Changes: - read one more byte to check bigger size problem. tools/libxl/libxl_internal.h | 2 ++ tools/libxl/libxl_utils.c| 51 ++-- 2 files changed, 42 insertions(+), 11 deletions(-) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 6013628..f98f089 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -4001,6 +4001,8 @@ void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr, int libxl__count_physical_sockets(libxl__gc *gc, int *sockets); #endif +_hidden int libxl_read_sysfs_file_contents(libxl_ctx *ctx, const char *filename, + void **data_r, int *datalen_r); Indentation looks wrong. 
/* * Local variables: diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c index bfc9699..9234efb 100644 --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -322,8 +322,10 @@ out: return rc; } -int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, - void **data_r, int *datalen_r) { +static int libxl_read_file_contents_core(libxl_ctx *ctx, const char *filename, + void **data_r, int *datalen_r, + bool tolerate_shrinking_file) +{ GC_INIT(ctx); FILE *f = 0; uint8_t *data = 0; @@ -359,20 +361,34 @@ int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, datalen = stab.st_size; if (stab.st_size && data_r) { -data = malloc(datalen); +data = malloc(datalen + 1); if (!data) goto xe; -rs = fread(data, 1, datalen, f); -if (rs != datalen) { -if (ferror(f)) +rs = fread(data, 1, datalen + 1, f); +if (rs > datalen) { +LOG(ERROR, "%s increased size while we were reading it", +filename); +goto xe; +} + +if (rs < datalen) { +if (ferror(f)) { LOGE(ERROR, "failed to read %s", filename); -else if (feof(f)) -LOG(ERROR, "%s changed size while we were reading it", - filename); -else +goto xe; +} else if (feof(f)) { +if (tolerate_shrinking_file) { +datalen = rs; +} else { +LOG(ERROR, "%s shrunk size while we were reading it", +filename); +goto xe; +} +} else { abort(); -goto xe; +} This is a bit bikeshedding, but you can leave goto xe out of two `if' to reduce patch size. } + +data = realloc(data, datalen); Should check return value of realloc. The logic of this function reflects what has been discussed so far. Wei.
} if (fclose(f)) { @@ -396,6 +412,19 @@ int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, return e; } +int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, + void **data_r, int *datalen_r) +{ +return libxl_read_file_contents_core(ctx, filename, data_r, datalen_r, 0); +} + +int libxl_read_sysfs_file_contents(libxl_ctx *ctx, const char *filename, + void **data_r, int *datalen_r) +{ +return libxl_read_file_contents_core(ctx, filename, data_r, datalen_r, 1); +} + + #define READ_WRITE_EXACTLY(rw, zero_is_eof, constdata)\ \ int libxl_##rw##_exactly(libxl_ctx *ctx, int fd, \ -- 2.1.4
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
CCing Hongyang, I missed him when I copied-and-pasted emails from MAINTAINERS. On Tue, Aug 11, 2015 at 11:44:07AM +0100, Wei Liu wrote: Hi all RC1 is going to be tagged this week (maybe today). We need to figure out when to branch / reopen -unstable for committing and what rules should be applied until 4.6 is out of the door. Ian, Ian and I had a conversation IRL. We discussed several things, but figured it is necessary to have more people involved before making any decision. Here is my recollection of the conversation. Branching should be done at one of the RC tags. There might not be enough time for us to reach consensus before tagging RC1, so I would say let's branch at RC2 if we don't observe blocker bugs. Maintainers should be responsible for both the 4.6 branch and the -unstable branch. As for bug fixes, here are two options. Option 1: bug fixes go into -unstable, backport / cherry-pick bug fixes back to 4.6. This seems to leave the tree in half frozen status because we need to reject refactoring patches in case they cause backporting failure. Option 2: bug fixes go into 4.6, merge them to -unstable. If the merge has conflicts and maintainers can't deal with that, the authors of those changes in -unstable which cause the conflict are responsible for fixing it up. Ian and Ian, anything I miss? Anything to add? Others, thoughts? Wei.
Re: [Xen-devel] [PATCH OSSTEST v2] Arrange to test migration from the previous Xen version
On Mon, 2015-08-03 at 17:01 +0100, Ian Campbell wrote: On Fri, 2015-07-24 at 17:28 +0100, Ian Campbell wrote: @@ -191,6 +208,27 @@ create_build_jobs () { revision_ovmf=$REVISION_OVMF done +if [ x$want_prevxen = xy ] ; then +if [ x$REVISION_PREVXEN = x ] ; then +echo >&2 "prevxen ?"; exit 1 +fi This breaks things with standalone mode, or any make-flight which didn't come from cr-daily-branch. In such cases we don't have REVISION_XEN or TREE_XEN either, we just get the defaults. I think we need to do something like select_prevxenbranch but to pick a xen.git branch name rather than an osstest branch name. Or we quietly skip this test if REVISION_PREVXEN is not set. One to chew on I think. At the moment I'm somewhat inclined towards omitting the build-$ARCH-prev job in this case but still creating the associated test jobs. In standalone mode this may still be useful (maybe your hosts are already configured and you want to run an individual step). In production mode the test jobs will then fail their ts-build-check step, which correctly reflects what has happened. I think this is the effect of the following incremental patch. Ian. diff --git a/mfi-common b/mfi-common index 737db99..810e533 100644 --- a/mfi-common +++ b/mfi-common @@ -208,10 +208,7 @@ create_build_jobs () { revision_ovmf=$REVISION_OVMF done -if [ x$want_prevxen = xy ] ; then -if [ x$REVISION_PREVXEN = x ] ; then -echo >&2 "prevxen ?"; exit 1 -fi +if [ x$want_prevxen = xy -a x$REVISION_PREVXEN != x ] ; then # TODO could find latest pass on that branch and attempt to reuse. #bfiprevxen=... #
[Xen-devel] [PATCH xen-tip] xen/PMU: __pcpu_scope_xenpmu_shared can be static
Signed-off-by: Fengguang Wu fengguang...@intel.com --- pmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c index 7218cea..1d1ae1b 100644 --- a/arch/x86/xen/pmu.c +++ b/arch/x86/xen/pmu.c @@ -15,7 +15,7 @@ /* Shared page between hypervisor and domain */ -DEFINE_PER_CPU(struct xen_pmu_data *, xenpmu_shared); +static DEFINE_PER_CPU(struct xen_pmu_data *, xenpmu_shared); #define get_xenpmu_data() per_cpu(xenpmu_shared, smp_processor_id()) /* perf callbacks */
[Xen-devel] [xen-tip:linux-next 19/23] arch/x86/xen/pmu.c:18:1: sparse: symbol '__pcpu_scope_xenpmu_shared' was not declared. Should it be static?
tree: git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip linux-next head: 0d26d72cab825a0227c8d8e0e42161125b3116fd commit: 9cd3857a7d89a259870c6ee6994f5ef41511654c [19/23] xen/PMU: Initialization code for Xen PMU reproduce: # apt-get install sparse git checkout 9cd3857a7d89a259870c6ee6994f5ef41511654c make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) arch/x86/xen/pmu.c:18:1: sparse: symbol '__pcpu_scope_xenpmu_shared' was not declared. Should it be static? Please review and possibly fold the followup patch. --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [Xen-devel] [v4 11/17] vt-d: Add API to update IRTE when VT-d PI is used
On 28.07.15 at 09:34, feng...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Friday, July 24, 2015 11:28 PM On 23.07.15 at 13:35, feng...@intel.com wrote: +GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p); + +old_ire = new_ire = *p; + +/* Setup/Update interrupt remapping table entry. */ +setup_posted_irte(new_ire, pi_desc, gvec); +ret = cmpxchg16b(p, &old_ire, &new_ire); + +ASSERT(ret == *(__uint128_t *)&old_ire); + +iommu_flush_cache_entry(p, sizeof(struct iremap_entry)); sizeof(*p) please. +iommu_flush_iec_index(iommu, 0, remap_index); + +if ( iremap_entries ) +unmap_vtd_domain_page(iremap_entries); The conditional comes way too late: Either GET_IREMAP_ENTRY() can produce NULL, in which case you're hosed above. Or it can't, in which case the check here is pointless. I cannot find the case where GET_IREMAP_ENTRY() produces NULL for iremap_entries, And I didn't say it would - I simply listed both possibilities and their respective consequences for your code. if it did, GET_IREMAP_ENTRY() itself would have a big problem, right? So this check is not needed; maybe I can add an ASSERT() after GET_IREMAP_ENTRY(). You might, but iirc no other uses do so, so you could as well omit any such checks. Jan
Re: [Xen-devel] [PATCH for-4.6] tools: Don't try to update the firmware directory on ARM
On Sun, 2015-08-09 at 14:49 +0100, Julien Grall wrote: Hi Wei, On 08/08/2015 16:16, Wei Liu wrote: On Fri, Aug 07, 2015 at 06:27:18PM +0100, Julien Grall wrote: The firmware directory is not built at all on ARM. Attempting to update it using the target subtree-force-update will fail when trying to update seabios. Signed-off-by: Julien Grall julien.gr...@citrix.com --- Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com I've noticed it while trying to update the QEMU tree used by Xen on a platform where iasl is not present (required by seabios in order to update it). I think this should go in Xen 4.6 and possibly be backported to Xen 4.5 --- tools/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/Makefile b/tools/Makefile index 45cb4b2..2618559 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -305,7 +305,9 @@ endif ifeq ($(CONFIG_QEMU_TRAD),y) $(MAKE) qemu-xen-traditional-dir-force-update endif +ifeq ($(CONFIG_X86),y) $(MAKE) -C firmware subtree-force-update +endif This is not optimal. What if you want to build OVMF on ARM in the future? Slight aside, but I already looked at doing this but concluded that the right answer was to add this to raisin not xen.git. As it happens on ARM we would boot the UEFI binary directly, so we don't need to compile it into hvmloader or jump through other hoops, so it is a bit easier than on x86. You also can't preclude having other firmwares that need to be built on ARM in the future. I think a proper way of doing this is to make CONFIG_SEABIOS=n when you're building on ARM. See tools/configure.ac. tools/Makefile only builds the firmware directory for x86, see: SUBDIRS-$(CONFIG_X86) += firmware Hence why I wrote the patch in the current way. I think having the update rule match (in spirit at least) the SUBDIRS rules makes sense as a patch for now, so I'm in favour of taking this patch as it is.
Building the firmware directory for ARM would require more work than replacing SUBDIRS-$(CONFIG_X86) with SUBDIRS-y. In general, I do agree that we should enable this with configure.ac but, IMHO, this is not Xen 4.6 material... Although I would be happy to fix it for Xen 4.7. Regards,
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On 11/08/15 12:13, Ian Jackson wrote: Wei Liu writes ([URGENT RFC] Branching and reopening -unstable): Branching should be done at one of the RC tags. It might not be enough time for us to reach consensus before tagging RC1, so I would say let's branch at RC2 if we don't observe blocker bugs. Maintainers should be responsible for both 4.6 branch and -unstable branch. As for bug fixes, here are two options. I think this conflates the three questions which should be answered: Q1: What is the status of the newly branched -unstable ? Should we avoid (some or all) big sets of changes ? (a) Don't branch (b) Branch but don't allow /any/ big changes. Seems to make branching rather pointless. (c) Branch but allow /some/ big changes. Tree is `half open', which is not ideal. (d) Branch and allow /all/ changes. Q2: If we don't avoid such changes, and a bugfix has a conflict with a change in the new unstable, who is responsible for fixing it up ? Options include: (a) the relevant maintainers (double whammy for maintainers) (b) the submitter of the bugfix (very undesirable) (c) the submitter of the big set of changes (but what do we do if they don't respond?) (d) the stable tree maintainers (already ruled out, so included in this list for completeness; out of the question IMO) Q3: What workflow should we use, for bugfixes for bugs in 4.6-pre ? There are three options, not two: (a) Bugfixes go to 4.6 first, cherry pick to unstable This keeps our focus on 4.6, which is good. (b) Bugfixes go to 4.6 first, merge 4.6 to unstable. Not tenable if we have big changes in unstable. (c) Bugfixes go to unstable, cherry pick to 4.6. Undesirable IMO because it shifts focus to unstable. Of these 2(c)/3(a) would be ideal but we don't have a good answer to the problem posted in Q2(c). I think that leaves us with 2(a): maintainers have to deal with the fallout. That makes 1(d) untenable in my view. As a maintainer, I do not want that additional workload. That leaves us with 1(a) or 1(c)/2(a)/3(a).
With 1(c), who should decide on a particular series ? Well, who is taking the risk ? The maintainer, who will have to pick up the pieces. I therefore conclude, we have two options: A 1(a)/-/- Do not branch yet: defer divergence until the risk of bugfixes is much lower. B 1(c)(maintainer)/2(a)/3(a) Branch. Maintainers may choose to defer patch series based on risk of conflicts with bugfixes required for 4.6. Clear communication with submitters is required. Bugfixes for bugs in 4.6 will be accepted onto the 4.6 branch. Maintainers are required to cherry pick them onto unstable. Bugfixes will not be accepted for unstable unless it is clear that the bug was introduced in unstable since 4.6 branched. I am happy with B because it gives the relevant maintainers the option. Very much A. By definition, 1(c) will destabilise the tree and generate artificial work for the maintainers and committers. The most important action at this point is to stabilise 4.6 for release, and people's efforts are far better spent pursuing that, rather than continuing work on unstable. For the sake of a couple of weeks, contributors can keep their patches for a little while longer. ~Andrew
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On Tue, 11 Aug 2015, Ian Jackson wrote: Wei Liu writes ([URGENT RFC] Branching and reopening -unstable): Branching should be done at one of the RC tags. It might not be enough time for us to reach consensus before tagging RC1, so I would say let's branch at RC2 if we don't observe blocker bugs. Maintainers should be responsible for both 4.6 branch and -unstable branch. As for bug fixes, here are two options. I think this conflates the three questions which should be answered: Q1: What is the status of the newly branched -unstable ? Should we avoid (some or all) big sets of changes ? (a) Don't branch (b) Branch but don't allow /any/ big changes. Seems to make branching rather pointless. (c) Branch but allow /some/ big changes. Tree is `half open', which is not ideal. (d) Branch and allow /all/ changes. Q2: If we don't avoid such changes, and a bugfix has a conflict with a change in the new unstable, who is responsible for fixing it up ? Options include: (a) the relevant maintainers (double whammy for maintainers) (b) the submitter of the bugfix (very undesirable) Why is it very undesirable? In the Linux community for example it is customary to provide a patch for each of the stable trees you need backports to, in case there are any merge conflicts. This would be the same. (c) the submitter of the big set of changes (but what do we do if they don't respond?) (d) the stable tree maintainers (already ruled out, so included in this list for completeness; out of the question IMO) Q3: What workflow should we use, for bugfixes for bugs in 4.6-pre ? There are three options, not two: (a) Bugfixes go to 4.6 first, cherry pick to unstable This keeps our focus on 4.6, which is good. (b) Bugfixes go to 4.6 first, merge 4.6 to unstable. Not tenable if we have big changes in unstable. (c) Bugfixes go to unstable, cherry pick to 4.6. Undesirable IMO because it shifts focus to unstable. Of these 2(c)/3(a) would be ideal but we don't have a good answer to the problem posted in Q2(c).
I think that leaves us with 2(a): maintainers have to deal with the fallout. That makes 1(d) untenable in my view. As a maintainer, I do not want that additional workload. That leaves us with 1(a) or 1(c)/2(a)/3(a). With 1(c), who should decide on a particular series ? Well, who is taking the risk ? The maintainer, who will have to pick up the pieces. I therefore conclude, we have two options: A 1(a)/-/- Do not branch yet: defer divergence until the risk of bugfixes is much lower. B 1(c)(maintainer)/2(a)/3(a) Branch. Maintainers may choose to defer patch series based on risk of conflicts with bugfixes required for 4.6. Clear communication with submitters is required. Bugfixes for bugs in 4.6 will be accepted onto the 4.6 branch. Maintainers are required to cherry pick them onto unstable. Bugfixes will not be accepted for unstable unless it is clear that the bug was introduced in unstable since 4.6 branched. I am happy with B because it gives the relevant maintainers the option. Ian.
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
On 2015/8/11 17:46, Julien Grall wrote: On 11/08/15 03:09, Shannon Zhao wrote: Hi Julien, Hi Shannon, On 2015/8/7 18:33, Julien Grall wrote: Hi Shannon, Just some clarification questions. On 07/08/15 03:11, Shannon Zhao wrote: 3. Dom0 gets grant table and event channel irq information --- As said above, we assign the hypervisor_id to be "XenVMM" to tell Dom0 that it runs on the Xen hypervisor. For the grant table, add two new HVM_PARAMs: HVM_PARAM_GNTTAB_START_ADDRESS and HVM_PARAM_GNTTAB_SIZE. For the event channel irq, reuse HVM_PARAM_CALLBACK_IRQ and add a new delivery type: val[63:56] == 3: val[15:8] is flag: val[7:0] is a PPI (ARM and ARM64 only) Can you describe the content of flag? This needs definition as well. I think it could use the definition of the XENV table: bit 0 stands for the interrupt mode and bit 1 stands for the interrupt polarity. And explain it in the comment of HVM_PARAM_CALLBACK_IRQ. That would be fine for me. When constructing Dom0 in Xen, save these values. Then Dom0 could get them through the hypercall HVMOP_get_param. 4. Map MMIO regions --- Register a bus_notifier for the platform and amba buses in Linux. Add a new XENMAPSPACE, XENMAPSPACE_dev_mmio. Within the notifier, check if the device is newly added, then call hypercall XENMEM_add_to_physmap to map the mmio regions. 5. Route device interrupts to Dom0 -- Route all the SPI interrupts to Dom0 before Dom0 boots. Not all the SPIs will be routed to Dom0. Some are used by Xen and should never be used by any guest. I have in mind the UART and SMMU interrupts. You will have to find a way to skip them nicely. Note that not all the IRQs used by Xen are properly registered when we build Dom0 (see the SMMU). For the UART, we can get the interrupt information from the SPCR table and hide it from Dom0. Can you clarify your meaning of hide from Dom0? Did you mean avoid routing the SPI to Dom0? Yes. IIUC, currently Xen (as well as Linux) doesn't support using the SMMU when booting with ACPI.
When it does, it could read the interrupt information from the IORT table and hide it from Dom0. Well, for Xen we don't even have ACPI supported upstream ;). For Linux there is some on-going work. Anyway, this is not important right now. Yeah, that could be done after this patchset is upstream. Thanks, -- Shannon
Re: [Xen-devel] [PATCH V6 1/7] libxl: export some functions for pvusb use
On Mon, Aug 10, 2015 at 06:35:22PM +0800, Chunyan Liu wrote: Signed-off-by: Chunyan Liu cy...@suse.com Signed-off-by: Simon Cao caobosi...@gmail.com Acked-by: Wei Liu wei.l...@citrix.com --- tools/libxl/libxl.c | 4 ++-- tools/libxl/libxl_internal.h | 3 +++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 083f099..006e8da 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -1995,7 +1995,7 @@ out: } /* common function to get next device id */ -static int libxl__device_nextid(libxl__gc *gc, uint32_t domid, char *device) +int libxl__device_nextid(libxl__gc *gc, uint32_t domid, char *device) { char *dompath, **l; unsigned int nb; @@ -2014,7 +2014,7 @@ static int libxl__device_nextid(libxl__gc *gc, uint32_t domid, char *device) return nextid; } -static int libxl__resolve_domid(libxl__gc *gc, const char *name, +int libxl__resolve_domid(libxl__gc *gc, const char *name, uint32_t *domid) Nit: please adjust indentation. { if (!name) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 6ea6c83..6013628 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -1152,6 +1152,9 @@ _hidden int libxl__init_console_from_channel(libxl__gc *gc, libxl__device_console *console, int dev_num, libxl_device_channel *channel); +_hidden int libxl__device_nextid(libxl__gc *gc, uint32_t domid, char *device); +_hidden int libxl__resolve_domid(libxl__gc *gc, const char *name, + uint32_t *domid); /* * For each aggregate type which can be used as an input we provide: -- 2.1.4
[Xen-devel] [linux-next test] 60648: regressions - FAIL
flight 60648 linux-next real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60648/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 60637
 test-amd64-i386-qemut-rhel6hvm-amd       9 redhat-install            fail REGR. vs. 60637

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds                11 guest-start               fail REGR. vs. 60637
 test-amd64-i386-xl                      14 guest-saverestore         fail like 60637
 test-amd64-i386-xl-xsm                  14 guest-saverestore         fail like 60637
 test-amd64-i386-pair                    21 guest-migrate/src_host/dst_host fail like 60637
 test-amd64-i386-xl-qemuu-win7-amd64     17 guest-stop                fail like 60637
 test-amd64-i386-xl-qemut-win7-amd64     17 guest-stop                fail like 60637
 test-amd64-amd64-xl-qemuu-win7-amd64    17 guest-stop                fail like 60637

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel           14 guest-saverestore         fail never pass
 test-amd64-amd64-xl-pvh-amd             11 guest-start               fail never pass
 test-armhf-armhf-libvirt-raw             9 debian-di-install         fail never pass
 test-armhf-armhf-xl-qcow2                9 debian-di-install         fail never pass
 test-armhf-armhf-libvirt-vhd             9 debian-di-install         fail never pass
 test-armhf-armhf-xl-raw                  9 debian-di-install         fail never pass
 test-armhf-armhf-xl-vhd                  9 debian-di-install         fail never pass
 test-armhf-armhf-libvirt-qcow2           9 debian-di-install         fail never pass
 test-amd64-i386-libvirt-xsm             12 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-xsm             14 guest-saverestore         fail never pass
 test-amd64-amd64-libvirt                12 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-pair           21 guest-migrate/src_host/dst_host fail never pass
 test-amd64-amd64-libvirt-xsm            12 migrate-support-check     fail never pass
 test-amd64-i386-libvirt                 14 guest-saverestore         fail never pass
 test-amd64-i386-libvirt                 12 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-pair            21 guest-migrate/src_host/dst_host fail never pass
 test-armhf-armhf-xl-arndale             12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale             13 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2          11 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-xsm            12 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-xsm            14 guest-saverestore         fail never pass
 test-amd64-i386-libvirt-vhd             11 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-raw            11 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-vhd            11 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-xsm                 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-xsm                 12 migrate-support-check     fail never pass
 test-armhf-armhf-xl                     12 migrate-support-check     fail never pass
 test-armhf-armhf-xl                     13 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck          12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck          13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu           13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu           12 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt                14 guest-saverestore         fail never pass
 test-armhf-armhf-libvirt                12 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-raw             11 migrate-support-check     fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64    17 guest-stop                fail never pass
 test-amd64-i386-libvirt-qcow2           11 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2             13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2             12 migrate-support-check     fail never pass

version targeted for testing:
 linux b195df50400676bdeaacdca27051e1a71ccd570f
baseline version:
 linux dd2384a75d1c046faf068a6352732a204814b86d

Last test of basis (not found)
Failing since 0 1970-01-01 00:00:00 Z 16658 days
Testing same since 60648 2015-08-10 09:20:46 Z 1 days 1 attempts

jobs:
 build-amd64-xsm pass
 build-armhf-xsm pass
 build-i386-xsm pass
 build-amd64
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On Tue, 2015-08-11 at 12:13 +0100, Ian Jackson wrote:
 Wei Liu writes ([URGENT RFC] Branching and reopening -unstable):
  Branching should be done at one of the RC tags. There might not be enough time for us to reach consensus before tagging RC1, so I would say let's branch at RC2 if we don't observe blocker bugs. Maintainers should be responsible for both the 4.6 branch and the -unstable branch. As for bug fixes, here are two options.

 I think this conflates the three questions which should be answered:

 Q1: What is the status of the newly branched -unstable? Should we avoid (some or all) big sets of changes?
  (a) Don't branch.
  (b) Branch but don't allow /any/ big changes. Seems to make branching rather pointless.
  (c) Branch but allow /some/ big changes. Tree is `half open', which is not ideal.
  (d) Branch and allow /all/ changes.

 Q2: If we don't avoid such changes, and a bugfix has a conflict with a change in the new unstable, who is responsible for fixing it up? Options include:
  (a) the relevant maintainers (double whammy for maintainers)
  (b) the submitter of the bugfix (very undesirable)
  (c) the submitter of the big set of changes (but what do we do if they don't respond?)
  (d) the stable tree maintainers (already ruled out, so included in this list for completeness; out of the question IMO)

 Q3: What workflow should we use for bugfixes for bugs in 4.6-pre? There are three options, not two:
  (a) Bugfixes go to 4.6 first, cherry-pick to unstable. This keeps our focus on 4.6, which is good.
  (b) Bugfixes go to 4.6 first, merge 4.6 to unstable. Not tenable if we have big changes in unstable.
  (c) Bugfixes go to unstable, cherry-pick to 4.6. Undesirable IMO because it shifts focus to unstable.

FWIW I think historically we have always done (c) here. That's not to say we shouldn't change, but I thought it worth noting.

 Of these, 2(c)/3(a) would be ideal, but we don't have a good answer to the problem posed in Q2(c). I think that leaves us with 2(a): maintainers have to deal with the fallout.

 That makes 1(d) untenable in my view. As a maintainer, I do not want that additional workload. That leaves us with 1(a) or 1(c)/2(a)/3(a).

 With 1(c), who should decide on a particular series? Well, who is taking the risk? The maintainer, who will have to pick up the pieces.

 I therefore conclude we have two options:

 A: 1(a)/-/- Do not branch yet: defer divergence until the risk of bugfixes is much lower.

 B: 1(c)(maintainer)/2(a)/3(a) Branch. Maintainers may choose to defer patch series based on the risk of conflicts with bugfixes required for 4.6. Clear communication with submitters is required. Bugfixes for bugs in 4.6 will be accepted onto the 4.6 branch. Maintainers are required to cherry-pick them onto unstable. Bugfixes will not be accepted for unstable unless it is clear that the bug was introduced in unstable since 4.6 branched.

 I am happy with B because it gives the relevant maintainers the option.

 Ian.
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On Tue, 11 Aug 2015, Wei Liu wrote:
 Hi all

 RC1 is going to be tagged this week (maybe today). We need to figure out when to branch / reopen -unstable for committing and what rules should be applied until 4.6 is out of the door.

 Ian, Ian and I had a conversation IRL. We discussed several things, but figured it is necessary to have more people involved before making any decision. Here is my recollection of the conversation.

 Branching should be done at one of the RC tags. There might not be enough time for us to reach consensus before tagging RC1, so I would say let's branch at RC2 if we don't observe blocker bugs.

 Maintainers should be responsible for both the 4.6 branch and the -unstable branch. As for bug fixes, here are two options.

 Option 1: bug fixes go into -unstable; backport / cherry-pick bug fixes back to 4.6. This seems to leave the tree in half-frozen status, because we need to reject refactoring patches in case they cause backporting failure.

 Option 2: bug fixes go into 4.6, then merge them to -unstable. If the merge has a conflict and maintainers can't deal with it, the authors of those changes in -unstable which cause the conflict are responsible for fixing it up.

 Ian and Ian, anything I miss? Anything to add? Others, thoughts?

I don't see why Option 1 should be different from Option 2 in terms of dealing with conflicts. I think that in both cases we should just ask contributors for help to fix the conflict. So I would go for a revised Option 1:

Option 1b: bug fixes go into -unstable; backport / cherry-pick bug fixes back to 4.6. If the backport has a conflict and maintainers can't deal with it, the authors of those changes in -unstable which cause the conflict are responsible for fixing it up.
Re: [Xen-devel] [PATCH v4 07/11] x86/intel_pstate: the main boby of the intel_pstate driver
On 27.07.15 at 11:30, wei.w.w...@intel.com wrote:
 On 24/07/2015 21:54, Jan Beulich wrote:
  On 25.06.15 at 13:16, wei.w.w...@intel.com wrote:
   +int __init intel_pstate_init(void)
   +{
   +    int cpu, rc = 0;
   +    const struct x86_cpu_id *id;
   +    struct cpu_defaults *cpu_info;
   +
   +    id = x86_match_cpu(intel_pstate_cpu_ids);
   +    if (!id)
   +        return -ENODEV;
   +
   +    cpu_info = (struct cpu_defaults *)id->driver_data;
   +
   +    copy_pid_params(&cpu_info->pid_policy);
   +    copy_cpu_funcs(&cpu_info->funcs);
   +
   +    if (intel_pstate_msrs_not_valid())
   +        return -ENODEV;
   +
   +    all_cpu_data = xzalloc_array(struct cpudata *, NR_CPUS);
   +    if (!all_cpu_data)
   +        return -ENOMEM;
   +
   +    rc = cpufreq_register_driver(&intel_pstate_driver);
   +    if (rc)
   +        goto out;
   +
   +    return rc;
   +out:
   +    for_each_online_cpu(cpu) {
   +        if (all_cpu_data[cpu]) {
   +            kill_timer(&all_cpu_data[cpu]->timer);
   +            xfree(all_cpu_data[cpu]);
   +        }
   +    }

  I have a hard time seeing where in this function the setup happens that is being undone here (keeping in mind that the notifier registration inside cpufreq_register_driver() doesn't actually call the notifier function). And then, looking at the diff between this and what Linux 4.2-rc3 has (which admittedly looks a little newer than what you sent, so I already subtract some of the delta), it is significantly larger than the source file itself. That surely doesn't suggest a clone-with-minimal-delta. Yet as said before - either you do that, or you accept us picking at things you inherited from Linux.

 I think it's better to choose the latter - picking out things that are useful for us from Linux. Can you please take a look at this patch and summarize the comments? Thanks.

I'm sorry, but for a first round I'd rather expect _you_ to go through the code you intend to add and spot possible problems. Only then, on a submission where you state that you did so, would I want to invest time in sanity checking things.

And then I hope you realize that the clone-with-minimal-delta would have benefits on the maintenance side going forward (fewer manual adjustments needed due to non-applying Linux side changes).

Jan
Re: [Xen-devel] RFC: HVM de-privileged mode scheduling considerations
On 04/08/15 14:46, George Dunlap wrote: On Mon, Aug 3, 2015 at 3:34 PM, Ian Campbell ian.campb...@citrix.com wrote: On Mon, 2015-08-03 at 14:54 +0100, Andrew Cooper wrote: On 03/08/15 14:35, Ben Catterall wrote: Hi all, I am working on an x86 proof-of-concept to evaluate if it is feasible to move device models and x86 emulation code for HVM guests into a de-privileged context. I was hoping to get feedback from relevant maintainers on scheduling considerations for this system to mitigate potential DoS attacks. Many thanks in advance, Ben This is intended as a proof-of-concept, with the aim of determining if this idea is feasible within performance constraints. Motivation -- The motivation for moving the device models and x86 emulation code into ring 3 is to mitigate a system compromise due a bug in any of these systems. These systems are currently part of the hypervisor and, consequently, a bug in any of these could allow an attacker to gain control (or perform a DOS) of Xen and/or guests. Migrating between PCPUs --- There is a need to support migration between pcpus so that the scheduler can still perform this operation. However, there is an issue to resolve. Currently, I have a per-vcpu copy of the Xen ring 0 stack up to the point of entering the de-privileged mode. This allows us to restore this stack and then continue from the entry point when we have finished in de-privileged mode. There will be per-pcpu data on these per-vcpu stacks such as saved stack frame pointers for the per-pcpu stack, smp_processor_id() responses etc. Therefore, it will be necessary to lock the vcpu to the current pcpu when it enters this user mode so that it does not wake up on a different pcpu where such pointers and other data are invalid. We can do this by setting a hard affinity to the pcpu that the vcpu is executing on. See common/wait.c which does something similar to what I am doing. 
However, needing to have hard affinity to a pcpu leads to the following problem: an attacker could lock multiple vcpus to a single pcpu, leading to a DoS. This could be achieved by spinning in a loop in Xen de-privileged mode (assuming a bug in this mode) and performing this operation on multiple vcpus at once. The attacker could wait until all of their vcpus were on the same pcpu and then execute this attack. This could cause the pcpu to, effectively, lock up, as it will be under heavy load, and we would be unable to move work elsewhere.

A solution to the DoS would be to force migration to another pcpu if, after, say, 100 quanta have passed, the vcpu has remained in de-privileged mode. This forcing of migration would require us to forcibly complete the de-privileged operation and then, just before returning into the guest, force a cpu change. We could not just force a migration at the schedule call point, as the Xen stack needs to unwind to free up resources. We would reset this count each time we completed a de-privileged mode operation. A legitimate long-running de-privileged operation would trigger this forced migration mechanism. However, it is unlikely that such operations will be needed, and the count can be adjusted appropriately to mitigate this. Any suggestions or feedback would be appreciated!

I don't see why any scheduling support is needed. Currently all operations like this are run synchronously in the vmexit context of the vcpu. Any current DoS is already a real issue.

The point is that this work is supposed to mitigate (or eliminate) such issues, so we would like to remove this existing real issue. IOW while it might be expected that an in-Xen DM can DoS the system, an in-Xen-ring3 DM should not be able to do so.

In any reasonable situation, emulation of a device is a small state mutation and occasionally kicking off a further action to perform. (The far bigger risk from this kind of emulation is following bad pointers/etc, rather than long loops.)

I think it would be entirely reasonable to have a deadline for a single execution of depriv mode, after which the domain is declared malicious and killed.

I think this could make sense; it's essentially a harsher variant of Ben's suggestion to abort an attempt to process the MMIO in order to migrate to another pcpu, but it has the benefit of being easier to implement and easier to reason about in terms of interactions with other aspects of the system (i.e. it seems to remove the need to think of ways an attacker might game that other system).

We already have this for host pcpus - the watchdog defaults to 5 seconds. Having a similar cutoff for depriv mode should be fine.

That's a reasonable analogy. Perhaps we would want the depriv watchdog to be some 1/N fraction of the pcpu watchdog, for a smallish N, to avoid the risk of any slop in the timing allowing the pcpu watchdog to fire. N=3 for example (on the grounds that N=2 is probably sufficient, so N=3 must be awesome).

+1

-George

Thanks all! I'll do
[Xen-devel] [PATCH xen-tip] xen/PMU: pmu_modes[] can be static
Signed-off-by: Fengguang Wu fengguang...@intel.com
---
 sys-hypervisor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index 0907275..b5a7342 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -377,7 +377,7 @@ struct pmu_mode {
 	uint32_t mode;
 };
 
-struct pmu_mode pmu_modes[] = {
+static struct pmu_mode pmu_modes[] = {
 	{"off", XENPMU_MODE_OFF},
 	{"self", XENPMU_MODE_SELF},
 	{"hv", XENPMU_MODE_HV},
[Xen-devel] [xen-tip:linux-next 18/23] drivers/xen/sys-hypervisor.c:380:17: sparse: symbol 'pmu_modes' was not declared. Should it be static?
tree: git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip linux-next
head: 0d26d72cab825a0227c8d8e0e42161125b3116fd
commit: 3ad90fe1671a12522e3360aa4c39094360a10b38 [18/23] xen/PMU: Sysfs interface for setting Xen PMU mode
reproduce:
  # apt-get install sparse
  git checkout 3ad90fe1671a12522e3360aa4c39094360a10b38
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

 drivers/xen/sys-hypervisor.c:380:17: sparse: symbol 'pmu_modes' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure  Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all  Intel Corporation
[Xen-devel] [xen-tip:linux-next 21/23] arch/x86/xen/pmu.c:211:20: sparse: symbol 'xen_amd_read_pmc' was not declared. Should it be static?
tree: git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip linux-next
head: 0d26d72cab825a0227c8d8e0e42161125b3116fd
commit: 80ef65bb2362fd9eedcb4ec1d41d8a6d0b99dfbb [21/23] xen/PMU: Intercept PMU-related MSR and APIC accesses
reproduce:
  # apt-get install sparse
  git checkout 80ef65bb2362fd9eedcb4ec1d41d8a6d0b99dfbb
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

 arch/x86/xen/pmu.c:18:1: sparse: symbol '__pcpu_scope_xenpmu_shared' was not declared. Should it be static?
 arch/x86/xen/pmu.c:211:20: sparse: symbol 'xen_amd_read_pmc' was not declared. Should it be static?
 arch/x86/xen/pmu.c:220:20: sparse: symbol 'xen_intel_read_pmc' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure  Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all  Intel Corporation
[Xen-devel] [PATCH xen-tip] xen/PMU: xen_amd_read_pmc() can be static
Signed-off-by: Fengguang Wu fengguang...@intel.com
---
 pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index cbd68dd..2b81722 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -208,7 +208,7 @@ bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err)
 	return false;
 }
 
-unsigned long long xen_amd_read_pmc(int counter)
+static unsigned long long xen_amd_read_pmc(int counter)
 {
 	uint32_t msr;
 	int err;
@@ -217,7 +217,7 @@ unsigned long long xen_amd_read_pmc(int counter)
 	return native_read_msr_safe(msr, &err);
 }
 
-unsigned long long xen_intel_read_pmc(int counter)
+static unsigned long long xen_intel_read_pmc(int counter)
 {
 	int err;
 	uint32_t msr;
Re: [Xen-devel] [PATCH for-4.6] libxl: fix libxl__build_hvm error code return path
On Tue, 2015-08-11 at 09:57 +0100, Wei Liu wrote:
 On Fri, Aug 07, 2015 at 06:08:25PM +0200, Roger Pau Monne wrote:
  This is a simple fix to make sure libxl__build_hvm returns an error code in case of failure.

  Signed-off-by: Roger Pau Monné roger@citrix.com
  Cc: Ian Jackson ian.jack...@eu.citrix.com
  Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Cc: Ian Campbell ian.campb...@citrix.com
  Cc: Wei Liu wei.l...@citrix.com

 Acked-by: Wei Liu wei.l...@citrix.com

Unfortunately I think this will result in any valid rc's any path happens to have being discarded in favour of a generic ERROR_FAIL.

If we are going to band-aid this for 4.6 then I think setting rc = ERROR_FAIL just after the libxl__domain_device_construct_rdm error handling might be better.

Even better would be to put the rc = ERROR_FAIL into the various if (ret) blocks. I don't think that would be an unacceptably large patch (it's 3-4 sites from what I can see) and it would be closer to heading in the right direction.

Ian.
Re: [Xen-devel] [PATCH for-4.6] libxl: fix libxl__build_hvm error code return path
On Tue, Aug 11, 2015 at 01:44:45PM +0100, Ian Campbell wrote:
 On Tue, 2015-08-11 at 09:57 +0100, Wei Liu wrote:
  On Fri, Aug 07, 2015 at 06:08:25PM +0200, Roger Pau Monne wrote:
   This is a simple fix to make sure libxl__build_hvm returns an error code in case of failure.
   [...]

 Unfortunately I think this will result in any valid rc's any path happens to have being discarded in favour of a generic ERROR_FAIL.

Don't worry, this is the original behaviour.

 If we are going to band-aid this for 4.6 then I think setting rc = ERROR_FAIL just after the libxl__domain_device_construct_rdm error handling might be better.

 Even better would be to put the rc = ERROR_FAIL into the various if (ret) blocks. I don't think that would be an unacceptably large patch (it's 3-4 sites from what I can see) and it would be closer to heading in the right direction.

I can do this as well, since Roger is on vacation at the moment.

Wei.

 Ian.
Re: [Xen-devel] [PATCH for-4.6] tools/libxc: linux: Don't use getpagesize() when unmapping the grants
On Fri, 2015-08-07 at 22:45 +0100, Wei Liu wrote:
 On Fri, Aug 07, 2015 at 07:53:55PM +0100, Julien Grall wrote:
  The grants are based on the Xen granularity (i.e. 4KB). While the function to map grants for Linux (linux_gnttab_grant_map) is using the correct size (XC_PAGE_SIZE), the unmap one (linux_gnttab_munmap) is using getpagesize(). On domains using a page granularity different from Xen's (as is the case for AArch64 guests using 64KB pages), the unmap will be called with the wrong size.

  Signed-off-by: Julien Grall julien.gr...@citrix.com
  ---
  Cc: Ian Jackson ian.jack...@eu.citrix.com
  Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Cc: Ian Campbell ian.campb...@citrix.com
  Cc: Wei Liu wei.l...@citrix.com

 Acked-by: Wei Liu wei.l...@citrix.com

Acked-by: Ian Campbell ian.campb...@citrix.com

 I think this is a bug fix and should be applied for 4.6.

Agreed.

WRT backports for 4.5, I'd appreciate being given a full list of required fixes once everything is in place and working for 4.6/devbranch, rather than my tracking it piecemeal.

Ian.
Re: [Xen-devel] [PATCH] x86/HVM: honor p2m_ram_ro in hvm_map_guest_frame_rw()
On 31.07.15 at 18:06, boris.ostrov...@oracle.com wrote:
 On 07/24/2015 05:41 AM, Jan Beulich wrote:
  @@ -1693,14 +1703,22 @@ int nvmx_handle_vmclear(struct cpu_user_
       else
       {
           /* Even if this VMCS isn't the current one, we must clear it. */
  -        vvmcs = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT, 0);
  +        bool_t writable;
  +
  +        vvmcs = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT, 0, &writable);

 Since you replaced 'gpa >> PAGE_SHIFT' with 'paddr_to_pfn(gpa)' above, perhaps it should be replaced here too.

Yes indeed.

 Other than that,
 Reviewed-by: Boris Ostrovsky boris.ostrov...@oracle.com

Thanks, Jan
Re: [Xen-devel] [PATCH v7 0/7] xen/PMU: PMU support for Xen PV(H) guests
Applied to for-linus-4.3, thanks. David
Re: [Xen-devel] [PATCH v3] xen-apic: Enable on domU as well
On 10/08/15 14:40, Jason A. Donenfeld wrote: It turns out that domU also requires the Xen APIC driver. Otherwise we get stuck in busy loops that never exit, such as in this stack trace: Applied to for-linus-4.2 and tagged for stable, thanks. David
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On 11 Aug 2015, at 12:13, Ian Jackson ian.jack...@eu.citrix.com wrote:
 Wei Liu writes ([URGENT RFC] Branching and reopening -unstable):
  Branching should be done at one of the RC tags. It might not be enough time for us to reach consensus before tagging RC1, so I would say lets branch at RC2 if we don't observe blocker bugs. Maintainers should be responsible for both 4.6 branch and -unstable branch. As for bug fixes, here are two options.

What do other projects that are similar to us do? And how does it work for them? Any reference points?

 [... Ian Jackson's Q1/Q2/Q3 analysis and options A/B, quoted in full earlier in this thread ...]

 I am happy with B because it gives the relevant maintainers the option.

 Ian.

It may be helpful to evaluate this proposal against a couple of the outstanding patch series which were close and didn't make it into 4.6. In other words, change sets which we reasonably expect to turn up in the next 4-8 weeks or so.

Regards
Lars
Re: [Xen-devel] [PATCH for-4.6 v2 1/4] cxenstored: fix systemd socket activation
On Mon, 2015-08-10 at 09:00 +0100, Wei Liu wrote: There were two problems with original code: 1. sd_booted() was used to determined if the process was started by systemd, which was wrong. 2. Exit with error if pidfile was specified, which was too harsh. These two combined made cxenstored unable to start by hand if it ran on a system which had systemd. Fix issues with following changes: 1. Use sd_listen_fds to determine if the process is started by systemd. 2. Don't exit if pidfile is specified. Rename function and restructure code to make things clearer. A side effect of this patch is that gcc 4.8 with -Wmaybe-uninitialized in non-debug build spits out spurious warning about sock and ro_sock might be uninitialized. Since CentOS 7 ships gcc 4.8, we need to work around that by setting sock and ro_sock to NULL at the beginning of main. Signed-off-by: Wei Liu wei.l...@citrix.com Tested-by: George Dunlap george.dun...@eu.citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On Tue, 2015-08-11 at 13:55 +0100, Andrew Cooper wrote: On 11/08/15 12:13, Ian Jackson wrote: Wei Liu writes ([URGENT RFC] Branching and reopening -unstable): Branching should be done at one of the RC tags. It might not be enough time for us to reach consensus before tagging RC1, so I would say lets branch at RC2 if we don't observe blocker bugs. Maintainers should be responsible for both 4.6 branch and -unstable branch. As for bug fixes, here are two options. I think this conflates the three questions which should be answered: Q1: What is the status of the newly branched -unstable ? Should we avoid (some or all) big sets of changes ? (a) Don't branch (b) Branch but don't allow /any/ big changes. Seems to make branching rather pointless. (c) Branch but allow /some/ big changes. Tree is `half open', which is not ideal. (d) Branch and allow /all/ changes. Q2: If we don't avoid such changes, and a bugfix has a conflict with a change in the new unstable, who is responsible for fixing it up ? Options include: (a) the relevant maintainers (double whammy for maintainers) (b) the submitter of the bugfix (very undesirable) (c) the submitter of the big set of changes (but what do we do if they don't respond?) (d) the stable tree maintainers (already ruled out, so included in this list for completeness; out of the question IMO) Q3: What workflow should we use, for bugfixes for bugs in 4.6-pre ? There are three options, not two: (a) Bugfixes go to 4.6 first, cherry pick to unstable This keeps our focus on 4.6, which is good. (b) Bugfixes go to 4.6 first, merge 4.6 to unstable. Not tenable if we have big changes in unstable. (c) Bugfixes to to unstable, cherry pick to 4.6. Undesirable IMO because it shifts focus to unstable. Of these 2(c)/3(a) would be ideal but we don't have a good answer to the problem posted in Q2(c). I think that leaves us with 2(a): maintainers have to deal with the fallout. That makes 1(d) untenable in my view. 
As a maintainer, I do not want that additional workload. That leaves us with 1(a) or 1(c)/2(a)/3(a). With 1(c), who should decide on a particular series ? Well, who is taking the risk ? The maintainer, who will have to pick up the pieces. I therefore conclude, we have two options: A 1(a)/-/- Do not branch yet: defer divergence until the risk of bugfixes is much lower. B 1(c)(maintainer)/2(a)/3(a) Branch. Maintainers may choose to defer patch series based on risk of conflicts with bugfixes required for 4.6. Clear communication with submitters is required. Bugfixes for bugs in 4.6 will be accepted onto the 4.6 branch. Maintainers are required to cherry pick them onto unstable. Bugfixes will not be accepted for unstable unless it is clear that the bug was introduced in unstable since 4.6 branched. I am happy with B because it gives the relevant maintainers the option. Very much A. By definition, 1(c) will destabilise the tree and generate artificial work for the maintainers and committers. The most important action at this point is to stabilise 4.6 for release, and peoples efforts are far better spent pursuing that, rather than continuing work on unstable. While I agree that people who have things to do for the release should prioritise the release not all contributors have a stake in the stable releases and even those that do may not have anything which they are able to help with etc (or e.g. have other pressures which prevent them dropping all development work to dedicate full time to the release). Realistically even those with 4.6-ish tasks and responsibilities aren't going to have enough such things to do to fill their time 100% between now and the release. For the sake of a couple of weeks, contributors can keep their patches for a little while longer. A full freeze cycle is more like 6-8 weeks not a couple, which is where the tension arises between the stable release and other developers. 
What seems to have been missed (or got a bit mislaid) in the current analysis is _when_ to branch: the analysis assumes at rc1, while the status quo for the last few releases has been to branch just before release (or very late in the rc cycle at least), which are two opposite ends of the spectrum. There is of course plenty of middle ground between those two points. In your use of "a couple of weeks", are you making a counter-proposal to branch at (say) rc3, or are you arguing to keep the development branch closed until 9 October? Depending on where in the rc cycle we branch, different options may have different weights of up or down side. Ian. ~Andrew ___ Xen-devel mailing
Re: [Xen-devel] [PATCH for-4.6] libxl: fix libxl__build_hvm error code return path
On Tue, 2015-08-11 at 14:48 +0100, Wei Liu wrote: In 25652f23 ("tools/libxl: detect and avoid conflicts with RDM"), new code was added to use rc to store libxl function call return values, which complies with the libxl coding style. That patch, however, didn't change other locations where the return value was stored in ret. In the end libxl__build_hvm could return 0 when it failed. Explicitly set rc to ERROR_FAIL in all error paths to fix this. Signed-off-by: Wei Liu wei.l...@citrix.com You missed the path from libxl__domain_firmware, which incorrectly relies on rc being already initialised by the declaration (which per CODING_STYLE ought to be removed too). However, perhaps you prefer to leave those other two hunks until 4.7, and this patch is at least an improvement of sorts, so: Acked-by: Ian Campbell ian.campb...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] tools: Update sonames for 4.6 RCs
On Tue, Aug 11, 2015 at 03:27:46PM +0100, Ian Jackson wrote: Update libxc to 4.6. Update libxl to 4.6. Update libxlu to 4.6. I did git-grep 'MAJOR.*=' and also, to check I had everything, git-grep 'SONAME_LDFLAG' | egrep -v 'MAJOR' | less. The other, un-updated, libraries are:

  blktap2 (control, libvhd)  1.0      in-tree users only, no ABI changes
  libfsimage                 1.0      no ABI changes
  libvchan                   1.0      no ABI changes
  libxenstat                 0.0 (!)  no ABI changes
  libxenstore                3.0      no ABI changes

My assertions "no ABI changes" are based on the output of git-diff origin/stable-4.5..staging . and similar runes, sometimes limited to .h files. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com --- v2: Bump libxlu too. [ Reported by Wei Liu. ] [ not resending the remaining patches ] ---
 tools/libxc/Makefile | 2 +-
 tools/libxl/Makefile | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 8ae0ea0..a0f899b 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -1,7 +1,7 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
-MAJOR = 4.5
+MAJOR = 4.6
 MINOR = 0
 ifeq ($(CONFIG_LIBXC_MINIOS),y)
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 9036076..c5ecec1 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -5,10 +5,10 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
-MAJOR = 4.5
+MAJOR = 4.6
 MINOR = 0
-XLUMAJOR = 4.3
+XLUMAJOR = 4.6
 XLUMINOR = 0
 CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \
-- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 1/4] tools: Update sonames for 4.6 RCs
Update libxc to 4.6. Update libxl to 4.6. Update libxlu to 4.6. I did git-grep 'MAJOR.*=' and also, to check I had everything, git-grep 'SONAME_LDFLAG' | egrep -v 'MAJOR' | less. The other, un-updated, libraries are:

  blktap2 (control, libvhd)  1.0      in-tree users only, no ABI changes
  libfsimage                 1.0      no ABI changes
  libvchan                   1.0      no ABI changes
  libxenstat                 0.0 (!)  no ABI changes
  libxenstore                3.0      no ABI changes

My assertions "no ABI changes" are based on the output of git-diff origin/stable-4.5..staging . and similar runes, sometimes limited to .h files. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com --- v2: Bump libxlu too. [ Reported by Wei Liu. ] [ not resending the remaining patches ] ---
 tools/libxc/Makefile | 2 +-
 tools/libxl/Makefile | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 8ae0ea0..a0f899b 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -1,7 +1,7 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
-MAJOR = 4.5
+MAJOR = 4.6
 MINOR = 0
 ifeq ($(CONFIG_LIBXC_MINIOS),y)
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 9036076..c5ecec1 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -5,10 +5,10 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
-MAJOR = 4.5
+MAJOR = 4.6
 MINOR = 0
-XLUMAJOR = 4.3
+XLUMAJOR = 4.6
 XLUMINOR = 0
 CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \
-- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/HVM: honor p2m_ram_ro in hvm_map_guest_frame_rw()
At 07:51 -0600 on 11 Aug (1439279513), Jan Beulich wrote: On 27.07.15 at 13:09, t...@xen.org wrote: At 13:02 +0100 on 24 Jul (1437742964), Andrew Cooper wrote: On 24/07/15 10:41, Jan Beulich wrote: Beyond that log-dirty handling in _hvm_map_guest_frame() looks bogus too: What if a XEN_DOMCTL_SHADOW_OP_* gets issued and acted upon between the setting of the dirty flag and the actual write happening? I.e. shouldn't the flag instead be set in hvm_unmap_guest_frame()? It does indeed. (Ideally the dirty bit should probably be held high for the duration that a mapping exists, but that is absolutely infeasible to do). IMO that would not be very useful -- a well-behaved toolstack will have to make sure that relevant mappings are torn down before stop-and-copy. Forcing the dirty bit high in the meantime just makes every intermediate pass send a wasted copy of the page, without actually closing the race window if the tools are buggy. Making sure such mappings got torn down in time doesn't help when the most recent write happened _after_ the most recent clearing of the dirty flag in a pass prior to stop-and-copy. This is why e.g. __gnttab_unmap_common sets the dirty bit again as it unmaps. Cheers, Tim. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Commit moratorium for 4.6rc1
Please avoid committing anything just now. We need the push gate clear for a patch to update the tools library sonames, which is needed for rc1. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/4] Update version to Xen 4.6 RC
On Tue, Aug 11, 2015 at 03:09:18PM +0100, Ian Jackson wrote: * Change README to say `Xen 4.6-rc' * Change XEN_EXTRAVERSION so that we are `4.6.0-rc' Note that the RC number (eg, 1 for rc1) is not in the version string, so that we do not need to update this again when we cut the next RC. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Jan Beulich jbeul...@suse.com CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com ---
 README       | 12 ++--
 xen/Makefile |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/README b/README
index 0e456b8..522f1a2 100644
--- a/README
+++ b/README
@@ -1,10 +1,10 @@
 #
-__ ___ ___ __ _
-\ \/ /___ _ __ | || | / /__ _ _ __ ___| |_ __ _| |__ | | ___
- \ // _ \ '_ \ | || |_| '_ \ _| | | | '_ \/ __| __/ _` | '_ \| |/ _ \
- / \ __/ | | | |__ _| (_) |_| |_| | | | \__ \ || (_| | |_) | | __/
-/_/\_\___|_| |_||_|(_)___/ \__,_|_| |_|___/\__\__,_|_.__/|_|\___|
-
+__ ___ ___
+\ \/ /___ _ __ | || | / /__ __ ___
+ \ // _ \ '_ \ | || |_| '_ \ _| '__/ __|
+ / \ __/ | | | |__ _| (_) |_| | | (__
+/_/\_\___|_| |_||_|(_)___/ |_| \___|
+
 #
 http://www.xen.org/

(The ASCII-art banner above spells "Xen 4.6-unstable" before and "Xen 4.6-rc" after; its column alignment did not survive archiving.)

diff --git a/xen/Makefile b/xen/Makefile
index 6305880..6088c9d 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -2,7 +2,7 @@
 # All other places this is stored (eg. compile.h) should be autogenerated.
 export XEN_VERSION = 4
 export XEN_SUBVERSION = 6
-export XEN_EXTRAVERSION ?= -unstable$(XEN_VENDORVERSION)
+export XEN_EXTRAVERSION ?= .0-rc$(XEN_VENDORVERSION)
 export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
 -include xen-version
-- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
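The effect of the XEN_EXTRAVERSION hunk can be checked by re-deriving the full version string the Makefile assembles (a plain-shell restatement of the Make logic, not the Makefile itself; XEN_VENDORVERSION is assumed empty):

```shell
# Mirror of xen/Makefile's version assembly with the new EXTRAVERSION.
XEN_VERSION=4
XEN_SUBVERSION=6
XEN_EXTRAVERSION=".0-rc"
XEN_FULLVERSION="${XEN_VERSION}.${XEN_SUBVERSION}${XEN_EXTRAVERSION}"
echo "$XEN_FULLVERSION"    # prints 4.6.0-rc
```

This is why the rc number itself need not appear: cutting rc2 requires no further change to XEN_EXTRAVERSION.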
Re: [Xen-devel] [PATCH 4/4] Update QEMU_UPSTREAM_REVISION for 4.6 RC1
On Tue, Aug 11, 2015 at 03:09:20PM +0100, Ian Jackson wrote: When we make RC1 we arrange to get a specific version of qemu-xen-upstream. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com Acked-by: Wei Liu wei.l...@citrix.com ---
 Config.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Config.mk b/Config.mk
index d8b650e..75b49a3 100644
--- a/Config.mk
+++ b/Config.mk
@@ -254,7 +254,7 @@
 SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git
 MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
 endif
 OVMF_UPSTREAM_REVISION ?= cb9a7ebabcd6b8a49dc0854b2f9592d732b5afbd
-QEMU_UPSTREAM_REVISION ?= master
+QEMU_UPSTREAM_REVISION ?= qemu-xen-4.6.0-rc1
 MINIOS_UPSTREAM_REVISION ?= b36bcb370d611ad7f41e8c21d061e6291e088c58
 # Fri Jun 26 11:58:40 2015 +0100
 # Correct printf formatting for tpm_tis message.
-- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/4] tools: Update sonames for 4.6 RCs
On Tue, Aug 11, 2015 at 03:09:17PM +0100, Ian Jackson wrote: Update libxc to 4.6. Update libxl to 4.6. I did git-grep 'MAJOR.*=' and also, to check I had everything, git-grep 'SONAME_LDFLAG' | egrep -v 'MAJOR' | less. The other, un-updated, libraries are:

  blktap2 (control, libvhd)  1.0      in-tree users only, no ABI changes
  libfsimage                 1.0      no ABI changes
  libvchan                   1.0      no ABI changes
  libxenstat                 0.0 (!)  no ABI changes
  libxenstore                3.0      no ABI changes

My assertions "no ABI changes" are based on the output of git-diff origin/stable-4.5..staging . and similar runes, sometimes limited to .h files. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com ---
 tools/libxc/Makefile | 2 +-
 tools/libxl/Makefile | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 8ae0ea0..a0f899b 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -1,7 +1,7 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
-MAJOR = 4.5
+MAJOR = 4.6
 MINOR = 0
 ifeq ($(CONFIG_LIBXC_MINIOS),y)
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 9036076..a5ffa01 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -5,7 +5,7 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
-MAJOR = 4.5
+MAJOR = 4.6
 MINOR = 0
 XLUMAJOR = 4.3

What about libxlutil? I'm pretty sure its ABI has changed. Wei. -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/4] Update QEMU_TRADITIONAL_REVISION for 4.6 RC1
On Tue, Aug 11, 2015 at 03:09:19PM +0100, Ian Jackson wrote: (We will not necessarily bump this tag number for future RCs, unless something has changed in qemu-xen-traditional.) Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com ---
 Config.mk | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Config.mk b/Config.mk
index e9a7097..d8b650e 100644
--- a/Config.mk
+++ b/Config.mk
@@ -266,7 +266,8 @@
 SEABIOS_UPSTREAM_REVISION ?= rel-1.8.2
 ETHERBOOT_NICS ?= rtl8139 8086100e
-QEMU_TRADITIONAL_REVISION ?= 7f057440b31da38196e3398fd1b618fc36ad97d6
+QEMU_TRADITIONAL_REVISION ?= xen-4.6.0-rc1
+# 7f057440b31da38196e3398fd1b618fc36ad97d6
 # Wed Jun 3 14:41:27 2015 +0200
 # ide: Clear DRQ after handling all expected accesses
-- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
On Fri, 2015-08-07 at 10:11 +0800, Shannon Zhao wrote: This document is going to explain the design details of Xen booting with ACPI on ARM. Some parts of it may not be appropriate. Any comments are welcome. Some small subsets of this seem like they might overlap with what will be required for PVH on x86 (a new x86 guest mode not dissimilar to the sole ARM guest mode). If so then it would be preferable IMHO if PVH x86 could use the same interfaces. I've trimmed the quotes to just those bits and CCd some of the PVH people (Boris and Roger[0]) in case they have any thoughts. Actually, having done the trimming, there is only one such bit: [...]

4. Map MMIO regions
---
Register a bus_notifier for the platform and amba buses in Linux. Add a new XENMAPSPACE, XENMAPSPACE_dev_mmio. Within the notifier, check if the device is newly added; if so, call the hypercall XENMEM_add_to_physmap to map the MMIO regions.

Ian. [0] Roger is away for a week or so, but I expect feedback to be of the "we could use one extra field" type rather than "this needs to be done some totally different way for x86/PVH" (in which case we wouldn't want to share the interface anyway I suppose), so no need to block awaiting that feedback. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
On 11/08/15 15:12, Ian Campbell wrote: On Fri, 2015-08-07 at 10:11 +0800, Shannon Zhao wrote: [...] 3. Dom0 gets grant table and event channel irq information --- As said above, we assign the hypervisor_id be XenVMM to tell Dom0 that it runs on Xen hypervisor. For grant table, add two new HVM_PARAMs: HVM_PARAM_GNTTAB_START_ADDRESS and HVM_PARAM_GNTTAB_SIZE. The reason we expose this range is essentially to allow OS authors to take a short cut by telling them about an IPA range which is unused, so it is available for remapping the grant table into. On x86 there is a BAR on the Xen platform PCI device which serves a similar purpose. IIRC somebody (perhaps David V, CCd) had proposed at some point to make it so that Linux was able to pick such an IPA itself by examining the memory map or by some other scheme. PVH in Linux uses ballooned pages which are vmap()'d into a virtually contiguous region. See xlated_setup_gnttab_pages(). David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
On Tue, 2015-08-11 at 15:51 +0100, David Vrabel wrote: On 11/08/15 15:12, Ian Campbell wrote: On Fri, 2015-08-07 at 10:11 +0800, Shannon Zhao wrote: [...] 3. Dom0 gets grant table and event channel irq information --- As said above, we assign the hypervisor_id be XenVMM to tell Dom0 that it runs on Xen hypervisor. For grant table, add two new HVM_PARAMs: HVM_PARAM_GNTTAB_START_ADDRESS and HVM_PARAM_GNTTAB_SIZE. The reason we expose this range is essentially to allow OS authors to take a short cut by telling them about an IPA range which is unused, so it is available for remapping the grant table into. On x86 there is a BAR on the Xen platform PCI device which serves a similar purpose. IIRC somebody (perhaps David V, CCd) had proposed at some point to make it so that Linux was able to pick such an IPA itself by examining the memory map or by some other scheme. PVH in Linux uses ballooned pages which are vmap()'d into a virtually contiguous region. See xlated_setup_gnttab_pages(). So somewhat more concrete than a proposal then ;-) I don't see anything there which would be a problem on ARM, so we should probably go that route there too (at least for ACPI, if not globally for all ARM guests). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2
On 11/08/15 15:59, Ian Campbell wrote: On Tue, 2015-08-11 at 15:51 +0100, David Vrabel wrote: On 11/08/15 15:12, Ian Campbell wrote: On Fri, 2015-08-07 at 10:11 +0800, Shannon Zhao wrote: [...] 3. Dom0 gets grant table and event channel irq information --- As said above, we assign the hypervisor_id be XenVMM to tell Dom0 that it runs on Xen hypervisor. For grant table, add two new HVM_PARAMs: HVM_PARAM_GNTTAB_START_ADDRESS and HVM_PARAM_GNTTAB_SIZE. The reason we expose this range is essentially to allow OS authors to take a short cut by telling them about an IPA range which is unused, so it is available for remapping the grant table into. On x86 there is a BAR on the Xen platform PCI device which serves a similar purpose. IIRC somebody (perhaps David V, CCd) had proposed at some point to make it so that Linux was able to pick such an IPA itself by examining the memory map or by some other scheme. PVH in Linux uses ballooned pages which are vmap()'d into a virtually contiguous region. See xlated_setup_gnttab_pages(). So somewhat more concrete than a proposal then ;-) I don't see anything there which would be a problem on ARM, so we should probably go that route there too (at least for ACPI, if not globally for all ARM guests). If someone does this please move xlated_setup_gnttab_pages() into drivers/xen/xlate_mmu.c, and not copy it into an arm specific file. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH v3.1 2/2] xsplice: Add hook for build_id
On 27.07.15 at 21:20, kon...@kernel.org wrote:
--- a/xen/include/xen/compile.h.in
+++ b/xen/include/xen/compile.h.in
@@ -10,4 +10,5 @@
 #define XEN_EXTRAVERSION @@extraversion@@
 #define XEN_CHANGESET @@changeset@@
+#define XEN_BUILD_ID @@changeset@@

How can the changeset be a valid / sufficient build ID (even if maybe this is intended to only be a default / fallback)? Wasn't this meant specifically to account for rebuilds (with, say, a compiler slightly updated from the original one, and hence possibly producing slightly different code)? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH for-4.6 URGENT 0/4] Prepare for RC1
This is the result of me going through the relevant (pre-tagging) part of the release checklist. The qemu tags referred to have just been created. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH v3.1 1/2] xsplice: rfc.v3.1
On 31.07.15 at 17:46, konrad.w...@oracle.com wrote: On Thu, Jul 30, 2015 at 09:47:40AM -0700, Johannes Erdfelt wrote: On Mon, Jul 27, 2015, Konrad Rzeszutek Wilk kon...@kernel.org wrote:

+struct xsplice_reloc_howto {
+    uint32_t howto;   /* XSPLICE_HOWTO_* */
+    uint32_t flag;    /* XSPLICE_HOWTO_FLAG_* */
+    uint32_t size;    /* Size, in bytes, of the item to be relocated. */
+    uint32_t r_shift; /* The value the final relocation is shifted right by;
+                         used to drop unwanted data from the relocation. */
+    uint64_t mask;    /* Bitmask for which parts of the instruction or data
+                         are replaced with the relocated value. */
+    uint8_t  pad[8];  /* Must be zero. */
+};

I'm curious how r_shift and mask are used. I'm familiar with x86 and x86_64 and I'm not sure how these fit in. Is this to support other architectures? It is to patch up data. We can specify the exact mask for an unsigned int - so we only patch specific bits. Ditto if we want to remove certain values. Still I don't see a practical use: what relocated item would (on x86) be stored starting at other than bit 0 of a byte/word? Also, wouldn't a shift count be redundant with the mask value anyway? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel