Re: [Xen-devel] [PATCH v2 21/23] x86/boot: implement early command line parser in C
On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote: Current early command line parser implementation in assembler is very difficult to change to relocatable stuff using segment registers. This requires a lot of changes in very weird and fragile code. So, reimplement this functionality in C. This way code will be relocatable out of the box and much easier to maintain. All appreciated and nice, but the goal of making the code relocatable by playing with segment registers sounds fragile: This breaks assumptions the compiler may validly make. xen/arch/x86/boot/cmdline.S| 367 - xen/arch/x86/boot/cmdline.c| 396 A fundamental expectation I would have had is for the C file to be noticeably smaller than the assembly file. --- /dev/null +++ b/xen/arch/x86/boot/cmdline.c [...] +#define VESA_WIDTH 0 +#define VESA_HEIGHT 1 +#define VESA_DEPTH 2 + +#define VESA_SIZE3 These should go away in favor of using individual (sub)structure fields. +#define __cdecl __attribute__((__cdecl__)) ??? +#define __packed __attribute__((__packed__)) +#define __text __attribute__((__section__(.text))) +#define __used __attribute__((__used__)) Likely better to include compiler.h instead. +#define max(x,y) ({ \ +const typeof(x) _x = (x); \ +const typeof(y) _y = (y); \ +(void) (_x == _y);\ +_x _y ? _x : _y; }) I also wonder whether -imacros .../xen/kernel.h wouldn't be a better approach here. Please really think hard on how to avoid duplications like these. +#define strlen_static(s) (sizeof(s) - 1) What is this good for? A decent compiler should be able to deal with strlen(...). Plus your macro is longer that what it tries to abbreviate. +static const char empty_chars[] __text = \n\r\t; What is empty about them? DYM blank or (white) space or separator or delimiter? I also wonder whether \n and \r are actually usefully here, as they should (if at all) only end the line. +/** + * strlen - Find the length of a string + * @s: The string to be sized + */ +static size_t strlen(const char *s) Comments are certainly nice, but in this special case I'd rather suggest against bloating the code by commenting standard library functions. +static int strtoi(const char *s, const char *stop, const char **next) +{ +int base = 10, i, ores = 0, res = 0; + +if ( *s == '0' ) + base = (tolower(*++s) == 'x') ? (++s, 16) : 8; + +for ( ; *s != '\0'; ++s ) +{ +for ( i = 0; stop stop[i] != '\0'; ++i ) +if ( *s == stop[i] ) +goto out; + +if ( *s '0' || (*s '7' base == 8) ) +{ +res = -1; +goto out; +} + +if ( *s '9' (base != 16 || tolower(*s) 'a' || tolower(*s) 'f') ) +{ +res = -1; +goto out; +} + +res *= base; +res += (tolower(*s) = 'a') ? (tolower(*s) - 'a' + 10) : (*s - '0'); + +if ( ores res ) +{ +res = -1; +goto out; +} + +ores = res; +} + +out: C labels intended by at least one space please. +static const char *find_opt(const char *cmdline, const char *opt, int arg) +{ +size_t lc, lo; +static const char mm[] __text = --; I'd be surprised if there weren't compiler/assembler versions complaining about a section type conflict here. I can see why you want everything in one section, but I'd rather suggest achieving this at the linking step (which I would suppose to already be taking care of this). +static u8 skip_realmode(const char *cmdline) +{ +static const char nrm[] __text = no-real-mode; +static const char tboot[] __text = tboot=; + +if ( find_opt(cmdline, nrm, 0) || find_opt(cmdline, tboot, 1) ) +return 1; + +return 0; return find_opt(cmdline, nrm, 0) || find_opt(cmdline, tboot, 1); +static u8 edd_parse(const char *cmdline) +{ +const char *c; +size_t la; +static const char edd[] __text = edd=; +static const char edd_off[] __text = off; +static const char edd_skipmbr[] __text = skipmbr; + +c = find_opt(cmdline, edd, 1); + +if ( !c ) +return 0; + +c += strlen_static(edd); +la = strcspn(c, empty_chars); + +if ( !strncmp(c, edd_off, max(la, strlen_static(edd_off))) ) +return 2; +else if ( !strncmp(c, edd_skipmbr, max(la, strlen_static(edd_skipmbr))) ) Pointless else. +return 1; + +return 0; And the last two returns can be folded again anyway. +static void __cdecl __used cmdline_parse_early(const char *cmdline, early_boot_opts_t *ebo) I don't see the point of the __cdecl, and (as said before) dislike the static __used pair. --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@
Re: [Xen-devel] [PATCH v2 4/8] tmem: Remove xc_tmem_control mystical arg3
On 27/08/15 12:01, Konrad Rzeszutek Wilk wrote: It mentions it but it is never used. The hypercall interface knows nothing of this sort of thing either. Lets just remove it. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com It would be nice if you could take the opportunity of changing every caller to fix the style issues in affected code. There are a huge number of missing spaces in the parameter lists of calls to xc_tmem_control() ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] Tmem bug-fixes and cleanups.
On 27/08/15 12:01, Konrad Rzeszutek Wilk wrote: Hey! At the Xenhackathon we spoke that the tmem code needs a bit of cleanups and simplification. One of the things that Andrew mentioned was that the TMEM_CONTROL should really be part of the sysctl hypercall. As I ventured this path I realized there were some other issues that need to be taken care of (like shared pools blowing up). This patchset has been tested with 32/64 guests, with a 64 hypervisor and 32 bit toolstack (and also with 64bit toolstack) with success. For fun I've also created an Linux module: http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/tmem_test/tmem_test.c that I will expand to cover in the future more interesting hypercall uses. Going forward the next step will be to move the 'tmem_control' function to its own file to simplify the code. The patches are also in my git tree: Patches 1-4, 6-8 are definitely a net improvement, so Reviewed-by: Andrew Cooper andrew.coop...@citrix.com, with the proviso that I am not knowlegable about tmem internals. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.
On 27/08/15 12:02, Konrad Rzeszutek Wilk wrote: --- a/tools/python/xen/lowlevel/xc/xc.c +++ b/tools/python/xen/lowlevel/xc/xc.c @@ -1808,25 +1808,25 @@ static PyObject *pyxc_tmem_control(XcObject *self, pool_id, subop, cli_id, arg1, arg2, buf) ) return NULL; -if ( (subop == TMEMC_LIST) (arg1 32768) ) +if ( (subop == XEN_SYSCTL_TMEM_OP_LIST) (arg1 32768) ) arg1 = 32768; if ( (rc = xc_tmem_control(self-xc_handle, pool_id, subop, cli_id, arg1, arg2, buffer)) 0 ) return Py_BuildValue(i, rc); switch (subop) { -case TMEMC_LIST: +case XEN_SYSCTL_TMEM_OP_LIST: return Py_BuildValue(s, buffer); -case TMEMC_FLUSH: +case XEN_SYSCTL_TMEM_OP_FLUSH: return Py_BuildValue(i, rc); -case TMEMC_QUERY_FREEABLE_MB: +case XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB: return Py_BuildValue(i, rc); -case TMEMC_THAW: -case TMEMC_FREEZE: -case TMEMC_DESTROY: -case TMEMC_SET_WEIGHT: -case TMEMC_SET_CAP: -case TMEMC_SET_COMPRESS: +case XEN_SYSCTL_TMEM_OP_THAW: +case XEN_SYSCTL_TMEM_OP_FREEZE: +case XEN_SYSCTL_TMEM_OP_DESTROY: +case XEN_SYSCTL_TMEM_OP_SET_WEIGHT: +case XEN_SYSCTL_TMEM_OP_SET_CAP: +case XEN_SYSCTL_TMEM_OP_SET_COMPRESS: Any case statements falling through into default like this can simply be dropped. @@ -2555,77 +2556,72 @@ static int tmemc_restore_flush_page(int cli_id, uint32_t pool_id, struct oid *oi return do_tmem_flush_page(pool,oidp,index); } -static int do_tmem_control(struct tmem_op *op) +int tmem_control(struct xen_sysctl_tmem_op *op) { int ret; uint32_t pool_id = op-pool_id; -uint32_t subop = op-u.ctrl.subop; -struct oid *oidp = (struct oid *)(op-u.ctrl.oid[0]); +uint32_t cmd = op-cmd; +struct oid *oidp = (struct oid *)(op-oid[0]); Please put a struct xen_sysctl_tmem_oid definition in sysctl.h and avoid this typesystem abuse. --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -710,6 +710,48 @@ struct xen_sysctl_psr_cat_op { typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t); +#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xU + +#define XEN_SYSCTL_TMEM_OP_THAW 0 +#define XEN_SYSCTL_TMEM_OP_FREEZE 1 +#define XEN_SYSCTL_TMEM_OP_FLUSH 2 +#define XEN_SYSCTL_TMEM_OP_DESTROY3 +#define XEN_SYSCTL_TMEM_OP_LIST 4 +#define XEN_SYSCTL_TMEM_OP_SET_WEIGHT 5 +#define XEN_SYSCTL_TMEM_OP_SET_CAP6 +#define XEN_SYSCTL_TMEM_OP_SET_COMPRESS 7 +#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB 8 +#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION 11 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_MAXPOOLS 12 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_WEIGHT 13 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_CAP14 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_FLAGS 15 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_FLAGS16 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_NPAGES 17 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID 18 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV 20 +#define XEN_SYSCTL_TMEM_OP_SAVE_END 21 +#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN 30 +#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE 32 +#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33 + +struct xen_sysctl_tmem_op { +uint32_t cmd; /* IN: XEN_SYSCTL_TMEM_OP_* . */ +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/ +uint32_t cli_id;/* IN: client id, 0 for XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB + for all others can be the domain id or + XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */ +uint32_t arg1; /* IN: If not applicable to command use 0. */ +uint32_t arg2; /* IN: If not applicable to command use 0. */ Please can this interface be fixed as part of the move, even if it is in subsequent patches for clarity. Part of the issue with the XSA-17 security audit was that these args are completely opaque. +uint8_t pad[4];/* Padding so structure is the same under 32 and 64. */ probably better to use a uint32_t here. +uint64_t oid[3];/* IN: If not applicable to command use 0. */ +XEN_GUEST_HANDLE_64(char) buf; /* IN/OUT: Buffer to save and restore ops. */ +}; +typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t); + struct xen_sysctl { uint32_t cmd; #define XEN_SYSCTL_readconsole1 @@ -734,6 +776,7 @@ struct xen_sysctl { #define
[Xen-devel] [linux-linus test] 60877: regressions - FAIL
flight 60877 linux-linus real [real] http://logs.test-lab.xenproject.org/osstest/logs/60877/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl 14 guest-saverestore fail REGR. vs. 59254 test-amd64-i386-xl-xsm 14 guest-saverestore fail REGR. vs. 59254 test-amd64-i386-pair 21 guest-migrate/src_host/dst_host fail REGR. vs. 59254 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-rumpuserxen-amd64 15 rumpuserxen-demo-xenstorels/xenstorels.repeat fail REGR. vs. 59254 test-armhf-armhf-xl-rtds 11 guest-start fail REGR. vs. 59254 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 16 guest-start/debianhvm.repeat fail baseline untested test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 59254 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 59254 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail like 59423-bisect test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 13 guest-localmigrate fail like 60772-bisect Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-armhf-armhf-libvirt-vhd 9 debian-di-installfail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 14 guest-saverestorefail never pass test-armhf-armhf-xl-raw 9 debian-di-installfail never pass test-armhf-armhf-xl-qcow2 9 debian-di-installfail never pass test-armhf-armhf-xl-vhd 9 debian-di-installfail never pass test-armhf-armhf-libvirt-raw 9 debian-di-installfail never pass test-amd64-amd64-xl-pvh-intel 14 guest-saverestorefail never pass test-amd64-i386-libvirt 14 guest-saverestorefail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 9 debian-di-installfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail never pass test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail never pass test-amd64-i386-libvirt-vhd 11 migrate-support-checkfail never pass test-amd64-i386-libvirt-raw 11 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-libvirt 14 guest-saverestorefail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass version targeted for testing: linuxc13dcf9f2d6f5f06ef1bf79ec456df614c5e058b baseline version: linux45820c294fe1b1a9df495d57f40585ef2d069a39 Last test of basis59254 2015-07-09 04:20:48 Z 49 days Failing since 59348 2015-07-10 04:24:05 Z 48 days 32 attempts Testing same since60877 2015-08-25 15:46:48 Z1 days1 attempts
Re: [Xen-devel] [PATCH v2] Tmem bug-fixes and cleanups.
On Thu, Aug 27, 2015 at 07:01:55AM -0400, Konrad Rzeszutek Wilk wrote: Hey! At the Xenhackathon we spoke that the tmem code needs a bit of cleanups and simplification. One of the things that Andrew mentioned was that the TMEM_CONTROL should really be part of the sysctl hypercall. As I ventured this path I realized there were some other issues that need to be taken care of (like shared pools blowing up). This patchset has been tested with 32/64 guests, with a 64 hypervisor and 32 bit toolstack (and also with 64bit toolstack) with success. For fun I've also created an Linux module: http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/tmem_test/tmem_test.c that I will expand to cover in the future more interesting hypercall uses. Going forward the next step will be to move the 'tmem_control' function to its own file to simplify the code. The patches are also in my git tree: git://xenbits.xen.org/people/konradwilk/xen.git for-4.6/tmem.cleanups.v2 tools/libxc/include/xenctrl.h | 2 +- tools/libxc/xc_tmem.c | 111 ++-- tools/libxl/libxl.c| 22 ++-- tools/python/xen/lowlevel/xc/xc.c | 27 +++-- tools/xenstat/libxenstat/src/xenstat.c | 6 +- xen/common/sysctl.c| 7 +- xen/common/tmem.c | 183 + xen/include/public/sysctl.h| 44 xen/include/public/tmem.h | 39 +-- xen/include/xen/tmem.h | 3 + xen/include/xen/tmem_xen.h | 1 - xen/include/xsm/dummy.h| 6 -- xen/include/xsm/xsm.h | 6 -- xen/xsm/dummy.c| 1 - xen/xsm/flask/hooks.c | 9 +- xen/xsm/flask/policy/access_vectors| 2 +- 16 files changed, 237 insertions(+), 232 deletions(-) Konrad Rzeszutek Wilk (8): tmem: Don't crash/hang/leak hypervisor when using shared pools within an guest. tmem: Add ASSERT in obj_rb_insert for pool-rwlock lock. tmem: remove in xc_tmem_control_oid duplicate set_xen_guest_handle call tmem: Remove xc_tmem_control mystical arg3 tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl. tmem: Remove the old tmem control XSM checks as it is part of sysctl hypercall. tmem: Remove extra spaces at end and some hard tabbing. tmem: Spelling mistakes. This series is pretty isolated to TMEM. I'm in principle happy to let this series go in: Release-acked-by: Wei Liu wei.l...@citrix.com The only one that needs more careful thought is #5, I'm going to leave that to HV maintainers. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xen 4.6 regression: xl create fails if domU have custom vifname
On Thu, Aug 27, 2015 at 04:41:46PM +0200, Fabio Fantoni wrote: Today trying xen 4.6.0-rc2 with all things needed for future production server I found a regression: xl create fails if domU have custom vifname. xl create test.cfg Parsing config from test.cfg libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge add [14581] exited with error status 1 libxl: error: libxl_device.c:1085:device_hotplug_child_death_cb: script: ip link set vif14.0-emu name frc-0-0-emu failed This is the culprit. Did you do a clean install? You need to remove udev files in /etc. I should have mentioned this in testing instructions. Wei. libxl: error: libxl_create.c:1380:domcreate_attach_vtpms: unable to add nic devices libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [14615] exited with error status 1 libxl: error: libxl_device.c:1085:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected. libxl: error: libxl.c:1580:libxl__destroy_domid: non-existant domain 14 libxl: error: libxl.c:1538:domain_destroy_callback: unable to destroy guest with domid 14 libxl: error: libxl.c:1465:domain_destroy_cb: destruction of domain 14 failed From domU xl cfg: vif=['bridge=xenbr0,mac=00:16:3e:dd:e3:f1,vifname=frc-0-0'] Same domU removing only vifname is working. On production server with xen 4.4.2 or 4.5.1 domUs with vifname are working. I did a fast search without found the exact bug/regression cause. If you need more informations/tests tell me and I'll post them. Thanks for any reply and sorry for my bad english. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] IOMMU: skip domains without page tables when dumping
Reported-by: Roger Pau Monné roger@citrix.com Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -411,7 +411,7 @@ static void iommu_dump_p2m_table(unsigne ops = iommu_get_ops(); for_each_domain(d) { -if ( is_hardware_domain(d) ) +if ( is_hardware_domain(d) || need_iommu(d) = 0 ) continue; if ( iommu_use_hap_pt(d) ) IOMMU: skip domains without page tables when dumping Reported-by: Roger Pau Monné roger@citrix.com Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -411,7 +411,7 @@ static void iommu_dump_p2m_table(unsigne ops = iommu_get_ops(); for_each_domain(d) { -if ( is_hardware_domain(d) ) +if ( is_hardware_domain(d) || need_iommu(d) = 0 ) continue; if ( iommu_use_hap_pt(d) ) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] IOMMU: skip domains without page tables when dumping
Thanks, this solves the problem. El 27/08/15 a les 16.04, Jan Beulich ha escrit: Reported-by: Roger Pau Monné roger@citrix.com Signed-off-by: Jan Beulich jbeul...@suse.com Tested-by: Roger Pau Monné roger@citrix.com Acked-by: Roger Pau Monné roger@citrix.com --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -411,7 +411,7 @@ static void iommu_dump_p2m_table(unsigne ops = iommu_get_ops(); for_each_domain(d) { -if ( is_hardware_domain(d) ) +if ( is_hardware_domain(d) || need_iommu(d) = 0 ) continue; if ( iommu_use_hap_pt(d) ) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable
On Tue, Aug 11, 2015 at 12:13 PM, Ian Jackson ian.jack...@eu.citrix.com wrote: B 1(c)(maintainer)/2(a)/3(a) Branch. Maintainers may choose to defer patch series based on risk of conflicts with bugfixes required for 4.6. Clear communication with submitters is required. Bugfixes for bugs in 4.6 will be accepted onto the 4.6 branch. Maintainers are required to cherry pick them onto unstable. Bugfixes will not be accepted for unstable unless it is clear that the bug was introduced in unstable since 4.6 branched. I am happy with B because it gives the relevant maintainers the option. Did we ever come to a conclusion on this? FWIW I think Ian's proposal B here is the most sensible: the maintainer evaluates each patch for how much work it will create them, and decides whether they want to accept that work or not. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/NUMA: make init_node_heap() respect Xen heap limit
On Thu, Aug 27, 2015 at 02:37:17AM -0600, Jan Beulich wrote: On NUMA systems, where we try to use node local memory for the basic control structures of the buddy allocator, this special case needs to take into consideration a possible address width limit placed on the Xen heap. In turn this (but also other, more abstract considerations) requires that xenheap_max_mfn() not be called more than once (at most we might permit it to be called a second time with a larger value than was passed the first time), and be called only before calling end_boot_allocator(). While inspecting all the involved code, a couple of off-by-one issues were found (and are being corrected here at once): - arch_init_memory() cleared one too many page table slots - the highmem_start based invocation of xenheap_max_mfn() passed too big a value - xenheap_max_mfn() calculated the wrong bit count in edge cases Signed-off-by: Jan Beulich jbeul...@suse.com Release-acked-by: Wei Liu wei.l...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 7/8] tmem: Remove extra spaces at end and some hard tabbing.
On 27/08/15 12:02, Konrad Rzeszutek Wilk wrote: @@ -1559,7 +1559,7 @@ refind: { /* no puts allowed into a frozen pool (except dup puts) */ if ( client-frozen ) - goto unlock_obj; + goto unlock_obj; Need to lose 3 spaces here. } } else @@ -1572,10 +1572,10 @@ refind: write_lock(pool-pool_rwlock); /* - * Parallel callers may already allocated obj and inserted to obj_rb_root - * before us. - */ -if (!obj_rb_insert(pool-obj_rb_root[oid_hash(oidp)], obj)) + * Parallel callers may already allocated obj and inserted to obj_rb_root + * before us. + */ +if ( !obj_rb_insert(pool-obj_rb_root[oid_hash(oidp)], obj) ) { tmem_free(obj, pool); write_unlock(pool-pool_rwlock); @@ -1945,7 +1945,7 @@ static int do_tmem_new_pool(domid_t this_cli_id, (client-shared_auth_uuid[i][1] == uuid_hi) ) break; if ( i == MAX_GLOBAL_SHARED_POOLS ) - { + { And here. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [BUG] 'o' debug key panics the hypervisor
El 27/08/15 a les 14.02, Andrew Cooper ha escrit: On 27/08/15 12:29, Roger Pau Monné wrote: Hello, When using Intel hardware without shared page tables between the IOMMU and EPT (I have not tried if the same happens when sharing the page tables), the following crash happens if I press the 'o' debug key with a HVM guest running. The guest doesn't have any device passed-through. (XEN) domain1 IOMMU p2m table: (XEN) p2m table has 4 levels (XEN) Assertion 'pfn_to_pdx(ma PAGE_SHIFT) (DIRECTMAP_SIZE PAGE_SHIFT)' failed at /root/src/xen/xen/include/asm/x86_64/page.h:80 (XEN) [ Xen-4.6.0-rc x86_64 debug=y Tainted:C ] (XEN) CPU:0 (XEN) RIP:e008:[82d080165f20] map_domain_page+0x1ef/0x5e6 (XEN) RFLAGS: 00010216 CONTEXT: hypervisor The use of paddr_to_pfn() in map_vtd_domain_page() is incorrect. Does this fix the issue? Sadly no. Here is the panic trace with your patch applied: (XEN) domain1 IOMMU p2m table: (XEN) p2m table has 4 levels (XEN) Assertion 'pfn_to_pdx(ma PAGE_SHIFT) (DIRECTMAP_SIZE PAGE_SHIFT)' failed at /root/src/xen/xen/include/asm/x86_64/page.h:80 (XEN) [ Xen-4.6.0-rc x86_64 debug=y Tainted:C ] (XEN) CPU:0 (XEN) RIP:e008:[82d080165815] map_domain_page+0x1ef/0x5e6 (XEN) RFLAGS: 00010216 CONTEXT: hypervisor (XEN) rax: 000a8bcf000e rbx: a8bcf000e000 rcx: (XEN) rdx: 0007c7ff rsi: dfa9f000 rdi: 000a8bcf000e (XEN) rbp: 8300dfaf7d60 rsp: 8300dfaf7d20 r8: (XEN) r9: 0001 r10: 0004 r11: 0001 (XEN) r12: 83019d263000 r13: 0003 r14: 8300dfdf2000 (XEN) r15: 000ff000 cr0: 8005003b cr4: 26e0 (XEN) cr3: dfa9f000 cr2: 880012b18c90 (XEN) ds: es: fs: gs: ss: e010 cs: e008 (XEN) Xen stack trace from rsp=8300dfaf7d20: (XEN)82d08012cfda 82d0802a1bc0 8300dfaf7d40 a8bcf000e000 (XEN)8300 0003 0080 000ff000 (XEN)8300dfaf7d70 82d080152998 8300dfaf7dc0 82d08014c389 (XEN)0001dfaf7db0 82d08012c71a 0001 (XEN)8300 0003 000ff000 (XEN)8300dfaf7e10 82d08014c402 dfaf7e20 00270001 (XEN)8300dfdf2000 83019ff11000 82d080291240 82d0802688f8 (XEN)82d080339488 8300dfaf7e30 82d08014c4a6 (XEN) 83019ff11000 8300dfaf7e60 82d08014786a (XEN) 82d0802a1d60 006f (XEN)8300dfaf7e90 82d080114906 8300dfaf7e90 82d0802a15c0 (XEN) 82d080339560 8300dfaf7ea0 82d080114940 (XEN)8300dfaf7ec0 82d08012f47d 8300dfaf7ec0 82d080339570 (XEN)8300dfaf7ef0 82d08012f7b3 054782099709 8300dfaf (XEN)054782099709 8300dfdf2000 8300dfaf7f10 82d0801607c3 (XEN)82d08012c7c4 8300dfdf2000 8300dfaf7e10 (XEN) 88001faa7e80 8168f160 (XEN)8160 0246 0020 88001f60da80 (XEN) 810013aa (XEN)deadbeef deadbeef 0100 810013aa (XEN) Xen call trace: (XEN)[82d080165815] map_domain_page+0x1ef/0x5e6 (XEN)[82d080152998] map_vtd_domain_page+0x4a/0x4c (XEN)[82d08014c389] vtd_dump_p2m_table_level+0x2a/0xf6 (XEN)[82d08014c402] vtd_dump_p2m_table_level+0xa3/0xf6 (XEN)[82d08014c4a6] vtd_dump_p2m_table+0x51/0x59 (XEN)[82d08014786a] iommu_dump_p2m_table+0xaa/0xbe (XEN)[82d080114906] handle_keypress+0x68/0x8d (XEN)[82d080114940] keypress_action+0x15/0x17 (XEN)[82d08012f47d] do_tasklet_work+0x78/0xab (XEN)[82d08012f7b3] do_tasklet+0x5e/0x8a (XEN)[82d0801607c3] idle_loop+0x56/0x6b (XEN) (XEN) (XEN) (XEN) Panic on CPU 0: (XEN) Assertion 'pfn_to_pdx(ma PAGE_SHIFT) (DIRECTMAP_SIZE PAGE_SHIFT)' failed at /root/src/xen/xen/include/asm/x86_64/page.h (XEN) (XEN) (XEN) Reboot in five seconds... ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 4
On 2015/8/27 15:52, Jan Beulich wrote: On 27.08.15 at 02:37, julien.gr...@citrix.com wrote: On 20/08/2015 19:25, Shannon Zhao wrote: On 2015/8/20 22:06, Jan Beulich wrote: So can the two of you please sort out whether these are Linux internal tags (which Xen has no business generating, or even knowing of) or some form of publicly documented interface? They are Linux internal tags. But to support Xen ACPI on ARM, we just reuse the existing mechanism to pass the information to Dom0. My previous mail was unclear, sorry. I was explaining the current usage of those properties because you seemed to be concern about the backward compatibility. We'd like to formalize those properties in order to use them to pass information between Xen and the kernel image. Ah, that clarifies things. Thanks. This would avoid us to create a Xen specific entry path in the kernel and use the non-EFI one (also used by the EFI stub) and make easier to support new DOM0 OS (for instance FreeBSD). So I totally agree with Shannon mails. One other aspect completely left off so far is that of proper isolation: What x86 exposes to Dom0 is specifically limited to what Dom0 is supposed to know. I'm getting the impression that by exposing more EFI tables this is being violated just for the purpose of avoiding any changes to Linux. But maybe I'm misremembering, and all the extra tables exposed are actually fake ones rather than cloned host ones. Currently we create EFI system table and EFI memory descriptor table as Linux requires. I think the EFI memory descriptor table is necessary. What we didn't reach an agreement is only the EFI system table. Yes, we only use the Configure table of the EFI system table to get the ACPI root address. As you mentioned before, it could pass only the Configure table to Dom0, but we should change the process of parsing the DT and consider the backwards compatibility. On the other hand, as the RUNTIME service is not supported, it could assign the runtime service members of EFI system table invalid values and let Dom0 not initialize RUNTIME service(This could be done by making the memory attribute not be EFI_MEMORY_RUNTIME when we create the EFI memory descriptor table). If the RUNTIME service is supported in the future, it doesn't need to change the Linux again. So it could avoid changing back. Jan, how do you think about this? Thanks, -- Shannon ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [BUG] 'o' debug key panics the hypervisor
On 27.08.15 at 13:29, roger@citrix.com wrote: When using Intel hardware without shared page tables between the IOMMU and EPT (I have not tried if the same happens when sharing the page tables), the following crash happens if I press the 'o' debug key with a HVM guest running. The guest doesn't have any device passed-through. This last thing is the relevant one: No-one ever cared to use 'o' without also passing through a device. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 4
On 2015/8/27 22:13, Jan Beulich wrote: On 27.08.15 at 15:50, shannon.z...@linaro.org wrote: On 2015/8/27 15:52, Jan Beulich wrote: One other aspect completely left off so far is that of proper isolation: What x86 exposes to Dom0 is specifically limited to what Dom0 is supposed to know. I'm getting the impression that by exposing more EFI tables this is being violated just for the purpose of avoiding any changes to Linux. But maybe I'm misremembering, and all the extra tables exposed are actually fake ones rather than cloned host ones. Currently we create EFI system table and EFI memory descriptor table as Linux requires. I think the EFI memory descriptor table is necessary. What we didn't reach an agreement is only the EFI system table. Yes, we only use the Configure table of the EFI system table to get the ACPI root address. As you mentioned before, it could pass only the Configure table to Dom0, but we should change the process of parsing the DT and consider the backwards compatibility. A made up system table would (as said before) be fine with me too. Just not a clone of the host one. Yeah, it's a made up one. On the other hand, as the RUNTIME service is not supported, it could assign the runtime service members of EFI system table invalid values and let Dom0 not initialize RUNTIME service(This could be done by making the memory attribute not be EFI_MEMORY_RUNTIME when we create the EFI memory descriptor table). If the RUNTIME service is supported in the future, it doesn't need to change the Linux again. So it could avoid changing back. I'd strongly advise against such hackery - it will get you (and Xen) into the bad firmware corner. EFI without runtime services doesn't exist. And runtime services code/data not marked as such are a firmware bug (sadly existing in reality on the x86 side). But remember that under Xen the Dom0 kernel mustn't care about runtime services (other than wanting to be able to invoke them through hypercalls). Oh, I see. Thanks for your explanation. -- Shannon ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] xen 4.6 regression: xl create fails if domU have custom vifname
Today trying xen 4.6.0-rc2 with all things needed for future production server I found a regression: xl create fails if domU have custom vifname. xl create test.cfg Parsing config from test.cfg libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge add [14581] exited with error status 1 libxl: error: libxl_device.c:1085:device_hotplug_child_death_cb: script: ip link set vif14.0-emu name frc-0-0-emu failed libxl: error: libxl_create.c:1380:domcreate_attach_vtpms: unable to add nic devices libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [14615] exited with error status 1 libxl: error: libxl_device.c:1085:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected. libxl: error: libxl.c:1580:libxl__destroy_domid: non-existant domain 14 libxl: error: libxl.c:1538:domain_destroy_callback: unable to destroy guest with domid 14 libxl: error: libxl.c:1465:domain_destroy_cb: destruction of domain 14 failed From domU xl cfg: vif=['bridge=xenbr0,mac=00:16:3e:dd:e3:f1,vifname=frc-0-0'] Same domU removing only vifname is working. On production server with xen 4.4.2 or 4.5.1 domUs with vifname are working. I did a fast search without found the exact bug/regression cause. If you need more informations/tests tell me and I'll post them. Thanks for any reply and sorry for my bad english. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2] libxenstore: prefer using the character device
With the addition of FMODE_ATOMIC_POS in the Linux 3.14 kernel, concurrent blocking file accesses to a single open file descriptor can cause a deadlock trying to grab the file position lock. If a watch has been set up, causing a read_thread to blocking read on the file descriptor, then future writes that would cause the background read to complete will block waiting on the file position lock before they can execute. This race condition only occurs when libxenstore is accessing the xenstore daemon through the /proc/xen/xenbus file and not through the unix domain socket, which is the case when the xenstore daemon is running as a stub domain or when oxenstored is passed --disable-socket. Accessing the daemon from the true character device also does not exhibit this problem. On Linux, prefer using the character device file over the proc file if the character device exists. Signed-off-by: Jonathan Creekmore jonathan.creekm...@gmail.com --- tools/xenstore/xs_lib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/xenstore/xs_lib.c b/tools/xenstore/xs_lib.c index af4f75a..0c7744e 100644 --- a/tools/xenstore/xs_lib.c +++ b/tools/xenstore/xs_lib.c @@ -81,6 +81,8 @@ const char *xs_domain_dev(void) #if defined(__RUMPUSER_XEN__) || defined(__RUMPRUN__) return /dev/xen/xenbus; #elif defined(__linux__) + if (access(/dev/xen/xenbus, F_OK) == 0) + return /dev/xen/xenbus; return /proc/xen/xenbus; #elif defined(__NetBSD__) return /kern/xen/xenbus; -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 4
On 27.08.15 at 15:50, shannon.z...@linaro.org wrote: On 2015/8/27 15:52, Jan Beulich wrote: One other aspect completely left off so far is that of proper isolation: What x86 exposes to Dom0 is specifically limited to what Dom0 is supposed to know. I'm getting the impression that by exposing more EFI tables this is being violated just for the purpose of avoiding any changes to Linux. But maybe I'm misremembering, and all the extra tables exposed are actually fake ones rather than cloned host ones. Currently we create EFI system table and EFI memory descriptor table as Linux requires. I think the EFI memory descriptor table is necessary. What we didn't reach an agreement is only the EFI system table. Yes, we only use the Configure table of the EFI system table to get the ACPI root address. As you mentioned before, it could pass only the Configure table to Dom0, but we should change the process of parsing the DT and consider the backwards compatibility. A made up system table would (as said before) be fine with me too. Just not a clone of the host one. On the other hand, as the RUNTIME service is not supported, it could assign the runtime service members of EFI system table invalid values and let Dom0 not initialize RUNTIME service(This could be done by making the memory attribute not be EFI_MEMORY_RUNTIME when we create the EFI memory descriptor table). If the RUNTIME service is supported in the future, it doesn't need to change the Linux again. So it could avoid changing back. I'd strongly advise against such hackery - it will get you (and Xen) into the bad firmware corner. EFI without runtime services doesn't exist. And runtime services code/data not marked as such are a firmware bug (sadly existing in reality on the x86 side). But remember that under Xen the Dom0 kernel mustn't care about runtime services (other than wanting to be able to invoke them through hypercalls). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 22/23] x86: make Xen early boot code relocatable
On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote: - %fs register is filled with segment descriptor which describes memory region with Xen image (it could be relocated or not); This is too fuzzy. Please be very precise which region it is that %fs is supposed to point to (so that reviewers have a chance to validate uses). --- a/xen/arch/x86/Makefile +++ b/xen/arch/x86/Makefile @@ -72,8 +72,10 @@ efi-$(x86_64) := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \ echo '$(TARGET).efi'; fi) $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32 - ./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x10 \ - `$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'` +#THIS IS UGLY HACK! PLEASE DO NOT COMPLAIN. I WILL FIX IT IN NEXT RELEASE. + ./boot/mkelf32 $(TARGET)-syms $(TARGET) $(XEN_IMG_PHYS_START) 0x82d08100 +#./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x10 \ +#`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'` I do complain nevertheless: Such a patch should be tagged RFC. And with such a hack in place I'm not even sure it's worth reviewing. --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -12,13 +12,15 @@ .text .code32 -#define sym_phys(sym) ((sym) - __XEN_VIRT_START) +#define sym_phys(sym) ((sym) - __XEN_VIRT_START + XEN_IMG_PHYS_START - XEN_IMG_OFFSET) +#define sym_offset(sym) ((sym) - __XEN_VIRT_START) With XEN_IMG_PHYS_START == XEN_IMG_OFFSET - what's the point? (If there is a point, then there's obviously a comment missing here explaining things, including when to use which.) @@ -105,12 +107,13 @@ multiboot1_header_end: .word 0 gdt_boot_descr: -.word 6*8-1 -.long sym_phys(trampoline_gdt) +.word 7*8-1 +gdt_boot_descr_addr: gdt_boot_base would seem a better name; with C in mind what you use currently could be easily mistaken for a variable holding gdt_boot_descr. @@ -263,12 +271,8 @@ __start: cld cli -/* Initialise GDT and basic data segments. */ -lgdt%cs:sym_phys(gdt_boot_descr) -mov $BOOT_DS,%ecx -mov %ecx,%ds -mov %ecx,%es -mov %ecx,%ss +/* Load default Xen image base address. */ +mov $sym_phys(__image_base__),%ebp Isn't this always zero? I.e. wouldn't you better use xor %ebp,%ebp? /* Save the Multiboot info struct (after relocation) for later use. */ -mov $sym_phys(cpu0_stack)+1024,%esp +lea (sym_offset(cpu0_stack)+1024)(%ebp),%esp Please avoid obfuscating the code by adding pointless parentheses. @@ -390,72 +432,111 @@ trampoline_setup: /* Stash TSC to calculate a good approximation of time-since-boot */ rdtsc -mov %eax,sym_phys(boot_tsc_stamp) -mov %edx,sym_phys(boot_tsc_stamp+4) +mov %eax,%fs:sym_offset(boot_tsc_stamp) +mov %edx,%fs:sym_offset(boot_tsc_stamp+4) -/* Initialise L2 boot-map page table entries (16MB). */ -mov $sym_phys(l2_bootmap),%edx -mov $PAGE_HYPERVISOR|_PAGE_PSE,%eax -mov $8,%ecx +/* Update frame addreses in page tables. */ +lea sym_offset(__page_tables_start)(%ebp),%edx +mov $((__page_tables_end-__page_tables_start)/8),%ecx +1: testl $_PAGE_PRESENT,(%edx) +jz 2f +add %ebp,(%edx) +2: add $8,%edx +loop1b + +/* Initialise L2 boot-map page table entries (14MB). */ +lea sym_offset(l2_bootmap)(%ebp),%edx +lea sym_offset(start)(%ebp),%eax +and $~((1L2_PAGETABLE_SHIFT)-1),%eax +mov %eax,%ebx +shr $(L2_PAGETABLE_SHIFT-3),%ebx +and $(L2_PAGETABLE_ENTRIES*4*8-1),%ebx +add %ebx,%edx +add $(PAGE_HYPERVISOR|_PAGE_PSE),%eax +mov $7,%ecx 1: mov %eax,(%edx) add $8,%edx add $(1L2_PAGETABLE_SHIFT),%eax loop1b + /* Initialise L3 boot-map page directory entry. */ -mov $sym_phys(l2_bootmap)+__PAGE_HYPERVISOR,%eax -mov %eax,sym_phys(l3_bootmap) + 0*8 +lea (sym_offset(l2_bootmap)+__PAGE_HYPERVISOR)(%ebp),%eax +lea sym_offset(l3_bootmap)(%ebp),%ebx +mov $4,%ecx +1: mov %eax,(%ebx) +add $8,%ebx +add $(L2_PAGETABLE_ENTRIES*8),%eax +loop1b + +/* Initialise L2 direct map page table entries (14MB). */ +lea sym_offset(l2_identmap)(%ebp),%edx +lea sym_offset(start)(%ebp),%eax +and $~((1L2_PAGETABLE_SHIFT)-1),%eax +mov %eax,%ebx +shr $(L2_PAGETABLE_SHIFT-3),%ebx +and $(L2_PAGETABLE_ENTRIES*4*8-1),%ebx +add
Re: [Xen-devel] [BUG] 'o' debug key panics the hypervisor
El 27/08/15 a les 16.01, Jan Beulich ha escrit: On 27.08.15 at 13:29, roger@citrix.com wrote: When using Intel hardware without shared page tables between the IOMMU and EPT (I have not tried if the same happens when sharing the page tables), the following crash happens if I press the 'o' debug key with a HVM guest running. The guest doesn't have any device passed-through. This last thing is the relevant one: No-one ever cared to use 'o' without also passing through a device. I have to admit I discovered this when I thought I was typing at the Dom0 console but it turns out I had it switched to the Xen console... Anyway, it shouldn't crash Xen. Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 22/23] x86: make Xen early boot code relocatable
On Thu, Aug 27, 2015 at 9:29 AM, Jan Beulich jbeul...@suse.com wrote: On 27.08.15 at 17:10, daniel.ki...@oracle.com wrote: On Thu, Aug 27, 2015 at 07:12:38AM -0600, Jan Beulich wrote: On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote: /* Copy bootstrap trampoline to low memory, below 1MB. */ -mov $sym_phys(trampoline_start),%esi +lea sym_offset(trampoline_start)(%ebp),%esi mov $trampoline_end - trampoline_start,%ecx rep movsb The latest at this final hunk I'm getting tired (and upset). I'd much rather not touch all this code in these fragile ways, and instead require Xen to be booted without grub2 on badly written firmware like the one on the machine you quote in the description. Let's start discussion from here. My further work on this patch series is pointless until we do not agree details in this case. So, I agree that any software (including firmware) should not use precious resources (i.e. low memory in that case) if it is not strictly required. And I do not think so that latest UEFI implementations on new hardware really need this low memory to survive (at least page tables could be put anywhere above low memory). However, in case of UEFI it is a wish of smart people not requirement set by spec. I was checking UEFI docs a few times but I was not able to find anything which says that e.g. ...developer MUST allocate memory outside of low memory Even I have not found any suggestion about that. However, I must admit that I do not know whole UEFI spec, so, if you know something which is similar to above please tell me where it is (title, revision, page, paragraph, etc.). Hence, if there is not strict requirement in regards to memory allocation in specs we are lost. Of course we can encourage people to not do nasty things but I do not believe that many will listen. So, sadly, I suppose that we will see more and more machines in the wild which are broken (well, let's say do not align to good practices). On the other hand I think that we should not assume that a given memory region (in our case starting at 1 MiB) is (or will be) available in every machine. I have not seen anything which says that it is true. We should stop doing that even if it works right now. I think that it works by chance and there is a chance that it will stop working one day because somebody will discover that he or she can put there e.g. device or hole. Last but not least. I suppose that Xen users and especially distros will complain when they are not able to use GRUB2 with Xen on every platform just because somebody (i.e. platform designers, developers, or whatever) do not accept our beliefs or we require that platform must obey rules (i.e. memory map requirements) which are specified nowhere. You're right, there's no such requirement on memory use in the spec. But you're missing the point. Supporting grub2 on UEFI is already a hack (ignoring all intentions EFI had from its first days). And now you've found an environment where that hack needs another hack (in Xen) to actually work. That's too much hackery for my taste, the more that things on this system can (afaict) work quite okay (without grub2, or with using its chainloader mechanism). If you advocate direct booting ( no boot loader) on production machines I wont argue much, as long as there is good recovery tools to deal with failed boots (grub does this very well, I am not aware of anything comparable that is pure efi). however the other use case which care more about is dual booting. I am a novice when it comes to Xen, although otherwise competent. The test machines I have for playing with Xen are also used for other tests, some of which require raw hardware support, so I want the ability to use a boot menu. I am aware of refit, but it does not have serial support which makes automating testing more difficult. and I have not yet got ipxe to successfully boot Xen (although this is purely because I have not yet devoted enough time to that project and I am not yet familiar with how Xen boots). So no, I'm still not convinced of the need for this patch. I am at a loss for alternatives. Yes grub on efi violates the spirit of efi. Propose a better way forward rather than deriding those who have found a successful, portable way around the limitations of efi implementations. Jan ___ Grub-devel mailing list grub-de...@gnu.org https://lists.gnu.org/mailman/listinfo/grub-devel -- -- Ben Hildred Automation Support Services 303 815 6721 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC
On 27.08.15 at 17:22, andrew.coop...@citrix.com wrote: On 27/08/15 15:47, Jonathan Creekmore wrote: @@ -812,7 +816,11 @@ ENTRY(hypercall_args_table) .byte 2 /* do_hvm_op*/ .byte 1 /* do_sysctl*/ /* 35 */ .byte 1 /* do_domctl*/ +#ifdef CONFIG_KEXEC .byte 2 /* do_kexec */ +#else +.byte 0 /* do_ni_hypercall */ +#endif These changes will corrupt guest registers in debug builds. It's quite the other way around: Other than now (where two registers get clobbered), no registers would be clobbered anymore. Of course that's not to say the argument count tables shouldn't be left alone. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC] xen: if on Xen, flatten the scheduling domain hierarchy
On Thu, Aug 27, 2015 at 11:24 AM, George Dunlap george.dun...@citrix.com wrote: On 08/18/2015 04:55 PM, Dario Faggioli wrote: Hey everyone, So, as a followup of what we were discussing in this thread: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg03241.html I started looking in more details at scheduling domains in the Linux kernel. Now, that thread was about CPUID and vNUMA, and their weird way of interacting, while this thing I'm proposing here is completely independent from them both. In fact, no matter whether vNUMA is supported and enabled, and no matter whether CPUID is reporting accurate, random, meaningful or completely misleading information, I think that we should do something about how scheduling domains are build. Fact is, unless we use 1:1, and immutable (across all the guest lifetime) pinning, scheduling domains should not be constructed, in Linux, by looking at *any* topology information, because that just does not make any sense, when vcpus move around. Let me state this again (hoping to make myself as clear as possible): no matter in how much good shape we put CPUID support, no matter how beautifully and consistently that will interact with both vNUMA, licensing requirements and whatever else. It will be always possible for vCPU #0 and vCPU #3 to be scheduled on two SMT threads at time t1, and on two different NUMA nodes at time t2. Hence, the Linux scheduler should really not skew his load balancing logic toward any of those two situations, as neither of them could be considered correct (since nothing is!). For now, this only covers the PV case. HVM case shouldn't be any different, but I haven't looked at how to make the same thing happen in there as well. OVERALL DESCRIPTION === What this RFC patch does is, in the Xen PV case, configure scheduling domains in such a way that there is only one of them, spanning all the pCPUs of the guest. Note that the patch deals directly with scheduling domains, and there is no need to alter the masks that will then be used for building and reporting the topology (via CPUID, /proc/cpuinfo, /sysfs, etc.). That is the main difference between it and the patch proposed by Juergen here: http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg05088.html This means that when, in future, we will fix CPUID handling and make it comply with whatever logic or requirements we want, that won't have any unexpected side effects on scheduling domains. Information about how the scheduling domains are being constructed during boot are available in `dmesg', if the kernel is booted with the 'sched_debug' parameter. It is also possible to look at /proc/sys/kernel/sched_domain/cpu*, and at /proc/schedstat. With the patch applied, only one scheduling domain is created, called the 'VCPU' domain, spanning all the guest's (or Dom0's) vCPUs. You can tell that from the fact that every cpu* folder in /proc/sys/kernel/sched_domain/ only have one subdirectory ('domain0'), with all the tweaks and the tunables for our scheduling domain. EVALUATION == I've tested this with UnixBench, and by looking at Xen build time, on a 16, 24 and 48 pCPUs hosts. I've run the benchmarks in Dom0 only, for now, but I plan to re-run them in DomUs soon (Juergen may be doing something similar to this in DomU already, AFAUI). I've run the benchmarks with and without the patch applied ('patched' and 'vanilla', respectively, in the tables below), and with different number of build jobs (in case of the Xen build) or of parallel copy of the benchmarks (in the case of UnixBench). What I get from the numbers is that the patch almost always brings benefits, in some cases even huge ones. There are a couple of cases where we regress, but always only slightly so, especially if comparing that to the magnitude of some of the improvement that we get. Bear also in mind that these results are gathered from Dom0, and without any overcommitment at the vCPU level (i.e., nr. vCPUs == nr pCPUs). If we move things in DomU and do overcommit at the Xen scheduler level, I am expecting even better results. RESULTS === To have a quick idea of how a benchmark went, look at the '% improvement' row of each table. I'll put these results online, in a googledoc spreadsheet or something like that, to make them easier to read, as soon as possible. *** Intel(R) Xeon(R) E5620 @ 2.40GHz *** pCPUs 16DOM0 vCPUS 16 *** RAM12285 MB DOM0 Memory 9955 MB *** NUMA nodes 2 === MAKE XEN (lower == better) === # of build jobs -j1
Re: [Xen-devel] [RFC v2 for-4.6 0/2] In-tree feature documentation
Andrew Cooper writes (Re: [RFC v2 for-4.6 0/2] In-tree feature documentation): On 27/08/15 15:52, Ian Jackson wrote: I do wonder whether cross-referencing all the issues is a good idea. It seems like it might be a lot of work to keep them in step. I don't expect all the issues to be enumerated (generally, they would be found the first time someone falls over the issue), but where known interaction issues exist, we need to have some place to leave them. I was prompted to ask this because it seemed to me that some of the issues were discussed in other parts of the text or in other patches as well as in `issues'. There are plenty of examples where known issues are documented somewhere in the xen-devel archives, or in an individuals head, and neither of these are useful places for the information to exist. I agree with this. I think things should be in the tree, but once. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] libxenstore: prefer using the character device
On Thu, Aug 27, 2015 at 09:04:38AM -0500, Jonathan Creekmore wrote: With the addition of FMODE_ATOMIC_POS in the Linux 3.14 kernel, concurrent blocking file accesses to a single open file descriptor can cause a deadlock trying to grab the file position lock. If a watch has been set up, causing a read_thread to blocking read on the file descriptor, then future writes that would cause the background read to complete will block waiting on the file position lock before they can execute. This race condition only occurs when libxenstore is accessing the xenstore daemon through the /proc/xen/xenbus file and not through the unix domain socket, which is the case when the xenstore daemon is running as a stub domain or when oxenstored is passed --disable-socket. Accessing the daemon from the true character device also does not exhibit this problem. On Linux, prefer using the character device file over the proc file if the character device exists. Signed-off-by: Jonathan Creekmore jonathan.creekm...@gmail.com Acked-by: Wei Liu wei.l...@citrix.com I think this is a candidate for 4.6. --- tools/xenstore/xs_lib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/xenstore/xs_lib.c b/tools/xenstore/xs_lib.c index af4f75a..0c7744e 100644 --- a/tools/xenstore/xs_lib.c +++ b/tools/xenstore/xs_lib.c @@ -81,6 +81,8 @@ const char *xs_domain_dev(void) #if defined(__RUMPUSER_XEN__) || defined(__RUMPRUN__) return /dev/xen/xenbus; #elif defined(__linux__) + if (access(/dev/xen/xenbus, F_OK) == 0) + return /dev/xen/xenbus; return /proc/xen/xenbus; #elif defined(__NetBSD__) return /kern/xen/xenbus; -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PAGE_SIZE (64KB), while block driver 'struct request' deals with PAGE_SIZE (up to 44Kb). Was:Re: [RFC] Support of non-indirect grant backend on 64KB guest
Hi, On 21/08/15 18:10, Konrad Rzeszutek Wilk wrote: On Fri, Aug 21, 2015 at 05:08:35PM +0100, David Vrabel wrote: On 21/08/15 17:05, Konrad Rzeszutek Wilk wrote: I have to concur with that. We can't mandate that ARM 64k page MUST use indirect descriptors. Then it has to be fixed in the block layer to allow PAGE_SIZE segments and to get the block layer to split requests for blkfront. Hey Jens, I am hoping you can help us figure this problem out. The Linux ARM is capable of using 4KB pages and 64KB pages. Our block driver (xen-blkfront) was built with 4KB pages in mind and without using any fancy flags (which some backends lack) the maximum amount of I/O it can fit on a ring is 44KB. This has the unfortunate effect that when the xen-blkfront gets an 'struct request' it can have on page (64KB) and it can't actually fit it on the ring! And the lowest segment size it advertises is PAGE_SIZE (64KB). I believe Julien (who found this) tried initially advertising smaller segment size than PAGE_SIZE (32KB). However looking at __blk_segment_map_sg it looks to assume smallest size is PAGE_SIZE so that would explain why it did not work. To be honest, I haven't tried to see how the block layer will act if I dropped those checks in blk-settings.c until today. I don't see any assumption about PAGE_SIZE in __blk_segment_map_sg. Although dropping the checks in blk-settings (see quick patch [1]), I got the following error in the frontend: bio too big device xvda (128 88) Buffer I/O error on dev xvda, logical block 0, async page read bio too big device xvda (128 88) Buffer I/O error on dev xvda, logical block 0, async page read The bio too big device ... comes from generic_make_request_checks (linux/block/blk-core.c) and the stack trace is: [fe096c7c] dump_backtrace+0x0/0x124 [fe096db0] show_stack+0x10/0x1c [fe5885e8] dump_stack+0x78/0xbc [fe0bf7f8] warn_slowpath_common+0x98/0xd0 [fe0bf8f0] warn_slowpath_null+0x14/0x20 [fe2df304] generic_make_request_checks+0x114/0x230 [fe2e0580] generic_make_request+0x10/0x108 [fe2e070c] submit_bio+0x94/0x1e0 [fe1d573c] submit_bh_wbc.isra.36+0x100/0x1a8 [fe1d5bf8] block_read_full_page+0x320/0x3e8 [fe1d877c] blkdev_readpage+0x14/0x20 [fe14582c] do_read_cache_page+0x16c/0x1a0 [fe145870] read_cache_page+0x10/0x1c [fe2f2908] read_dev_sector+0x30/0x9c [fe2f3d84] msdos_partition+0x84/0x554 [fe2f38e4] check_partition+0xf8/0x21c [fe2f2f28] rescan_partitions+0xb0/0x2a4 [fe1d98b0] __blkdev_get+0x228/0x34c [fe1d9a14] blkdev_get+0x40/0x364 [fe2f0b6c] add_disk+0x398/0x424 [fe3d8500] blkback_changed+0x1200/0x152c [fe36a954] xenbus_otherend_changed+0x9c/0xa4 [fe36c984] backend_changed+0xc/0x18 [fe36a088] xenwatch_thread+0xa0/0x13c [fe0d98d0] kthread+0xd8/0xf0 The fs buffer code seems to assume that the block driver will always support at least a bio of PAGE_SIZE. One wya to make this work is for the driver (xen-blkfront) to split the 'struct request' I/O in two internal requests. But this has to be a normal problem. Surely there are other drivers (MMC?) that can't handle PAGE_SIZE and have to break their I/Os. Would it make sense for the common block code to be able to deal with this? It will take me a bit of time to fully understand the block layer as the changes doesn't seem trivial from POV (I don't have any knowledge in it). So I will wait a feedback from Jens before going further on this approach. Regards, [1] patch diff --git a/block/blk-settings.c b/block/blk-settings.c index e0057d0..ac024e7 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -251,12 +251,15 @@ EXPORT_SYMBOL(blk_queue_bounce_limit); **/ void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int max_hw_sectors) { +#if 0 if ((max_hw_sectors 9) PAGE_CACHE_SIZE) { max_hw_sectors = 1 (PAGE_CACHE_SHIFT - 9); printk(KERN_INFO %s: set to minimum %d\n, __func__, max_hw_sectors); } +#endif + limits-max_sectors = limits-max_hw_sectors = max_hw_sectors; } EXPORT_SYMBOL(blk_limits_max_hw_sectors); @@ -351,11 +354,14 @@ EXPORT_SYMBOL(blk_queue_max_segments); **/ void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size) { +#if 0 if (max_size PAGE_CACHE_SIZE) { max_size = PAGE_CACHE_SIZE; printk(KERN_INFO %s: set to minimum %d\n, __func__, max_size); } +#endif + q-limits.max_segment_size = max_size; } @@ -777,11 +783,14 @@ EXPORT_SYMBOL_GPL(blk_queue_dma_drain); **/ void blk_queue_segment_boundary(struct request_queue *q, unsigned long mask) { +#if 0 if (mask PAGE_CACHE_SIZE - 1) { mask = PAGE_CACHE_SIZE - 1; printk(KERN_INFO %s: set
Re: [Xen-devel] [PATCH for 4.6] build: fix tarball stubdom build
Wei Liu writes ([PATCH for 4.6] build: fix tarball stubdom build): When we create a source code tarball, mini-os is extracted to extras/mini-os directory. When building a source code tarball, we shouldn't clone mini-os again. Only clone mini-os when that directory doesn't exist. This fixes tarball build and doesn't affect non-tarball build. Acked-by: Ian Jackson ian.jack...@eu.citrix.com I am about to commit this. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] libxenstore: prefer using the character device
Wei Liu writes (Re: [Xen-devel] [PATCH v2] libxenstore: prefer using the character device): On Thu, Aug 27, 2015 at 09:04:38AM -0500, Jonathan Creekmore wrote: With the addition of FMODE_ATOMIC_POS in the Linux 3.14 kernel, concurrent blocking file accesses to a single open file descriptor can cause a deadlock trying to grab the file position lock. If a watch has been set up, causing a read_thread to blocking read on the file descriptor, then future writes that would cause the background read to complete will block waiting on the file position lock before they can execute. This race condition only occurs when libxenstore is accessing the xenstore daemon through the /proc/xen/xenbus file and not through the unix domain socket, which is the case when the xenstore daemon is running as a stub domain or when oxenstored is passed --disable-socket. Accessing the daemon from the true character device also does not exhibit this problem. On Linux, prefer using the character device file over the proc file if the character device exists. I confess I still see this as working around a kernel bug. Only this time we are switching from a buggy to non-buggy kernel interface. Why don't we have the kernel provide only non-buggy interfaces ? diff --git a/tools/xenstore/xs_lib.c b/tools/xenstore/xs_lib.c index af4f75a..0c7744e 100644 --- a/tools/xenstore/xs_lib.c +++ b/tools/xenstore/xs_lib.c @@ -81,6 +81,8 @@ const char *xs_domain_dev(void) #if defined(__RUMPUSER_XEN__) || defined(__RUMPRUN__) return /dev/xen/xenbus; #elif defined(__linux__) + if (access(/dev/xen/xenbus, F_OK) == 0) + return /dev/xen/xenbus; Also, previously xs_domain_dev was a function which simply returned a static value. I feel vaguely uneasy at putting this kind of autodetection logic here. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 22/23] x86: make Xen early boot code relocatable
On 27/08/15 16:29, Jan Beulich wrote: On 27.08.15 at 17:10, daniel.ki...@oracle.com wrote: On Thu, Aug 27, 2015 at 07:12:38AM -0600, Jan Beulich wrote: On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote: /* Copy bootstrap trampoline to low memory, below 1MB. */ -mov $sym_phys(trampoline_start),%esi +lea sym_offset(trampoline_start)(%ebp),%esi mov $trampoline_end - trampoline_start,%ecx rep movsb The latest at this final hunk I'm getting tired (and upset). I'd much rather not touch all this code in these fragile ways, and instead require Xen to be booted without grub2 on badly written firmware like the one on the machine you quote in the description. Let's start discussion from here. My further work on this patch series is pointless until we do not agree details in this case. So, I agree that any software (including firmware) should not use precious resources (i.e. low memory in that case) if it is not strictly required. And I do not think so that latest UEFI implementations on new hardware really need this low memory to survive (at least page tables could be put anywhere above low memory). However, in case of UEFI it is a wish of smart people not requirement set by spec. I was checking UEFI docs a few times but I was not able to find anything which says that e.g. ...developer MUST allocate memory outside of low memory Even I have not found any suggestion about that. However, I must admit that I do not know whole UEFI spec, so, if you know something which is similar to above please tell me where it is (title, revision, page, paragraph, etc.). Hence, if there is not strict requirement in regards to memory allocation in specs we are lost. Of course we can encourage people to not do nasty things but I do not believe that many will listen. So, sadly, I suppose that we will see more and more machines in the wild which are broken (well, let's say do not align to good practices). On the other hand I think that we should not assume that a given memory region (in our case starting at 1 MiB) is (or will be) available in every machine. I have not seen anything which says that it is true. We should stop doing that even if it works right now. I think that it works by chance and there is a chance that it will stop working one day because somebody will discover that he or she can put there e.g. device or hole. Last but not least. I suppose that Xen users and especially distros will complain when they are not able to use GRUB2 with Xen on every platform just because somebody (i.e. platform designers, developers, or whatever) do not accept our beliefs or we require that platform must obey rules (i.e. memory map requirements) which are specified nowhere. You're right, there's no such requirement on memory use in the spec. But you're missing the point. Supporting grub2 on UEFI is already a hack (ignoring all intentions EFI had from its first days). And now you've found an environment where that hack needs another hack (in Xen) to actually work. That's too much hackery for my taste, the more that things on this system can (afaict) work quite okay (without grub2, or with using its chainloader mechanism). So no, I'm still not convinced of the need for this patch. I disagree. There are many things a grub environment gives us which plain EFI doesn't, most notably a familiar booting environment and recovery facilities for users (which vastly trumps how EFI expects to run things). Furthermore, it offers common management of /boot in the dom0 filesystem between legacy and EFI boots. Therefore I am very much +1 get grub working. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] build: use correct qemu emulator binary
Wei Liu writes (Re: [Xen-devel] [PATCH] build: use correct qemu emulator binary): On Tue, Aug 25, 2015 at 08:45:07AM -0500, Doug Goldstein wrote: Per http://wiki.qemu.org/ChangeLog/1.0 and the fact that no currently supported distro ships the x86 system emulator binary as 'qemu', this changes the default when a user specifies --with-system-qemu without a PATH to 'qemu-system-i386', otherwise the default results in a non-functional setup. Signed-off-by: Doug Goldstein car...@cardoe.com Acked-by: Wei Liu wei.l...@citrix.com Committed-by: Ian Jackson ian.jack...@eu.citrix.com Ian, please rerun autogen.sh when applying this patch. Done. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [DOCS DAY] [PATCH for-4.6 0/5] Cleanup of docs spread throughout the tree
Wei Liu writes (Re: [Xen-devel] [DOCS DAY] [PATCH for-4.6 0/5] Cleanup of docs spread throughout the tree): On Wed, Aug 26, 2015 at 10:09:46AM +0100, Andrew Cooper wrote: Whole series: Acked-by: Wei Liu wei.l...@citrix.com Committed-by: Ian Jackson ian.jack...@eu.citrix.com Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 for-4.6 0/2] In-tree feature documentation
On 27/08/15 18:58, Ian Jackson wrote: Andrew Cooper writes (Re: [RFC v2 for-4.6 0/2] In-tree feature documentation): On 27/08/15 15:52, Ian Jackson wrote: I do wonder whether cross-referencing all the issues is a good idea. It seems like it might be a lot of work to keep them in step. I don't expect all the issues to be enumerated (generally, they would be found the first time someone falls over the issue), but where known interaction issues exist, we need to have some place to leave them. I was prompted to ask this because it seemed to me that some of the issues were discussed in other parts of the text or in other patches as well as in `issues'. Which issues are you concerned about? There are plenty of examples where known issues are documented somewhere in the xen-devel archives, or in an individuals head, and neither of these are useful places for the information to exist. I agree with this. I think things should be in the tree, but once. I am certainly not advocating needless repetition, but I don't see anything in the submitted text which would quality. I will happily correct the text if I am mistaken. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 2/4] x86/compat: Test both PV and PVH guests for compat mode
On 13.08.15 at 20:12, boris.ostrov...@oracle.com wrote: @@ -777,7 +777,7 @@ int arch_set_info_guest( /* The context is a compat-mode one if the target domain is compat-mode; * we expect the tools to DTRT even in compat-mode callers. */ -compat = is_pv_32bit_domain(d); +compat = is_pv_32bit_domain(d) || is_pvh_32bit_domain(d); I continue to think that this should include a v-domain == current-domain check (to match behavior for HVM guests from the tool stack perspective). Having looked at patch 4, I also can't see how the tool stack is being made expect a non-native guest context record in the 32-bit PVH case (i.e. I'd appreciate if you could point out where that hides). @@ -1203,7 +1204,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c) { unsigned int i; const struct domain *d = v-domain; -bool_t compat = is_pv_32bit_domain(d); +bool_t compat = is_pv_32bit_domain(d) || is_pvh_32bit_domain(d); Same here then naturally. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for 4.6] build: fix tarball stubdom build
On Thu, Aug 27, 2015 at 10:05:56AM -0600, Jan Beulich wrote: On 27.08.15 at 17:54, wei.l...@citrix.com wrote: When we create a source code tarball, mini-os is extracted to extras/mini-os directory. When building a source code tarball, we shouldn't clone mini-os again. Only clone mini-os when that directory doesn't exist. This fixes tarball build and doesn't affect non-tarball build. Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- Makefile | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/Makefile b/Makefile index e8a75ff..ba0df70 100644 --- a/Makefile +++ b/Makefile @@ -19,10 +19,12 @@ include Config.mk .PHONY: mini-os-dir mini-os-dir: - GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh \ - $(MINIOS_UPSTREAM_URL) \ - $(MINIOS_UPSTREAM_REVISION) \ - $(XEN_ROOT)/extras/mini-os + if [ ! -d $(XEN_ROOT)/extras/mini-os ]; then \ + GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh \ + $(MINIOS_UPSTREAM_URL) \ + $(MINIOS_UPSTREAM_REVISION) \ + $(XEN_ROOT)/extras/mini-os ; \ + fi Wouldn't his better be done (avoiding the need for the shell conditional) by simply renaming the make target from mini-os-dir to extras/mini-os (and dropping the .PHONY)? All targets for external trees follow some conventions defined in tools/Makefile. Although I didn't strictly follow the same rules in mini-os's case, I don't want it to deviate from the original rules too much. Wei. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 22/23] x86: make Xen early boot code relocatable
On Thu, Aug 27, 2015 at 07:12:38AM -0600, Jan Beulich wrote: On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote: [...] /* Copy bootstrap trampoline to low memory, below 1MB. */ -mov $sym_phys(trampoline_start),%esi +lea sym_offset(trampoline_start)(%ebp),%esi mov $trampoline_end - trampoline_start,%ecx rep movsb The latest at this final hunk I'm getting tired (and upset). I'd much rather not touch all this code in these fragile ways, and instead require Xen to be booted without grub2 on badly written firmware like the one on the machine you quote in the description. Let's start discussion from here. My further work on this patch series is pointless until we do not agree details in this case. So, I agree that any software (including firmware) should not use precious resources (i.e. low memory in that case) if it is not strictly required. And I do not think so that latest UEFI implementations on new hardware really need this low memory to survive (at least page tables could be put anywhere above low memory). However, in case of UEFI it is a wish of smart people not requirement set by spec. I was checking UEFI docs a few times but I was not able to find anything which says that e.g. ...developer MUST allocate memory outside of low memory Even I have not found any suggestion about that. However, I must admit that I do not know whole UEFI spec, so, if you know something which is similar to above please tell me where it is (title, revision, page, paragraph, etc.). Hence, if there is not strict requirement in regards to memory allocation in specs we are lost. Of course we can encourage people to not do nasty things but I do not believe that many will listen. So, sadly, I suppose that we will see more and more machines in the wild which are broken (well, let's say do not align to good practices). On the other hand I think that we should not assume that a given memory region (in our case starting at 1 MiB) is (or will be) available in every machine. I have not seen anything which says that it is true. We should stop doing that even if it works right now. I think that it works by chance and there is a chance that it will stop working one day because somebody will discover that he or she can put there e.g. device or hole. Last but not least. I suppose that Xen users and especially distros will complain when they are not able to use GRUB2 with Xen on every platform just because somebody (i.e. platform designers, developers, or whatever) do not accept our beliefs or we require that platform must obey rules (i.e. memory map requirements) which are specified nowhere. So, I think that early Xen boot code should be relocatable. However, I agree that there is a chance that may solution is not perfect. So, I suppose, taking into account above, we should discuss how to improve it instead of discussing whether we should do it. Daniel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.
On 27.08.15 at 13:02, konrad.w...@oracle.com wrote: @@ -2666,9 +2662,9 @@ long do_tmem_op(tmem_cli_op_t uops) /* Acquire wirte lock for all command at first */ write_lock(tmem_rwlock); -if ( op.cmd == TMEM_CONTROL ) +if ( op.cmd == TMEM_CONTROL_MOVED ) { -rc = do_tmem_control(op); +rc = -ENOSYS; As in other similar cases I'd prefer -EOPNOTSUPP here to prevent callers from implying the tmem hypercall as a whole is unimplemented. --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -710,6 +710,48 @@ struct xen_sysctl_psr_cat_op { typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t); +#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xU + +#define XEN_SYSCTL_TMEM_OP_THAW 0 +#define XEN_SYSCTL_TMEM_OP_FREEZE 1 +#define XEN_SYSCTL_TMEM_OP_FLUSH 2 +#define XEN_SYSCTL_TMEM_OP_DESTROY3 +#define XEN_SYSCTL_TMEM_OP_LIST 4 +#define XEN_SYSCTL_TMEM_OP_SET_WEIGHT 5 +#define XEN_SYSCTL_TMEM_OP_SET_CAP6 +#define XEN_SYSCTL_TMEM_OP_SET_COMPRESS 7 +#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB 8 +#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION 11 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_MAXPOOLS 12 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_WEIGHT 13 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_CAP14 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_FLAGS 15 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_FLAGS16 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_NPAGES 17 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID 18 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV 20 +#define XEN_SYSCTL_TMEM_OP_SAVE_END 21 +#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN 30 +#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE 32 +#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33 Perhaps better to have all these in tmem.h, to not clutter this header? +struct xen_sysctl_tmem_op { +uint32_t cmd; /* IN: XEN_SYSCTL_TMEM_OP_* . */ +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/ +uint32_t cli_id;/* IN: client id, 0 for XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB + for all others can be the domain id or + XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */ +uint32_t arg1; /* IN: If not applicable to command use 0. */ +uint32_t arg2; /* IN: If not applicable to command use 0. */ +uint8_t pad[4];/* Padding so structure is the same under 32 and 64. */ uint32_t please. And despite this being an (easily changeable) sysctl, verifying that it's zero on input would be nice. --- a/xen/include/public/tmem.h +++ b/xen/include/public/tmem.h @@ -33,7 +33,7 @@ #define TMEM_SPEC_VERSION 1 /* Commands to HYPERVISOR_tmem_op() */ -#define TMEM_CONTROL 0 +#define TMEM_CONTROL_MOVED 0 Perhaps say where it moved in a brief comment? --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -761,6 +761,9 @@ static int flask_sysctl(int cmd) case XEN_SYSCTL_tbuf_op: return domain_has_xen(current-domain, XEN__TBUFCONTROL); +case XEN_SYSCTL_tmem_op: +return domain_has_xen(current-domain, XEN__TMEM_CONTROL); + case XEN_SYSCTL_sched_id: return domain_has_xen(current-domain, XEN__GETSCHEDULER); Hmm, these cases appear to be roughly sorted numerically, i.e. yours would normally go at the end. Despite the comments I see no reason for this (once adjusted where needed) not to go in for 4.6. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 for-4.6 0/2] In-tree feature documentation
On 27/08/15 15:52, Ian Jackson wrote: Andrew Cooper writes ([RFC v2 for-4.6 0/2] In-tree feature documentation): An issue which Xen has is an uncertain support statement for features. Given the success seen with docs/misc/xen-command-line.markdown, and in particular keeping it up to date, introduce a similar system for features. Patch 1 introduces a proposed template (and a makefile tweak to include the new docs/features subdirectory), while patch 2 is a feature document covering the topic of migration. v2 Adds %Revision and #History table, following feedback from v1. This is tagged RFC as I expect people to have different views as to what is useful to include. I would particilarly appreciate feedback on the template before it starts getting used widely. Lars: Does this look like a reasonable counterpart to your formal support statement document? Jim: Per your request at the summit for new information, is patch 2 suitable? I have read both patches. I do wonder whether cross-referencing all the issues is a good idea. It seems like it might be a lot of work to keep them in step. I don't expect all the issues to be enumerated (generally, they would be found the first time someone falls over the issue), but where known interaction issues exist, we need to have some place to leave them. There are plenty of examples where known issues are documented somewhere in the xen-devel archives, or in an individuals head, and neither of these are useful places for the information to exist. I would hope that over time the number of issues decreases to 0 (that would be the day!), but a good second is at least to have a formal acknowledge that there are issues, which can be located by the next unfortunate sole to hit the issue. I don't have anything else to add to the comments that others have made. Overall I think this is a good template. The extra overhead may even be negative. The work of writing up a feature in the style of this document may obviate the need to put much of the same information in a 0/N or a design document, and the existing template is quite lightweight. I hadn't considered this side benefit when designing the template. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC
On 27/08/15 16:34, Jan Beulich wrote: On 27.08.15 at 17:22, andrew.coop...@citrix.com wrote: On 27/08/15 15:47, Jonathan Creekmore wrote: @@ -812,7 +816,11 @@ ENTRY(hypercall_args_table) .byte 2 /* do_hvm_op*/ .byte 1 /* do_sysctl*/ /* 35 */ .byte 1 /* do_domctl*/ +#ifdef CONFIG_KEXEC .byte 2 /* do_kexec */ +#else +.byte 0 /* do_ni_hypercall */ +#endif These changes will corrupt guest registers in debug builds. It's quite the other way around: Ah yes - of course. Sorry for the noise. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC
On 27/08/15 15:47, Jonathan Creekmore wrote: Add the appropriate #if checks around the kexec code in the x86 codebase so that the feature can actually be turned off by the flag instead of always required to be enabled on x86. What's your use case for this? I think you should consider providing empty stub functions for !KEXEC instead of all the #ifdefs. --- a/xen/drivers/passthrough/vtd/dmar.h +++ b/xen/drivers/passthrough/vtd/dmar.h @@ -108,6 +108,7 @@ struct acpi_atsr_unit *acpi_find_matched_atsr_unit(const struct pci_dev *); #define DMAR_OPERATION_TIMEOUT MILLISECS(1000) +#ifdef CONFIG_KEXEC #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \ do {\ s_time_t start_time = NOW();\ @@ -125,6 +126,22 @@ do {\ cpu_relax();\ } \ } while (0) +#else +#define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \ +do {\ +s_time_t start_time = NOW();\ +while (1) { \ +sts = op(iommu-reg, offset); \ +if ( cond ) \ +break; \ +if ( NOW() start_time + DMAR_OPERATION_TIMEOUT ) {\ + panic(%s:%d:%s: DMAR hardware is malfunctional, \ + __FILE__, __LINE__, __func__); \ +} \ +cpu_relax();\ +} \ +} while (0) +#endif This is particular might be best done by making kexecing a static const variable equal to 0 in the !KEXEC case. --- a/xen/include/asm-x86/config.h +++ b/xen/include/asm-x86/config.h @@ -1,6 +1,6 @@ /** * config.h - * + * * A Linux-style configuration list. */ Stray whitespace change. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [added to the 3.18 stable tree] x86/xen: Probe target addresses in set_aliased_prot() before the hypercall
From: Andy Lutomirski l...@kernel.org This patch has been added to the 3.18 stable tree. If you have any objections, please let us know. === [ Upstream commit aa1acff356bbedfd03b544051f5b371746735d89 ] The update_va_mapping hypercall can fail if the VA isn't present in the guest's page tables. Under certain loads, this can result in an OOPS when the target address is in unpopulated vmap space. While we're at it, add comments to help explain what's going on. This isn't a great long-term fix. This code should probably be changed to use something like set_memory_ro. Signed-off-by: Andy Lutomirski l...@kernel.org Cc: Andrew Cooper andrew.coop...@citrix.com Cc: Andy Lutomirski l...@amacapital.net Cc: Boris Ostrovsky boris.ostrov...@oracle.com Cc: Borislav Petkov b...@alien8.de Cc: Brian Gerst brge...@gmail.com Cc: David Vrabel dvra...@cantab.net Cc: Denys Vlasenko dvlas...@redhat.com Cc: H. Peter Anvin h...@zytor.com Cc: Jan Beulich jbeul...@suse.com Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: Linus Torvalds torva...@linux-foundation.org Cc: Peter Zijlstra pet...@infradead.org Cc: Sasha Levin sasha.le...@oracle.com Cc: Steven Rostedt rost...@goodmis.org Cc: Thomas Gleixner t...@linutronix.de Cc: secur...@kernel.org secur...@kernel.org Cc: sta...@vger.kernel.org Cc: xen-devel xen-devel@lists.xen.org Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.l...@kernel.org Signed-off-by: Ingo Molnar mi...@kernel.org Signed-off-by: Sasha Levin sasha.le...@oracle.com --- arch/x86/xen/enlighten.c | 40 1 file changed, 40 insertions(+) diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 7d67146..e180e09 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -481,6 +481,7 @@ static void set_aliased_prot(void *v, pgprot_t prot) pte_t pte; unsigned long pfn; struct page *page; + unsigned char dummy; ptep = lookup_address((unsigned long)v, level); BUG_ON(ptep == NULL); @@ -490,6 +491,32 @@ static void set_aliased_prot(void *v, pgprot_t prot) pte = pfn_pte(pfn, prot); + /* +* Careful: update_va_mapping() will fail if the virtual address +* we're poking isn't populated in the page tables. We don't +* need to worry about the direct map (that's always in the page +* tables), but we need to be careful about vmap space. In +* particular, the top level page table can lazily propagate +* entries between processes, so if we've switched mms since we +* vmapped the target in the first place, we might not have the +* top-level page table entry populated. +* +* We disable preemption because we want the same mm active when +* we probe the target and when we issue the hypercall. We'll +* have the same nominal mm, but if we're a kernel thread, lazy +* mm dropping could change our pgd. +* +* Out of an abundance of caution, this uses __get_user() to fault +* in the target address just in case there's some obscure case +* in which the target address isn't readable. +*/ + + preempt_disable(); + + pagefault_disable();/* Avoid warnings due to being atomic. */ + __get_user(dummy, (unsigned char __user __force *)v); + pagefault_enable(); + if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) BUG(); @@ -501,6 +528,8 @@ static void set_aliased_prot(void *v, pgprot_t prot) BUG(); } else kmap_flush_unused(); + + preempt_enable(); } static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries) @@ -508,6 +537,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries) const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE; int i; + /* +* We need to mark the all aliases of the LDT pages RO. We +* don't need to call vm_flush_aliases(), though, since that's +* only responsible for flushing aliases out the TLBs, not the +* page tables, and Xen will flush the TLB for us if needed. +* +* To avoid confusing future readers: none of this is necessary +* to load the LDT. The hypervisor only checks this when the +* LDT is faulted in due to subsequent descriptor access. +*/ + for(i = 0; i entries; i += entries_per_page) set_aliased_prot(ldt + i, PAGE_KERNEL_RO); } -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC
On Aug 27, 2015, at 10:27 AM, David Vrabel david.vra...@citrix.com wrote: On 27/08/15 15:47, Jonathan Creekmore wrote: Add the appropriate #if checks around the kexec code in the x86 codebase so that the feature can actually be turned off by the flag instead of always required to be enabled on x86. What's your use case for this? The use case is for a slimmed down version of the hypervisor that can be used as a security hypervisor, exposing as little extra functionality as possible. When looking for features to trim out to reduce the attack surface, I saw the flag for KEXEC and wanted to disable that, then ran into compile problems. I think you should consider providing empty stub functions for !KEXEC instead of all the #ifdefs. Good points, all, who mentioned the empty stub functions. I am revising it now to use stubs instead of all of the #ifdefs. --- a/xen/include/asm-x86/config.h +++ b/xen/include/asm-x86/config.h @@ -1,6 +1,6 @@ /** * config.h - * + * * A Linux-style configuration list. */ Stray whitespace change. I guess I really need to just turn off that feature (stripping trailing whitespace) in my editor. So useful when writing new code, not as helpful when trying to modify existing code. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for 4.6] build: fix tarball stubdom build
On 27.08.15 at 17:54, wei.l...@citrix.com wrote: When we create a source code tarball, mini-os is extracted to extras/mini-os directory. When building a source code tarball, we shouldn't clone mini-os again. Only clone mini-os when that directory doesn't exist. This fixes tarball build and doesn't affect non-tarball build. Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- Makefile | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/Makefile b/Makefile index e8a75ff..ba0df70 100644 --- a/Makefile +++ b/Makefile @@ -19,10 +19,12 @@ include Config.mk .PHONY: mini-os-dir mini-os-dir: - GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh \ - $(MINIOS_UPSTREAM_URL) \ - $(MINIOS_UPSTREAM_REVISION) \ - $(XEN_ROOT)/extras/mini-os + if [ ! -d $(XEN_ROOT)/extras/mini-os ]; then \ + GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh \ + $(MINIOS_UPSTREAM_URL) \ + $(MINIOS_UPSTREAM_REVISION) \ + $(XEN_ROOT)/extras/mini-os ; \ + fi Wouldn't his better be done (avoiding the need for the shell conditional) by simply renaming the make target from mini-os-dir to extras/mini-os (and dropping the .PHONY)? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 for-4.6 0/2] In-tree feature documentation
Andrew Cooper writes ([RFC v2 for-4.6 0/2] In-tree feature documentation): An issue which Xen has is an uncertain support statement for features. Given the success seen with docs/misc/xen-command-line.markdown, and in particular keeping it up to date, introduce a similar system for features. Patch 1 introduces a proposed template (and a makefile tweak to include the new docs/features subdirectory), while patch 2 is a feature document covering the topic of migration. v2 Adds %Revision and #History table, following feedback from v1. This is tagged RFC as I expect people to have different views as to what is useful to include. I would particilarly appreciate feedback on the template before it starts getting used widely. Lars: Does this look like a reasonable counterpart to your formal support statement document? Jim: Per your request at the summit for new information, is patch 2 suitable? I have read both patches. I do wonder whether cross-referencing all the issues is a good idea. It seems like it might be a lot of work to keep them in step. I don't have anything else to add to the comments that others have made. Overall I think this is a good template. The extra overhead may even be negative. The work of writing up a feature in the style of this document may obviate the need to put much of the same information in a 0/N or a design document, and the existing template is quite lightweight. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.
On 27.08.15 at 15:12, andrew.coop...@citrix.com wrote: On 27/08/15 12:02, Konrad Rzeszutek Wilk wrote: --- a/xen/include/xen/tmem.h +++ b/xen/include/xen/tmem.h @@ -9,6 +9,9 @@ #ifndef __XEN_TMEM_H__ #define __XEN_TMEM_H__ +struct xen_sysctl_tmem_op; #includepublic/sysctl.h please. Why? Forward declarations like this help not including all headers everywhere. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 8/8] tmem: Spelling mistakes.
On 27.08.15 at 13:02, konrad.w...@oracle.com wrote: Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- xen/common/tmem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/common/tmem.c b/xen/common/tmem.c index 66d2852..9bd75d8 100644 --- a/xen/common/tmem.c +++ b/xen/common/tmem.c @@ -2659,7 +2659,7 @@ long do_tmem_op(tmem_cli_op_t uops) return -EFAULT; } -/* Acquire wirte lock for all command at first */ +/* Acquire write lock for all command at first */ Mind adding the missing full stop at once? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC
On 27.08.15 at 16:47, jonathan.creekm...@gmail.com wrote: Add the appropriate #if checks around the kexec code in the x86 codebase so that the feature can actually be turned off by the flag instead of always required to be enabled on x86. But you realize that these HAVE_* variables aren't meant to be used for disabling features? If you really wanted something like that, you'd need to introduce kexec=y in the top level Makefile, overridable by kexec=n on the make command line. --- a/xen/arch/x86/Makefile +++ b/xen/arch/x86/Makefile @@ -56,8 +56,8 @@ obj-y += trace.o obj-y += traps.o obj-y += usercopy.o obj-y += x86_emulate.o -obj-y += machine_kexec.o -obj-y += crash.o +obj-$(HAS_KEXEC) += machine_kexec.o +obj-$(HAS_KEXEC) += crash.o obj-y += tboot.o obj-y += hpet.o obj-y += vm_event.o If you already touch this, please move the two altered lines to their proper (alphabetical) slot. --- a/xen/arch/x86/crash.c +++ b/xen/arch/x86/crash.c @@ -64,7 +64,9 @@ static void noreturn do_nmi_crash(const struct cpu_user_regs *regs) */ set_ist(idt_tables[cpu][TRAP_machine_check], IST_NONE); +#ifdef CONFIG_KEXEC kexec_crash_save_cpu(); +#endif In cases like this code will look much better if you introduced a no-op inline function for the !CONFIG_KEXEC case in the appropriate header. --- a/xen/arch/x86/x86_64/compat/entry.S +++ b/xen/arch/x86/x86_64/compat/entry.S Similarly at least the changes here ... --- a/xen/arch/x86/x86_64/entry.S +++ b/xen/arch/x86/x86_64/entry.S .. and here will want better abstraction, to not further uglify the code. --- a/xen/drivers/passthrough/vtd/dmar.h +++ b/xen/drivers/passthrough/vtd/dmar.h @@ -108,6 +108,7 @@ struct acpi_atsr_unit *acpi_find_matched_atsr_unit(const struct pci_dev *); #define DMAR_OPERATION_TIMEOUT MILLISECS(1000) +#ifdef CONFIG_KEXEC #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \ do {\ s_time_t start_time = NOW();\ @@ -125,6 +126,22 @@ do {\ cpu_relax();\ } \ } while (0) +#else +#define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \ +do {\ +s_time_t start_time = NOW();\ +while (1) { \ +sts = op(iommu-reg, offset); \ +if ( cond ) \ +break; \ +if ( NOW() start_time + DMAR_OPERATION_TIMEOUT ) {\ + panic(%s:%d:%s: DMAR hardware is malfunctional, \ + __FILE__, __LINE__, __func__); \ +} \ +cpu_relax();\ +} \ +} while (0) +#endif Along the same lines here - no way for such a macro to be duplicated for a case likely no-one but you will ever build-test. --- a/xen/include/asm-x86/config.h +++ b/xen/include/asm-x86/config.h @@ -1,6 +1,6 @@ / ** * config.h - * + * * A Linux-style configuration list. */ Please leave out this unrelated white space change. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC
On 27/08/15 15:47, Jonathan Creekmore wrote: Add the appropriate #if checks around the kexec code in the x86 codebase so that the feature can actually be turned off by the flag instead of always required to be enabled on x86. Signed-off-by: Jonathan Creekmore jonathan.creekm...@gmail.com In principle, this is a good change, but is definitely aimed at 4.7 now. diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S index 1521779..e2fe49c 100644 --- a/xen/arch/x86/x86_64/compat/entry.S +++ b/xen/arch/x86/x86_64/compat/entry.S @@ -428,7 +428,11 @@ ENTRY(compat_hypercall_table) .quad do_hvm_op .quad do_sysctl /* 35 */ .quad do_domctl +#if CONFIG_KEXEC .quad compat_kexec_op +#else +.quad do_ni_hypercall +#endif .quad do_tmem_op .quad do_ni_hypercall /* reserved for XenClient */ .quad do_xenpmu_op /* 40 */ @@ -479,6 +483,11 @@ ENTRY(compat_hypercall_args_table) .byte 2 /* do_hvm_op*/ .byte 1 /* do_sysctl*/ /* 35 */ .byte 1 /* do_domctl*/ +#if CONFIG_KEXEC +.byte 2 /* compat_kexec_op */ +#else + .byte 0 /* do_ni_hypercall */ +#endif You have a hard tab here. .byte 2 /* compat_kexec_op */ .byte 1 /* do_tmem_op */ .byte 0 /* reserved for XenClient */ diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S index 74677a2..357f3d5 100644 --- a/xen/arch/x86/x86_64/entry.S +++ b/xen/arch/x86/x86_64/entry.S @@ -761,7 +761,11 @@ ENTRY(hypercall_table) .quad do_hvm_op .quad do_sysctl /* 35 */ .quad do_domctl +#ifdef CONFIG_KEXEC .quad do_kexec_op +#else + .quad do_ni_hypercall +#endif .quad do_tmem_op .quad do_ni_hypercall /* reserved for XenClient */ .quad do_xenpmu_op /* 40 */ @@ -812,7 +816,11 @@ ENTRY(hypercall_args_table) .byte 2 /* do_hvm_op*/ .byte 1 /* do_sysctl*/ /* 35 */ .byte 1 /* do_domctl*/ +#ifdef CONFIG_KEXEC .byte 2 /* do_kexec */ +#else +.byte 0 /* do_ni_hypercall */ +#endif These changes will corrupt guest registers in debug builds. Don't alter the args table at all, and use #ifndef CONFIG_KEXEC #define do_kexec_op do_ni_hypercall #define compat_kexec_op do_ni_hypercall #endif rather than patching the hypercall table itself. ~Andrew .byte 1 /* do_tmem_op */ .byte 0 /* reserved for XenClient */ .byte 2 /* do_xenpmu_op */ /* 40 */ diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h index 729b603..697e5e5 100644 --- a/xen/drivers/passthrough/vtd/dmar.h +++ b/xen/drivers/passthrough/vtd/dmar.h @@ -108,6 +108,7 @@ struct acpi_atsr_unit *acpi_find_matched_atsr_unit(const struct pci_dev *); #define DMAR_OPERATION_TIMEOUT MILLISECS(1000) +#ifdef CONFIG_KEXEC #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \ do {\ s_time_t start_time = NOW();\ @@ -125,6 +126,22 @@ do {\ cpu_relax();\ } \ } while (0) +#else +#define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \ +do {\ +s_time_t start_time = NOW();\ +while (1) { \ +sts = op(iommu-reg, offset); \ +if ( cond ) \ +break; \ +if ( NOW() start_time + DMAR_OPERATION_TIMEOUT ) {\ + panic(%s:%d:%s: DMAR hardware is malfunctional, \ + __FILE__, __LINE__, __func__); \ +} \ +cpu_relax();\ +} \ +} while (0) +#endif int vtd_hw_check(void); void disable_pmr(struct iommu *iommu); diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h index 3e9be83..e84e489 100644 --- a/xen/include/asm-x86/config.h +++ b/xen/include/asm-x86/config.h @@ -1,6 +1,6 @@ /** * config.h - * + * * A Linux-style configuration list. */ This is an unrelated hunk and should be dropped. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC
On 27.08.15 at 17:27, david.vra...@citrix.com wrote: On 27/08/15 15:47, Jonathan Creekmore wrote: @@ -125,6 +126,22 @@ do {\ cpu_relax();\ } \ } while (0) +#else +#define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \ +do {\ +s_time_t start_time = NOW();\ +while (1) { \ +sts = op(iommu-reg, offset); \ +if ( cond ) \ +break; \ +if ( NOW() start_time + DMAR_OPERATION_TIMEOUT ) {\ + panic(%s:%d:%s: DMAR hardware is malfunctional, \ +__FILE__, __LINE__, __func__); \ +} \ +cpu_relax();\ +} \ +} while (0) +#endif This is particular might be best done by making kexecing a static const variable equal to 0 in the !KEXEC case. Or even a #define (unless there's some use somewhere preventing that). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 3/4] x86/pvh: Handle hypercalls for 32b PVH guests
On 13.08.15 at 20:12, boris.ostrov...@oracle.com wrote: Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com Reviewed-by: Jan Beulich jbeul...@suse.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 22/23] x86: make Xen early boot code relocatable
On 27.08.15 at 17:10, daniel.ki...@oracle.com wrote: On Thu, Aug 27, 2015 at 07:12:38AM -0600, Jan Beulich wrote: On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote: /* Copy bootstrap trampoline to low memory, below 1MB. */ -mov $sym_phys(trampoline_start),%esi +lea sym_offset(trampoline_start)(%ebp),%esi mov $trampoline_end - trampoline_start,%ecx rep movsb The latest at this final hunk I'm getting tired (and upset). I'd much rather not touch all this code in these fragile ways, and instead require Xen to be booted without grub2 on badly written firmware like the one on the machine you quote in the description. Let's start discussion from here. My further work on this patch series is pointless until we do not agree details in this case. So, I agree that any software (including firmware) should not use precious resources (i.e. low memory in that case) if it is not strictly required. And I do not think so that latest UEFI implementations on new hardware really need this low memory to survive (at least page tables could be put anywhere above low memory). However, in case of UEFI it is a wish of smart people not requirement set by spec. I was checking UEFI docs a few times but I was not able to find anything which says that e.g. ...developer MUST allocate memory outside of low memory Even I have not found any suggestion about that. However, I must admit that I do not know whole UEFI spec, so, if you know something which is similar to above please tell me where it is (title, revision, page, paragraph, etc.). Hence, if there is not strict requirement in regards to memory allocation in specs we are lost. Of course we can encourage people to not do nasty things but I do not believe that many will listen. So, sadly, I suppose that we will see more and more machines in the wild which are broken (well, let's say do not align to good practices). On the other hand I think that we should not assume that a given memory region (in our case starting at 1 MiB) is (or will be) available in every machine. I have not seen anything which says that it is true. We should stop doing that even if it works right now. I think that it works by chance and there is a chance that it will stop working one day because somebody will discover that he or she can put there e.g. device or hole. Last but not least. I suppose that Xen users and especially distros will complain when they are not able to use GRUB2 with Xen on every platform just because somebody (i.e. platform designers, developers, or whatever) do not accept our beliefs or we require that platform must obey rules (i.e. memory map requirements) which are specified nowhere. You're right, there's no such requirement on memory use in the spec. But you're missing the point. Supporting grub2 on UEFI is already a hack (ignoring all intentions EFI had from its first days). And now you've found an environment where that hack needs another hack (in Xen) to actually work. That's too much hackery for my taste, the more that things on this system can (afaict) work quite okay (without grub2, or with using its chainloader mechanism). So no, I'm still not convinced of the need for this patch. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 2/8] tmem: Add ASSERT in obj_rb_insert for pool-rwlock lock.
On 27.08.15 at 13:01, konrad.w...@oracle.com wrote: --- a/xen/common/tmem.c +++ b/xen/common/tmem.c @@ -915,6 +915,11 @@ static int obj_rb_insert(struct rb_root *root, struct tmem_object_root *obj) { struct rb_node **new, *parent = NULL; struct tmem_object_root *this; +struct tmem_pool *pool; + +pool = obj-pool; +ASSERT(pool != NULL); +ASSERT_WRITELOCK(pool-pool_rwlock); Considering the new variable is used only in ASSERT()s, I'd recommend against its introduction. But it's your code ... Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH for 4.6] build: fix tarball stubdom build
When we create a source code tarball, mini-os is extracted to extras/mini-os directory. When building a source code tarball, we shouldn't clone mini-os again. Only clone mini-os when that directory doesn't exist. This fixes tarball build and doesn't affect non-tarball build. Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- Makefile | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/Makefile b/Makefile index e8a75ff..ba0df70 100644 --- a/Makefile +++ b/Makefile @@ -19,10 +19,12 @@ include Config.mk .PHONY: mini-os-dir mini-os-dir: - GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh \ - $(MINIOS_UPSTREAM_URL) \ - $(MINIOS_UPSTREAM_REVISION) \ - $(XEN_ROOT)/extras/mini-os + if [ ! -d $(XEN_ROOT)/extras/mini-os ]; then \ + GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh \ + $(MINIOS_UPSTREAM_URL) \ + $(MINIOS_UPSTREAM_REVISION) \ + $(XEN_ROOT)/extras/mini-os ; \ + fi .PHONY: mini-os-dir-force-update mini-os-dir-force-update: mini-os-dir -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.
On Thu, Aug 27, 2015 at 09:11:35AM -0600, Jan Beulich wrote: On 27.08.15 at 13:02, konrad.w...@oracle.com wrote: @@ -2666,9 +2662,9 @@ long do_tmem_op(tmem_cli_op_t uops) /* Acquire wirte lock for all command at first */ write_lock(tmem_rwlock); -if ( op.cmd == TMEM_CONTROL ) +if ( op.cmd == TMEM_CONTROL_MOVED ) { -rc = do_tmem_control(op); +rc = -ENOSYS; As in other similar cases I'd prefer -EOPNOTSUPP here to prevent callers from implying the tmem hypercall as a whole is unimplemented. --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -710,6 +710,48 @@ struct xen_sysctl_psr_cat_op { typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t); +#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xU + +#define XEN_SYSCTL_TMEM_OP_THAW 0 +#define XEN_SYSCTL_TMEM_OP_FREEZE 1 +#define XEN_SYSCTL_TMEM_OP_FLUSH 2 +#define XEN_SYSCTL_TMEM_OP_DESTROY3 +#define XEN_SYSCTL_TMEM_OP_LIST 4 +#define XEN_SYSCTL_TMEM_OP_SET_WEIGHT 5 +#define XEN_SYSCTL_TMEM_OP_SET_CAP6 +#define XEN_SYSCTL_TMEM_OP_SET_COMPRESS 7 +#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB 8 +#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION 11 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_MAXPOOLS 12 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_WEIGHT 13 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_CAP14 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_FLAGS 15 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_FLAGS16 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_NPAGES 17 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID 18 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV 20 +#define XEN_SYSCTL_TMEM_OP_SAVE_END 21 +#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN 30 +#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE 32 +#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33 Perhaps better to have all these in tmem.h, to not clutter this header? Yes and no. The other sysctl had their #defines for commands here so I figured I would follow that rule. But I am OK keeping it in tmem.h and just do /* For the 'cmd' values consule tmem.h. Look for XEN_SYSCTL_TMEM_OP_* */ +struct xen_sysctl_tmem_op { +uint32_t cmd; /* IN: XEN_SYSCTL_TMEM_OP_* . */ +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/ +uint32_t cli_id;/* IN: client id, 0 for XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB + for all others can be the domain id or + XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */ +uint32_t arg1; /* IN: If not applicable to command use 0. */ +uint32_t arg2; /* IN: If not applicable to command use 0. */ +uint8_t pad[4];/* Padding so structure is the same under 32 and 64. */ uint32_t please. And despite this being an (easily changeable) sysctl, verifying that it's zero on input would be nice. Yes! That was what I forgotten! --- a/xen/include/public/tmem.h +++ b/xen/include/public/tmem.h @@ -33,7 +33,7 @@ #define TMEM_SPEC_VERSION 1 /* Commands to HYPERVISOR_tmem_op() */ -#define TMEM_CONTROL 0 +#define TMEM_CONTROL_MOVED 0 Perhaps say where it moved in a brief comment? Yes, good idea. /* Deprecated. These operations are done now via XEN_SYSCTL_tmem_op * using the sysctl hypercall. */ --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -761,6 +761,9 @@ static int flask_sysctl(int cmd) case XEN_SYSCTL_tbuf_op: return domain_has_xen(current-domain, XEN__TBUFCONTROL); +case XEN_SYSCTL_tmem_op: +return domain_has_xen(current-domain, XEN__TMEM_CONTROL); + case XEN_SYSCTL_sched_id: return domain_has_xen(current-domain, XEN__GETSCHEDULER); Hmm, these cases appear to be roughly sorted numerically, i.e. yours would normally go at the end. I recall a comment from Andrew asking the newly introduced commands to be in alphabetical order. But perhaps that was for domctl which is more .. ah.. random? Anyhow will make it in numerical order. Despite the comments I see no reason for this (once adjusted where needed) not to go in for 4.6. Thank you. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.
On Thu, Aug 27, 2015 at 02:12:35PM +0100, Andrew Cooper wrote: On 27/08/15 12:02, Konrad Rzeszutek Wilk wrote: --- a/tools/python/xen/lowlevel/xc/xc.c +++ b/tools/python/xen/lowlevel/xc/xc.c @@ -1808,25 +1808,25 @@ static PyObject *pyxc_tmem_control(XcObject *self, pool_id, subop, cli_id, arg1, arg2, buf) ) return NULL; -if ( (subop == TMEMC_LIST) (arg1 32768) ) +if ( (subop == XEN_SYSCTL_TMEM_OP_LIST) (arg1 32768) ) arg1 = 32768; if ( (rc = xc_tmem_control(self-xc_handle, pool_id, subop, cli_id, arg1, arg2, buffer)) 0 ) return Py_BuildValue(i, rc); switch (subop) { -case TMEMC_LIST: +case XEN_SYSCTL_TMEM_OP_LIST: return Py_BuildValue(s, buffer); -case TMEMC_FLUSH: +case XEN_SYSCTL_TMEM_OP_FLUSH: return Py_BuildValue(i, rc); -case TMEMC_QUERY_FREEABLE_MB: +case XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB: return Py_BuildValue(i, rc); -case TMEMC_THAW: -case TMEMC_FREEZE: -case TMEMC_DESTROY: -case TMEMC_SET_WEIGHT: -case TMEMC_SET_CAP: -case TMEMC_SET_COMPRESS: +case XEN_SYSCTL_TMEM_OP_THAW: +case XEN_SYSCTL_TMEM_OP_FREEZE: +case XEN_SYSCTL_TMEM_OP_DESTROY: +case XEN_SYSCTL_TMEM_OP_SET_WEIGHT: +case XEN_SYSCTL_TMEM_OP_SET_CAP: +case XEN_SYSCTL_TMEM_OP_SET_COMPRESS: Any case statements falling through into default like this can simply be dropped. Lets do that in another patch. This patch is just to move the operation from one hypercall to another - with the minimum amount of changes. I will gladly modify the case statements in subsequent patches. @@ -2555,77 +2556,72 @@ static int tmemc_restore_flush_page(int cli_id, uint32_t pool_id, struct oid *oi return do_tmem_flush_page(pool,oidp,index); } -static int do_tmem_control(struct tmem_op *op) +int tmem_control(struct xen_sysctl_tmem_op *op) { int ret; uint32_t pool_id = op-pool_id; -uint32_t subop = op-u.ctrl.subop; -struct oid *oidp = (struct oid *)(op-u.ctrl.oid[0]); +uint32_t cmd = op-cmd; +struct oid *oidp = (struct oid *)(op-oid[0]); Please put a struct xen_sysctl_tmem_oid definition in sysctl.h and avoid this typesystem abuse. Again, I tried to make this move as minimal as possible and not introduce other sensible changes - and defer them to later patches. --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -710,6 +710,48 @@ struct xen_sysctl_psr_cat_op { typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t); +#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xU + +#define XEN_SYSCTL_TMEM_OP_THAW 0 +#define XEN_SYSCTL_TMEM_OP_FREEZE 1 +#define XEN_SYSCTL_TMEM_OP_FLUSH 2 +#define XEN_SYSCTL_TMEM_OP_DESTROY3 +#define XEN_SYSCTL_TMEM_OP_LIST 4 +#define XEN_SYSCTL_TMEM_OP_SET_WEIGHT 5 +#define XEN_SYSCTL_TMEM_OP_SET_CAP6 +#define XEN_SYSCTL_TMEM_OP_SET_COMPRESS 7 +#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB 8 +#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION 11 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_MAXPOOLS 12 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_WEIGHT 13 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_CAP14 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_FLAGS 15 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_FLAGS16 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_NPAGES 17 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID 18 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV 20 +#define XEN_SYSCTL_TMEM_OP_SAVE_END 21 +#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN 30 +#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE 32 +#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33 + +struct xen_sysctl_tmem_op { +uint32_t cmd; /* IN: XEN_SYSCTL_TMEM_OP_* . */ +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/ +uint32_t cli_id;/* IN: client id, 0 for XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB + for all others can be the domain id or + XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */ +uint32_t arg1; /* IN: If not applicable to command use 0. */ +uint32_t arg2; /* IN: If not applicable to command use 0. */ Please can this interface be fixed as part of the move, even if it is in subsequent patches for clarity. I will gladly fix this interface in further patches. By all means! Part of the issue with the XSA-17 security audit was that these args are
Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.
On 27/08/15 19:43, Konrad Rzeszutek Wilk wrote: --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -710,6 +710,48 @@ struct xen_sysctl_psr_cat_op { typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t); +#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xU + +#define XEN_SYSCTL_TMEM_OP_THAW 0 +#define XEN_SYSCTL_TMEM_OP_FREEZE 1 +#define XEN_SYSCTL_TMEM_OP_FLUSH 2 +#define XEN_SYSCTL_TMEM_OP_DESTROY3 +#define XEN_SYSCTL_TMEM_OP_LIST 4 +#define XEN_SYSCTL_TMEM_OP_SET_WEIGHT 5 +#define XEN_SYSCTL_TMEM_OP_SET_CAP6 +#define XEN_SYSCTL_TMEM_OP_SET_COMPRESS 7 +#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB 8 +#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION 11 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_MAXPOOLS 12 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_WEIGHT 13 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_CAP14 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_FLAGS 15 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_FLAGS16 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_NPAGES 17 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID 18 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV 20 +#define XEN_SYSCTL_TMEM_OP_SAVE_END 21 +#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN 30 +#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE 32 +#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33 Perhaps better to have all these in tmem.h, to not clutter this header? Yes and no. The other sysctl had their #defines for commands here so I figured I would follow that rule. But I am OK keeping it in tmem.h and just do /* For the 'cmd' values consule tmem.h. Look for XEN_SYSCTL_TMEM_OP_* */ Better to stay consistent with sysctl.h. Separating the structure and the subops is unhelpful to people reading the code. +struct xen_sysctl_tmem_op { +uint32_t cmd; /* IN: XEN_SYSCTL_TMEM_OP_* . */ +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/ +uint32_t cli_id;/* IN: client id, 0 for XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB + for all others can be the domain id or + XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */ +uint32_t arg1; /* IN: If not applicable to command use 0. */ +uint32_t arg2; /* IN: If not applicable to command use 0. */ +uint8_t pad[4];/* Padding so structure is the same under 32 and 64. */ uint32_t please. And despite this being an (easily changeable) sysctl, verifying that it's zero on input would be nice. Yes! That was what I forgotten! --- a/xen/include/public/tmem.h +++ b/xen/include/public/tmem.h @@ -33,7 +33,7 @@ #define TMEM_SPEC_VERSION 1 /* Commands to HYPERVISOR_tmem_op() */ -#define TMEM_CONTROL 0 +#define TMEM_CONTROL_MOVED 0 Perhaps say where it moved in a brief comment? Yes, good idea. /* Deprecated. These operations are done now via XEN_SYSCTL_tmem_op * using the sysctl hypercall. */ I hope XEN_SYSCTL_tmem_op is sufficient to imply using the sysctl hypercall ;) --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -761,6 +761,9 @@ static int flask_sysctl(int cmd) case XEN_SYSCTL_tbuf_op: return domain_has_xen(current-domain, XEN__TBUFCONTROL); +case XEN_SYSCTL_tmem_op: +return domain_has_xen(current-domain, XEN__TMEM_CONTROL); + case XEN_SYSCTL_sched_id: return domain_has_xen(current-domain, XEN__GETSCHEDULER); Hmm, these cases appear to be roughly sorted numerically, i.e. yours would normally go at the end. I recall a comment from Andrew asking the newly introduced commands to be in alphabetical order. But perhaps that was for domctl which is more .. ah.. random? Really? that doesn't sound like me. Apologies if it was. (Alphabetic for obj-y in a Makefile perhaps?) Numeric would be far better here. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [BUG] 'o' debug key panics the hypervisor
On Thu, Aug 27, 2015 at 04:07:36PM +0200, Roger Pau Monné wrote: El 27/08/15 a les 16.01, Jan Beulich ha escrit: On 27.08.15 at 13:29, roger@citrix.com wrote: When using Intel hardware without shared page tables between the IOMMU and EPT (I have not tried if the same happens when sharing the page tables), the following crash happens if I press the 'o' debug key with a HVM guest running. The guest doesn't have any device passed-through. This last thing is the relevant one: No-one ever cared to use 'o' without also passing through a device. I have to admit I discovered this when I thought I was typing at the Dom0 console but it turns out I had it switched to the Xen console... Anyway, it shouldn't crash Xen. And it would be called by '*' which is often used in the field to collect data. I concur on the not-crashing Xen part. Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 4/8] tmem: Remove xc_tmem_control mystical arg3
On Thu, Aug 27, 2015 at 01:53:17PM +0100, Andrew Cooper wrote: On 27/08/15 12:01, Konrad Rzeszutek Wilk wrote: It mentions it but it is never used. The hypercall interface knows nothing of this sort of thing either. Lets just remove it. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com It would be nice if you could take the opportunity of changing every caller to fix the style issues in affected code. There are a huge number of missing spaces in the parameter lists of calls to xc_tmem_control() I was soo tempted to do it - but I've been hammered in the rule of 'one logical change per patch' so abstained from mission creep. Do not fear - I will send out another patch that will fix this up :-) ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [DOCSDAY PATCH for-4.6] docs: Fix installation of man8 pages
c/s a430436 docs: Support for generating man(8) pages accidentally failed to update to the install and clean rules for man8 pages, meaning that c/s 7b21214 docs: Move xentrace.8 to docs/man/xentrace.pod.8 caused a packaging regression when it came to xentop.8 To avoid similar bugs in the future, move the generation of the build, install and clean rules into the manpage metarule. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com --- CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- docs/Makefile | 35 +++ 1 file changed, 23 insertions(+), 12 deletions(-) diff --git a/docs/Makefile b/docs/Makefile index 3d77913..d25e6c7 100644 --- a/docs/Makefile +++ b/docs/Makefile @@ -58,20 +58,15 @@ else @echo fig2dev (transfig) not installed; skipping figs. endif -.PHONY: man-pages -man-pages: $(DOC_MAN1) $(DOC_MAN5) $(DOC_MAN8) - .PHONY: pdf pdf: $(DOC_PDF) .PHONY: clean -clean: +clean: clean-man-pages $(MAKE) -C figs clean rm -rf .word_count *.aux *.dvi *.bbl *.blg *.glo *.idx *~ rm -rf *.ilg *.log *.ind *.toc *.bak *.tmp core rm -rf html txt pdf - rm -rf man5 - rm -rf man1 .PHONY: distclean distclean: clean @@ -80,6 +75,8 @@ distclean: clean # Top level install targets +.PHONY: man-pages install-man-pages clean-man-pages + # Metarules for generating manpages. Run with $(1) substitued for section define GENERATE_MANPAGE_RULES @@ -110,17 +107,31 @@ else @echo pod2text not installed; skipping $$@ endif +# Build +.PHONY: man$(1)-pages +man$(1)-pages: $$(DOC_MAN$(1)) + +# Install +.PHONY: install-man$(1)-pages +install-man$(1)-pages: man$(1)-pages + $(INSTALL_DIR) $(DESTDIR)$(mandir) + cp -r man$(1) $(DESTDIR)$(mandir) + +# Clean +.PHONY: clean-man$(1)-pages +clean-man$(1)-pages: + rm -rf man$(1) + +# Link buld/install/clean to toplevel rules +man-pages: man$(1)-pages +install-man-pages: install-man$(1)-pages +clean-man-pages: clean-man$(1)-pages + endef # Generate manpage rules for each section $(foreach i,1 5 8,$(eval $(call GENERATE_MANPAGE_RULES,$(i -.PHONY: install-man-pages -install-man-pages: man-pages - $(INSTALL_DIR) $(DESTDIR)$(mandir) - cp -R man1 $(DESTDIR)$(mandir) - cp -R man5 $(DESTDIR)$(mandir) - .PHONY: install-html install-html: html txt figs $(INSTALL_DIR) $(DESTDIR)$(docdir) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [xen-4.4-testing test] 60873: regressions - FAIL
flight 60873 xen-4.4-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/60873/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemuu-debianhvm-amd64 19 guest-start/debianhvm.repeat fail REGR. vs. 60727 test-armhf-armhf-xl-credit2 16 guest-start/debian.repeat fail REGR. vs. 60727 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-libvirt-raw 9 debian-di-install fail REGR. vs. 60727 test-amd64-i386-libvirt-vhd 9 debian-di-install fail REGR. vs. 60727 test-armhf-armhf-xl-multivcpu 16 guest-start/debian.repeatfail like 60696 test-amd64-i386-libvirt-qcow2 9 debian-di-installfail like 60727 test-amd64-amd64-xl-vhd 9 debian-di-installfail like 60727 test-amd64-i386-libvirt 11 guest-start fail like 60727 test-amd64-amd64-libvirt 11 guest-start fail like 60727 test-amd64-i386-xl-vhd9 debian-di-installfail like 60727 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 60727 Tests which did not succeed, but are not blocking: test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-amd64-amd64-migrupgrade 1 build-check(1) blocked n/a test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a test-amd64-i386-migrupgrade 1 build-check(1) blocked n/a build-amd64-rumpuserxen 6 xen-buildfail never pass test-armhf-armhf-xl-vhd 9 debian-di-installfail never pass test-armhf-armhf-libvirt-vhd 9 debian-di-installfail never pass test-amd64-amd64-xl-qcow2 9 debian-di-installfail never pass test-armhf-armhf-xl-raw 9 debian-di-installfail never pass build-amd64-prev 5 xen-buildfail never pass build-i386-prev 5 xen-buildfail never pass test-armhf-armhf-xl-qcow2 9 debian-di-installfail never pass build-i386-rumpuserxen6 xen-buildfail never pass test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-armhf-armhf-libvirt 11 guest-start fail never pass test-armhf-armhf-libvirt-qcow2 9 debian-di-installfail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass test-amd64-i386-libvirt-raw 11 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-libvirt-raw 9 debian-di-installfail never pass test-amd64-i386-xend-qemut-winxpsp3 21 leak-check/checkfail never pass test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass version targeted for testing: xen 27b82b08b17f589f638f3d5be8dcea42b5e73330 baseline version: xen 3646b134c1673f09c0a239de10b0da4c9265c8e8 Last test of basis60727 2015-08-16 16:15:09 Z 11 days Testing same since60802 2015-08-20 14:41:37 Z7 days3 attempts People who touched revisions under test: Jan Beulich jbeul...@suse.com jobs: build-amd64-xend pass build-i386-xend pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt
Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.
On 27/08/15 19:47, Konrad Rzeszutek Wilk wrote: --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -710,6 +710,48 @@ struct xen_sysctl_psr_cat_op { typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t); +#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xU + +#define XEN_SYSCTL_TMEM_OP_THAW 0 +#define XEN_SYSCTL_TMEM_OP_FREEZE 1 +#define XEN_SYSCTL_TMEM_OP_FLUSH 2 +#define XEN_SYSCTL_TMEM_OP_DESTROY3 +#define XEN_SYSCTL_TMEM_OP_LIST 4 +#define XEN_SYSCTL_TMEM_OP_SET_WEIGHT 5 +#define XEN_SYSCTL_TMEM_OP_SET_CAP6 +#define XEN_SYSCTL_TMEM_OP_SET_COMPRESS 7 +#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB 8 +#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION 11 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_MAXPOOLS 12 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_WEIGHT 13 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_CAP14 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_FLAGS 15 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_FLAGS16 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_NPAGES 17 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID 18 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV 20 +#define XEN_SYSCTL_TMEM_OP_SAVE_END 21 +#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN 30 +#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE 32 +#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33 + +struct xen_sysctl_tmem_op { +uint32_t cmd; /* IN: XEN_SYSCTL_TMEM_OP_* . */ +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/ +uint32_t cli_id;/* IN: client id, 0 for XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB + for all others can be the domain id or + XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */ +uint32_t arg1; /* IN: If not applicable to command use 0. */ +uint32_t arg2; /* IN: If not applicable to command use 0. */ Please can this interface be fixed as part of the move, even if it is in subsequent patches for clarity. I will gladly fix this interface in further patches. By all means! What I wish to avoid is 4.6 releasing with the API in this state, as adjusting valgrind and strace to compensate will be miserable. The best solution would be to have this patch and the fixups adjacent in the series, at which point the valgrind/strace adjustments can start with the clean API for 4.6 +uint64_t oid[3];/* IN: If not applicable to command use 0. */ +XEN_GUEST_HANDLE_64(char) buf; /* IN/OUT: Buffer to save and restore ops. */ +}; +typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t); + struct xen_sysctl { uint32_t cmd; #define XEN_SYSCTL_readconsole1 @@ -734,6 +776,7 @@ struct xen_sysctl { #define XEN_SYSCTL_psr_cmt_op21 #define XEN_SYSCTL_pcitopoinfo 22 #define XEN_SYSCTL_psr_cat_op23 +#define XEN_SYSCTL_tmem_op 24 uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */ union { struct xen_sysctl_readconsole readconsole; @@ -758,6 +801,7 @@ struct xen_sysctl { struct xen_sysctl_coverage_op coverage_op; struct xen_sysctl_psr_cmt_oppsr_cmt_op; struct xen_sysctl_psr_cat_oppsr_cat_op; +struct xen_sysctl_tmem_op tmem_op; uint8_t pad[128]; } u; }; diff --git a/xen/include/public/tmem.h b/xen/include/public/tmem.h index 4fd2fc6..208f5a6 100644 --- a/xen/include/public/tmem.h +++ b/xen/include/public/tmem.h @@ -33,7 +33,7 @@ #define TMEM_SPEC_VERSION 1 /* Commands to HYPERVISOR_tmem_op() */ -#define TMEM_CONTROL 0 +#define TMEM_CONTROL_MOVED 0 If anything, this should be deleted as opposed to being renamed. The _MOVED suffix is confusing, as TMEM_CONTROL_MOVED sounds like it could be a plausible TMEM operation. Ah, will do what Jan asked and just put a big comment. And keep the old name (or maybe add 'DEPRECATED'?) If you are going to the effort of renaming, then just delete. Either way will break the build of unsuspecting code, and the latter leaves us with less junk. A comment however will be useful in all cases. #define TMEM_NEW_POOL 1 #define TMEM_DESTROY_POOL 2 #define TMEM_PUT_PAGE 4 @@ -48,35 +48,9 @@ #endif /* Privileged commands to HYPERVISOR_tmem_op() */ -#define TMEM_AUTH 101 +#define TMEM_AUTH 101 #define TMEM_RESTORE_NEW 102 -/* Subops for HYPERVISOR_tmem_op(TMEM_CONTROL) */
Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.
+struct xen_sysctl_tmem_op { +uint32_t cmd; /* IN: XEN_SYSCTL_TMEM_OP_* . */ +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/ +uint32_t cli_id;/* IN: client id, 0 for XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB + for all others can be the domain id or + XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */ +uint32_t arg1; /* IN: If not applicable to command use 0. */ +uint32_t arg2; /* IN: If not applicable to command use 0. */ Please can this interface be fixed as part of the move, even if it is in subsequent patches for clarity. I will gladly fix this interface in further patches. By all means! What I wish to avoid is 4.6 releasing with the API in this state, as adjusting valgrind and strace to compensate will be miserable. Right. The best solution would be to have this patch and the fixups adjacent in the series, at which point the valgrind/strace adjustments can start with the clean API for 4.6 Yes. I shall post this patchset plus extra patches next week. +uint64_t oid[3];/* IN: If not applicable to command use 0. */ +XEN_GUEST_HANDLE_64(char) buf; /* IN/OUT: Buffer to save and restore ops. */ +}; +typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t); + struct xen_sysctl { uint32_t cmd; #define XEN_SYSCTL_readconsole1 @@ -734,6 +776,7 @@ struct xen_sysctl { #define XEN_SYSCTL_psr_cmt_op21 #define XEN_SYSCTL_pcitopoinfo 22 #define XEN_SYSCTL_psr_cat_op23 +#define XEN_SYSCTL_tmem_op 24 uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */ union { struct xen_sysctl_readconsole readconsole; @@ -758,6 +801,7 @@ struct xen_sysctl { struct xen_sysctl_coverage_op coverage_op; struct xen_sysctl_psr_cmt_oppsr_cmt_op; struct xen_sysctl_psr_cat_oppsr_cat_op; +struct xen_sysctl_tmem_op tmem_op; uint8_t pad[128]; } u; }; diff --git a/xen/include/public/tmem.h b/xen/include/public/tmem.h index 4fd2fc6..208f5a6 100644 --- a/xen/include/public/tmem.h +++ b/xen/include/public/tmem.h @@ -33,7 +33,7 @@ #define TMEM_SPEC_VERSION 1 /* Commands to HYPERVISOR_tmem_op() */ -#define TMEM_CONTROL 0 +#define TMEM_CONTROL_MOVED 0 If anything, this should be deleted as opposed to being renamed. The _MOVED suffix is confusing, as TMEM_CONTROL_MOVED sounds like it could be a plausible TMEM operation. Ah, will do what Jan asked and just put a big comment. And keep the old name (or maybe add 'DEPRECATED'?) If you are going to the effort of renaming, then just delete. Either way will break the build of unsuspecting code, and the latter leaves us with less junk. It will break the tmem hypervisor code as it has a check for TMEM_CONTROL (which per Jan suggestion should return -E.. something). A comment however will be useful in all cases. Aye. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [xen-unstable baseline test] 60878: regressions - FAIL
Old tested version had not actually been tested; therefore in this flight we test it, rather than a new candidate. The baseline, if any, is the most recent actually tested revision. flight 60878 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/60878/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemuu-debianhvm-amd64 19 guest-start/debianhvm.repeat fail REGR. vs. 60775 test-armhf-armhf-xl-cubietruck 6 xen-bootfail REGR. vs. 60775 test-amd64-amd64-xl-qemuu-ovmf-amd64 19 guest-start/debianhvm.repeat fail REGR. vs. 60775 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate.2 fail REGR. vs. 60775 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-libvirt-qcow2 5 xen-install fail REGR. vs. 60775 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 16 guest-start/debianhvm.repeat fail REGR. vs. 60775 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail like 60775 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 60775 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 60775 Tests which did not succeed, but are not blocking: test-armhf-armhf-xl-qcow2 9 debian-di-installfail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-armhf-armhf-libvirt-raw 9 debian-di-installfail never pass test-armhf-armhf-xl-raw 9 debian-di-installfail never pass test-armhf-armhf-xl-vhd 9 debian-di-installfail never pass test-armhf-armhf-libvirt-vhd 9 debian-di-installfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail never pass test-armhf-armhf-libvirt 14 guest-saverestorefail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 15 guest-start.2fail never pass test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail never pass test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail never pass test-amd64-i386-libvirt-raw 11 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-i386-libvirt-vhd 11 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail never pass version targeted for testing: xen 7b99717f62caeac08eea224a177cd28f047ac4b5 baseline version: xen 743289d0296268fe6bad64531a24d8053afeb062 Last test of basis60810 2015-08-20 23:37:03 Z6 days Testing same since60841 2015-08-23 06:47:14 Z4 days2 attempts People who touched revisions under test: Boris Ostrovsky boris.ostrov...@oracle.com Ian
Re: [Xen-devel] [PATCH v2 6/8] tmem: Remove the old tmem control XSM checks as it is part of sysctl hypercall.
On 27/08/15 07:02, Konrad Rzeszutek Wilk wrote: The sysctl is where the tmem control operations are done and the XSM checks are done via there. The old mechanism (to check for control tmem op XSM from do_tmem_op) is not needed anymore. CC: Daniel De Graaf dgde...@tycho.nsa.gov Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Both this and patch 5 (which adds the sysctl check): Acked-by: Daniel De Graaf dgde...@tycho.nsa.gov ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [qemu-mainline test] 60879: regressions - FAIL
flight 60879 qemu-mainline real [real] http://logs.test-lab.xenproject.org/osstest/logs/60879/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-armhf-armhf-xl-multivcpu 11 guest-start fail REGR. vs. 60846 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds 11 guest-start fail like 60846 Tests which did not succeed, but are not blocking: test-armhf-armhf-xl-qcow2 9 debian-di-installfail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-armhf-armhf-libvirt-qcow2 9 debian-di-installfail never pass test-armhf-armhf-libvirt-vhd 9 debian-di-installfail never pass test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-armhf-armhf-libvirt-raw 9 debian-di-installfail never pass test-armhf-armhf-xl-vhd 9 debian-di-installfail never pass test-armhf-armhf-xl-raw 9 debian-di-installfail never pass test-armhf-armhf-libvirt 14 guest-saverestorefail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail never pass test-amd64-i386-libvirt-vhd 9 debian-di-installfail never pass test-amd64-i386-libvirt-raw 11 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 9 debian-di-installfail never pass test-amd64-amd64-xl-vhd 9 debian-di-installfail never pass test-amd64-i386-xl-vhd9 debian-di-installfail never pass version targeted for testing: qemuu7df9671989c1cfa693764f9ae6349324b2ada02a baseline version: qemuua30878e708c2149ce07d709a8b62edd944628449 Last test of basis60846 2015-08-23 16:48:08 Z4 days Testing same since60879 2015-08-25 22:41:03 Z1 days1 attempts People who touched revisions under test: Alistair Francis alistair.fran...@xilinx.com Aurelien Jarno aurel...@aurel32.net Benjamin Herrenschmidt b...@kernel.crashing.org Claudio Fontana claudio.font...@huawei.com Laurent Vivier laur...@vivier.eu Peter Maydell peter.mayd...@linaro.org Richard Henderson r...@twiddle.net jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass
Re: [Xen-devel] [PATCH v2] libxenstore: prefer using the character device
Ian Jackson ian.jack...@eu.citrix.com writes: Wei Liu writes (Re: [Xen-devel] [PATCH v2] libxenstore: prefer using the character device): On Thu, Aug 27, 2015 at 09:04:38AM -0500, Jonathan Creekmore wrote: With the addition of FMODE_ATOMIC_POS in the Linux 3.14 kernel, concurrent blocking file accesses to a single open file descriptor can cause a deadlock trying to grab the file position lock. If a watch has been set up, causing a read_thread to blocking read on the file descriptor, then future writes that would cause the background read to complete will block waiting on the file position lock before they can execute. This race condition only occurs when libxenstore is accessing the xenstore daemon through the /proc/xen/xenbus file and not through the unix domain socket, which is the case when the xenstore daemon is running as a stub domain or when oxenstored is passed --disable-socket. Accessing the daemon from the true character device also does not exhibit this problem. On Linux, prefer using the character device file over the proc file if the character device exists. I confess I still see this as working around a kernel bug. Only this time we are switching from a buggy to non-buggy kernel interface. Why don't we have the kernel provide only non-buggy interfaces ? I was just trying to implement what David suggested; CC'ing him on this as well. diff --git a/tools/xenstore/xs_lib.c b/tools/xenstore/xs_lib.c index af4f75a..0c7744e 100644 --- a/tools/xenstore/xs_lib.c +++ b/tools/xenstore/xs_lib.c @@ -81,6 +81,8 @@ const char *xs_domain_dev(void) #if defined(__RUMPUSER_XEN__) || defined(__RUMPRUN__) return /dev/xen/xenbus; #elif defined(__linux__) + if (access(/dev/xen/xenbus, F_OK) == 0) + return /dev/xen/xenbus; Also, previously xs_domain_dev was a function which simply returned a static value. I feel vaguely uneasy at putting this kind of autodetection logic here. Not entirely; the existing code queried an environment variable first and, only if that was not set, did it return a static value. I added the autodetection logic to fall back in the case where, for some reason, /dev/xen/xenbus did not exist. Handling that kind of a fallback at a higher layer would be a larger refactor to probe for the existence of the character device first. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [3.19.y-ckt stable] Patch x86/xen: Probe target addresses in set_aliased_prot() before the hypercall has been added to staging queue
This is a note to let you know that I have just added a patch titled x86/xen: Probe target addresses in set_aliased_prot() before the hypercall to the linux-3.19.y-queue branch of the 3.19.y-ckt extended stable tree which can be found at: http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.19.y-queue This patch is scheduled to be released in version 3.19.8-ckt6. If you, or anyone else, feels it should not be added to this tree, please reply to this email. For more information about the 3.19.y-ckt tree, see https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable Thanks. -Kamal -- From 922c946e2cad7cdaec82a9ae963c5eb47f7992b0 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski l...@kernel.org Date: Thu, 30 Jul 2015 14:31:31 -0700 Subject: x86/xen: Probe target addresses in set_aliased_prot() before the hypercall commit aa1acff356bbedfd03b544051f5b371746735d89 upstream. The update_va_mapping hypercall can fail if the VA isn't present in the guest's page tables. Under certain loads, this can result in an OOPS when the target address is in unpopulated vmap space. While we're at it, add comments to help explain what's going on. This isn't a great long-term fix. This code should probably be changed to use something like set_memory_ro. Signed-off-by: Andy Lutomirski l...@kernel.org Cc: Andrew Cooper andrew.coop...@citrix.com Cc: Andy Lutomirski l...@amacapital.net Cc: Boris Ostrovsky boris.ostrov...@oracle.com Cc: Borislav Petkov b...@alien8.de Cc: Brian Gerst brge...@gmail.com Cc: David Vrabel dvra...@cantab.net Cc: Denys Vlasenko dvlas...@redhat.com Cc: H. Peter Anvin h...@zytor.com Cc: Jan Beulich jbeul...@suse.com Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: Linus Torvalds torva...@linux-foundation.org Cc: Peter Zijlstra pet...@infradead.org Cc: Sasha Levin sasha.le...@oracle.com Cc: Steven Rostedt rost...@goodmis.org Cc: Thomas Gleixner t...@linutronix.de Cc: secur...@kernel.org secur...@kernel.org Cc: xen-devel xen-devel@lists.xen.org Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.l...@kernel.org Signed-off-by: Ingo Molnar mi...@kernel.org Signed-off-by: Kamal Mostafa ka...@canonical.com --- arch/x86/xen/enlighten.c | 40 1 file changed, 40 insertions(+) diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 78a881b..f94ad30 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -483,6 +483,7 @@ static void set_aliased_prot(void *v, pgprot_t prot) pte_t pte; unsigned long pfn; struct page *page; + unsigned char dummy; ptep = lookup_address((unsigned long)v, level); BUG_ON(ptep == NULL); @@ -492,6 +493,32 @@ static void set_aliased_prot(void *v, pgprot_t prot) pte = pfn_pte(pfn, prot); + /* +* Careful: update_va_mapping() will fail if the virtual address +* we're poking isn't populated in the page tables. We don't +* need to worry about the direct map (that's always in the page +* tables), but we need to be careful about vmap space. In +* particular, the top level page table can lazily propagate +* entries between processes, so if we've switched mms since we +* vmapped the target in the first place, we might not have the +* top-level page table entry populated. +* +* We disable preemption because we want the same mm active when +* we probe the target and when we issue the hypercall. We'll +* have the same nominal mm, but if we're a kernel thread, lazy +* mm dropping could change our pgd. +* +* Out of an abundance of caution, this uses __get_user() to fault +* in the target address just in case there's some obscure case +* in which the target address isn't readable. +*/ + + preempt_disable(); + + pagefault_disable();/* Avoid warnings due to being atomic. */ + __get_user(dummy, (unsigned char __user __force *)v); + pagefault_enable(); + if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) BUG(); @@ -503,6 +530,8 @@ static void set_aliased_prot(void *v, pgprot_t prot) BUG(); } else kmap_flush_unused(); + + preempt_enable(); } static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries) @@ -510,6 +539,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries) const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE; int i; + /* +* We need to mark the all aliases of the LDT pages RO. We +* don't need to call vm_flush_aliases(), though, since that's +* only responsible for flushing aliases out the TLBs, not the +* page tables, and Xen will flush the TLB for us if needed. +* +* To avoid confusing future readers: none of this is
[Xen-devel] [PATCH 3.19.y-ckt 111/130] x86/xen: Probe target addresses in set_aliased_prot() before the hypercall
3.19.8-ckt6 -stable review patch. If anyone has any objections, please let me know. -- From: Andy Lutomirski l...@kernel.org commit aa1acff356bbedfd03b544051f5b371746735d89 upstream. The update_va_mapping hypercall can fail if the VA isn't present in the guest's page tables. Under certain loads, this can result in an OOPS when the target address is in unpopulated vmap space. While we're at it, add comments to help explain what's going on. This isn't a great long-term fix. This code should probably be changed to use something like set_memory_ro. Signed-off-by: Andy Lutomirski l...@kernel.org Cc: Andrew Cooper andrew.coop...@citrix.com Cc: Andy Lutomirski l...@amacapital.net Cc: Boris Ostrovsky boris.ostrov...@oracle.com Cc: Borislav Petkov b...@alien8.de Cc: Brian Gerst brge...@gmail.com Cc: David Vrabel dvra...@cantab.net Cc: Denys Vlasenko dvlas...@redhat.com Cc: H. Peter Anvin h...@zytor.com Cc: Jan Beulich jbeul...@suse.com Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: Linus Torvalds torva...@linux-foundation.org Cc: Peter Zijlstra pet...@infradead.org Cc: Sasha Levin sasha.le...@oracle.com Cc: Steven Rostedt rost...@goodmis.org Cc: Thomas Gleixner t...@linutronix.de Cc: secur...@kernel.org secur...@kernel.org Cc: xen-devel xen-devel@lists.xen.org Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.l...@kernel.org Signed-off-by: Ingo Molnar mi...@kernel.org Signed-off-by: Kamal Mostafa ka...@canonical.com --- arch/x86/xen/enlighten.c | 40 1 file changed, 40 insertions(+) diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 78a881b..f94ad30 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -483,6 +483,7 @@ static void set_aliased_prot(void *v, pgprot_t prot) pte_t pte; unsigned long pfn; struct page *page; + unsigned char dummy; ptep = lookup_address((unsigned long)v, level); BUG_ON(ptep == NULL); @@ -492,6 +493,32 @@ static void set_aliased_prot(void *v, pgprot_t prot) pte = pfn_pte(pfn, prot); + /* +* Careful: update_va_mapping() will fail if the virtual address +* we're poking isn't populated in the page tables. We don't +* need to worry about the direct map (that's always in the page +* tables), but we need to be careful about vmap space. In +* particular, the top level page table can lazily propagate +* entries between processes, so if we've switched mms since we +* vmapped the target in the first place, we might not have the +* top-level page table entry populated. +* +* We disable preemption because we want the same mm active when +* we probe the target and when we issue the hypercall. We'll +* have the same nominal mm, but if we're a kernel thread, lazy +* mm dropping could change our pgd. +* +* Out of an abundance of caution, this uses __get_user() to fault +* in the target address just in case there's some obscure case +* in which the target address isn't readable. +*/ + + preempt_disable(); + + pagefault_disable();/* Avoid warnings due to being atomic. */ + __get_user(dummy, (unsigned char __user __force *)v); + pagefault_enable(); + if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) BUG(); @@ -503,6 +530,8 @@ static void set_aliased_prot(void *v, pgprot_t prot) BUG(); } else kmap_flush_unused(); + + preempt_enable(); } static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries) @@ -510,6 +539,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries) const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE; int i; + /* +* We need to mark the all aliases of the LDT pages RO. We +* don't need to call vm_flush_aliases(), though, since that's +* only responsible for flushing aliases out the TLBs, not the +* page tables, and Xen will flush the TLB for us if needed. +* +* To avoid confusing future readers: none of this is necessary +* to load the LDT. The hypervisor only checks this when the +* LDT is faulted in due to subsequent descriptor access. +*/ + for(i = 0; i entries; i += entries_per_page) set_aliased_prot(ldt + i, PAGE_KERNEL_RO); } -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [libvirt test] 60880: regressions - FAIL
flight 60880 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/60880/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 16 guest-start/debianhvm.repeat fail REGR. vs. 60848 Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt-vhd 9 debian-di-installfail never pass test-armhf-armhf-libvirt-raw 9 debian-di-installfail never pass test-armhf-armhf-libvirt-qcow2 9 debian-di-installfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 guest-saverestorefail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail never pass test-amd64-i386-libvirt-vhd 11 migrate-support-checkfail never pass test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail never pass test-amd64-i386-libvirt-raw 11 migrate-support-checkfail never pass test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail never pass version targeted for testing: libvirt bf2788218ad29719467260aa4ecad6dc31c97046 baseline version: libvirt 7dc52241b314adb19ca4f0af156d772e3d5a29a5 Last test of basis60848 2015-08-23 18:44:07 Z4 days Testing same since60880 2015-08-25 23:18:39 Z1 days1 attempts People who touched revisions under test: Erik Skultety eskul...@redhat.com Guido Günther a...@sigxcpu.org intrigeri intrig...@debian.org Laine Stump la...@laine.org Luyao Huang lhu...@redhat.com Martin Kletzander mklet...@redhat.com Sergey Bronnikov serg...@openvz.org Tomas Meszaros e...@tty.sk Vasiliy Tolstov v.tols...@selfip.ru jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmfail test-amd64-amd64-libvirt-xsm pass test-armhf-armhf-libvirt-xsm fail test-amd64-i386-libvirt-xsm pass test-amd64-amd64-libvirt pass test-armhf-armhf-libvirt fail test-amd64-i386-libvirt pass test-amd64-amd64-libvirt-pairfail test-amd64-i386-libvirt-pair fail test-amd64-amd64-libvirt-qcow2 pass test-armhf-armhf-libvirt-qcow2 fail test-amd64-i386-libvirt-qcow2pass test-amd64-amd64-libvirt-raw pass test-armhf-armhf-libvirt-raw fail test-amd64-i386-libvirt-raw pass test-amd64-amd64-libvirt-vhd pass test-armhf-armhf-libvirt-vhd fail test-amd64-i386-libvirt-vhd pass sg-report-flight on osstest.test-lab.xenproject.org logs:
Re: [Xen-devel] [PATCH V4 2/4] x86/xsaves: enable xsaves/xrstors in xen
On Wed, Aug 26, 2015 at 06:35:51AM -0600, Jan Beulich wrote: On 26.08.15 at 14:05, andrew.coop...@citrix.com wrote: On 26/08/15 12:50, Jan Beulich wrote: On 26.08.15 at 12:12, andrew.coop...@citrix.com wrote: On 25/08/15 11:54, Shuai Ruan wrote: --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -936,9 +936,10 @@ void pv_cpuid(struct cpu_user_regs *regs) if ( regs-_ecx == 1 ) { a = XSTATE_FEATURE_XSAVEOPT | - XSTATE_FEATURE_XSAVEC | - (cpu_has_xgetbv1 ? XSTATE_FEATURE_XGETBV1 : 0) | - (cpu_has_xsaves ? XSTATE_FEATURE_XSAVES : 0); + XSTATE_FEATURE_XSAVEC; + /* PV guest will not support xsaves. */ + /* (cpu_has_xgetbv1 ? XSTATE_FEATURE_XGETBV1 : 0) | + (cpu_has_xsaves ? XSTATE_FEATURE_XSAVES : 0); */ Don't leave this code commented out like this. Just delete it. Agreed, but - mind reminding me again why supporting them for PV guests isn't going to work? xsaves is a cpl0 instruction used to manage state which userspace can't be trusted to handle alone. Xen therefore can't trust PV guests to use it either. The first of these features is Processor Trace. A PV guest able to use xsaves/xrstors would be able to gather trace data of hypervisor execution, or cause the trace buffers to clobber arbitrary physical memory. The features covered by MSR_IA32_XSS can safely be exposed to PV guests, but via a hypercall interface. That covers xsaves, but not xgetbv1 (which also covers XSAVEOPT). That is my error. I will fix it in next version. Jan Thans for your review,Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [BUG] 'o' debug key panics the hypervisor
From: Roger Pau Monné [mailto:roger@citrix.com] Sent: Thursday, August 27, 2015 10:08 PM El 27/08/15 a les 16.01, Jan Beulich ha escrit: On 27.08.15 at 13:29, roger@citrix.com wrote: When using Intel hardware without shared page tables between the IOMMU and EPT (I have not tried if the same happens when sharing the page tables), the following crash happens if I press the 'o' debug key with a HVM guest running. The guest doesn't have any device passed-through. This last thing is the relevant one: No-one ever cared to use 'o' without also passing through a device. I have to admit I discovered this when I thought I was typing at the Dom0 console but it turns out I had it switched to the Xen console... Anyway, it shouldn't crash Xen. Roger. yes, it should be fixed anyway. Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V4 2/4] x86/xsaves: enable xsaves/xrstors in xen
On Wed, Aug 26, 2015 at 11:12:02AM +0100, Andrew Cooper wrote: On 25/08/15 11:54, Shuai Ruan wrote: This patch uses xsaves/xrstors instead of xsaveopt/xrstor to perform the xsave_area switching so that xen itself can benefit from them when available. For xsaves/xrstors only use compact format. Add format conversion support when perform guest os migration. Signed-off-by: Shuai Ruan shuai.r...@linux.intel.com --- xen/arch/x86/domain.c | 2 + xen/arch/x86/domctl.c | 34 --- xen/arch/x86/hvm/hvm.c | 19 ++--- xen/arch/x86/i387.c| 4 ++ xen/arch/x86/traps.c | 7 ++-- xen/arch/x86/xstate.c | 112 ++--- 6 files changed, 132 insertions(+), 46 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 045f6ff..7b8f649 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1529,6 +1529,8 @@ static void __context_switch(void) if ( xcr0 != get_xcr0() !set_xcr0(xcr0) ) BUG(); } +if ( cpu_has_xsaves ) +wrmsrl(MSR_IA32_XSS, n-arch.msr_ia32_xss); This is still not appropriate. You need a per cpu variable and only perform lazy writes of the MSR. Ok.I will introduce two helper void set_xss_msr(uint64_t msr_xss); uint64_t get_xss_msr(); to perform lazy writes of the MSR. vcpu_restore_fpu_eager(n); n-arch.ctxt_switch_to(n); } diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index bf62a88..da136876 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -867,6 +867,7 @@ long arch_do_domctl( if ( domctl-cmd == XEN_DOMCTL_getvcpuextstate ) { unsigned int size; +void * xsave_area; Unnecessary space. Sorry. ret = 0; vcpu_pause(v); @@ -896,9 +897,28 @@ long arch_do_domctl( ret = -EFAULT; offset += sizeof(v-arch.xcr0_accum); -if ( !ret copy_to_guest_offset(evc-buffer, offset, - (void *)v-arch.xsave_area, - size - 2 * sizeof(uint64_t)) ) + +if ( cpu_has_xsaves ) I think this would be more correct to gate on v-arch.xsave_area actually being compressed. I will introduce a helper called bool_t xsave_area_compressed(struct xsave_struct *xsave_area) to check whether the area is compressed or not. +{ +xsave_area = xmalloc_bytes(size); +if ( !xsave_area ) +{ +ret = -ENOMEM; +goto vcpuextstate_out; +} + +save_xsave_states(v, xsave_area, + evc-size - 2*sizeof(uint64_t)); And on a similar note, save_xsave_states() is really uncompress_xsave_area(). On further consideration, it should ASSERT() that the source is currently compressed. Ok. + +if ( !ret copy_to_guest_offset(evc-buffer, offset, + xsave_area, size - + 2 * sizeof(uint64_t)) ) + ret = -EFAULT; + xfree(xsave_area); + } + else if ( !ret copy_to_guest_offset(evc-buffer, offset, + (void *)v-arch.xsave_area, + size - 2 * sizeof(uint64_t)) ) ret = -EFAULT; vcpu_unpause(v); @@ -954,8 +974,12 @@ long arch_do_domctl( v-arch.xcr0_accum = _xcr0_accum; if ( _xcr0_accum XSTATE_NONLAZY ) v-arch.nonlazy_xstate_used = 1; -memcpy(v-arch.xsave_area, _xsave_area, - evc-size - 2 * sizeof(uint64_t)); +if ( cpu_has_xsaves ) +load_xsave_states(v, (void *)_xsave_area, + evc-size - 2*sizeof(uint64_t)); Similarly, load_xsave_states() is really compress_xsave_area(). You should check to see whether the area is already compressed, and just memcpy() in that case. Ok. +else +memcpy(v-arch.xsave_area, (void *)_xsave_area, + evc-size - 2 * sizeof(uint64_t)); vcpu_unpause(v); } else diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index c957610..dc444ac 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -2148,8 +2148,12 @@ static int hvm_save_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h) ctxt-xfeature_mask = xfeature_mask; ctxt-xcr0 = v-arch.xcr0; ctxt-xcr0_accum =
Re: [Xen-devel] [PATCH] x86/vmx: fix vmx_is_singlestep_supported return value
From: Tamas K Lengyel [mailto:ta...@tklengyel.com] Sent: Tuesday, August 25, 2015 3:56 AM The function supposed to return a boolean but instead it returned the value 0x800 which is the Intel internal flag for MTF. This has caused various checks using this function to falsely report no MTF capability. Signed-off-by: Tamas K Lengyel tleng...@novetta.com Acked-by: Kevin Tian kevin.t...@intel.com Thanks, Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V4 1/4] x86/xsaves: add basic definitions/helpers to support xsaves
On Wed, Aug 26, 2015 at 10:47:22AM +0100, Andrew Cooper wrote: On 25/08/2015 11:54, Shuai Ruan wrote: This patch add basic definitions/helpers which will be used in later patches. Signed-off-by: Shuai Ruan shuai.r...@linux.intel.com Thankyou - this series is looking far better now. --- xen/arch/x86/xstate.c| 137 +++ xen/include/asm-x86/cpufeature.h | 4 ++ xen/include/asm-x86/domain.h | 1 + xen/include/asm-x86/msr-index.h | 2 + xen/include/asm-x86/xstate.h | 11 +++- 5 files changed, 154 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c index d5f5e3b..3986515 100644 --- a/xen/arch/x86/xstate.c +++ b/xen/arch/x86/xstate.c @@ -29,6 +29,10 @@ static u32 __read_mostly xsave_cntxt_size; /* A 64-bit bitmask of the XSAVE/XRSTOR features supported by processor. */ u64 __read_mostly xfeature_mask; +unsigned int *xstate_offsets, *xstate_sizes; +static unsigned int xstate_features; +static unsigned int xstate_comp_offsets[sizeof(xfeature_mask)*8]; All of these should be tagged as __read_mostly. Ok. + /* Cached xcr0 for fast read */ static DEFINE_PER_CPU(uint64_t, xcr0); @@ -65,6 +69,139 @@ uint64_t get_xcr0(void) return this_cpu(xcr0); } +static void setup_xstate_features(void) +{ +unsigned int eax, ebx, ecx, edx, leaf = 0x2; +unsigned int i; + +xstate_features = fls(xfeature_mask); +xstate_offsets = xzalloc_array(unsigned int, xstate_features); +xstate_sizes = xzalloc_array(unsigned int, xstate_features); These can both be plain xmalloc_array(), as we unconditionally set every member below. As a separate patch (or possibly this one if it isn't too big), you now need to modify the xstate initialisation to return int rather than void. You must also check for allocation failures and bail with -ENOMEM. It is fine for an error like this to be fatal to system or AP startup. Also, merging review from patch 2, this function gets called once per pcpu, meaning that you waste a moderate quantity of memory and repeatedly clobber the xstate_{offsets,sizes} pointers. You should pass in bool_t bsp and, if bsp, allocate the arrays and read out, while if not bsp, check that the ap is identical to the bsp. Ok. + +for (i = 0; i xstate_features ; i++) Style: spaces inside braces, and there shouldn't be one between xstate_features; Also, you can remove i entirely, and use for ( leaf = 2; leaf xstate_features; ++i ) Sorry. +{ +cpuid_count(XSTATE_CPUID, leaf, eax, ebx, ecx, edx); + +xstate_offsets[leaf] = ebx; +xstate_sizes[leaf] = eax; You can drop the ebx and eax temporaries, and as ecx and edx are only around as discard variables, you can get away with having a single tmp variable. Therefore, something like: cpuid_count(XSTATE_CPUID, leaf, xstate_sizes[leaf], xstate_offsets[leaf], tmp, tmp); which also drops the body of the function to a single statement, allowing you to drop the braces. Ok. + +leaf++; +} +} + +static void setup_xstate_comp(void) This function can get away with being __init, as it is based solely on data written by the BSP. Also merging review from patch 2, it only needs calling once. ok. +{ +unsigned int xstate_comp_sizes[sizeof(xfeature_mask)*8]; +unsigned int i; + +/* + * The FP xstates and SSE xstates are legacy states. They are always + * in the fixed offsets in the xsave area in either compacted form + * or standard form. + */ +xstate_comp_offsets[0] = 0; +xstate_comp_offsets[1] = XSAVE_SSE_OFFSET; + +xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE; + +for (i = 2; i xstate_features; i++) Style - spaces inside braces. Sorry. +{ +if ( (1ull i) xfeature_mask ) +xstate_comp_sizes[i] = xstate_sizes[i]; +else +xstate_comp_sizes[i] = 0; + +if ( i 2 ) +xstate_comp_offsets[i] = xstate_comp_offsets[i-1] ++ xstate_comp_sizes[i-1]; The entire xstate_comp_sizes array (which is quite large) can be removed if you rearrange the loop body as: xstate_comp_offsets[i] = xstate_comp_offsets[i-1] + (((1ul i) xfeature_mask) ? xstate_comp_sizes[i-1] : 0); which also removes the if ( i 2 ) +} As this point, it is probably worth matching xstate_comp_offsets[i-1] against cpuid.0xd[0].ecx. If hardware and Xen disagree on the maximum size of an xsave area, we shouldn't continue. Ok. +} + +static void *get_xsave_addr(struct xsave_struct *xsave, int xstate) +{ +int feature = fls(xstate) - 1; In each callsite, you have already calculated fls(xstate) - 1. It would be better to pass an unsigned int xfeature_idx and avoid the repeated
Re: [Xen-devel] [PATCH V4 3/4] x86/xsaves: enable xsaves/xrstors for hvm guest
On Wed, Aug 26, 2015 at 11:36:53AM +0100, Andrew Cooper wrote: On 25/08/15 11:54, Shuai Ruan wrote: This patch enables xsaves for hvm guest, includes: 1.handle xsaves vmcs init and vmexit. 2.add logic to write/read the XSS msr. Signed-off-by: Shuai Ruan shuai.r...@linux.intel.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com, given two corrections... Thanks. --- xen/arch/x86/hvm/hvm.c | 43 ++ xen/arch/x86/hvm/vmx/vmcs.c| 6 -- xen/arch/x86/hvm/vmx/vmx.c | 20 ++ xen/arch/x86/xstate.c | 4 ++-- xen/include/asm-x86/hvm/vmx/vmcs.h | 7 +++ xen/include/asm-x86/hvm/vmx/vmx.h | 2 ++ xen/include/asm-x86/xstate.h | 1 + 7 files changed, 79 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index dc444ac..57612de 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4540,6 +4540,33 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, *ebx = _eax + _ebx; } } +if ( count == 1 ) +{ +if ( cpu_has_xsaves ) +{ +*ebx = XSTATE_AREA_MIN_SIZE; +if ( v-arch.xcr0 | v-arch.msr_ia32_xss ) +for ( sub_leaf = 2; sub_leaf 63; sub_leaf++ ) +{ +if ( !((v-arch.xcr0 | v-arch.msr_ia32_xss) + (1ULL sub_leaf)) ) Stray hard tabs. Ok. @@ -4771,6 +4804,16 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content, return X86EMUL_EXCEPTION; break; +case MSR_IA32_XSS: + /* +* skylake only support bit 8 , but it is +* not support in xen. +*/ I would alter this comment to /* No XSS features currently supported for guests. */ The skylake bits are covered by cpu_has_xsaves. ~Andrew Ok. ___ Xen-devel mailing list Thanks for your reviwe,Andrew Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/vmx: fix vmx_is_singlestep_supported return value
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, August 25, 2015 3:59 PM Tamas K Lengyel tamas.leng...@zentific.com 08/25/15 1:51 AM @@ -1768,7 +1768,7 @@ static void vmx_enable_msr_exit_interception(struct domain *d) static bool_t vmx_is_singlestep_supported(void) { -return cpu_has_monitor_trap_flag; +return cpu_has_monitor_trap_flag ? 1 : 0; Prevailing style would tend towards !!cpu_has_monitor_trap_flag Yeap, you are right. If the maintainers prefer I can resend with that style. This could easily be adjusted upon commit. What I'm wondering is whether this is the right place to fix it: Wouldn't it be better for the cpu_has_* macros themselves to be adjusted so other (future) users won't fall into the same trap (vmx_virtual_intr_delivery_enabled() is a good second example bogusly using int as its return type, and once adjusted to bool_t it would break)? I'm OK with original patch. Above example can be taken care when changing return type. Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v8 2/3] Differentiate IO/mem resources tracked by ioreq server
On 8/25/2015 5:40 PM, Wei Liu wrote: On Sun, Aug 23, 2015 at 05:33:17PM +0800, Yu Zhang wrote: Currently in ioreq server, guest write-protected ram pages are tracked in the same rangeset with device mmio resources. Yet unlike device mmio, which can be in big chunks, the guest write- protected pages may be discrete ranges with 4K bytes each. This patch uses a seperate rangeset for the guest ram pages. Note: Previously, a new hypercall or subop was suggested to map write-protected pages into ioreq server. However, it turned out handler of this new hypercall would be almost the same with the existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a type parameter in this hypercall. So no new hypercall defined, only a new type is introduced. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- tools/libxc/include/xenctrl.h| 31 tools/libxc/xc_domain.c | 61 For tools bits: Acked-by: Wei Liu wei.l...@citrix.com Thank you, Wei. To other maintainers, do you have any question or suggestion? Any advices would be appreciated! Thanks! :) Yu ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for 4.6] VT-d: Create IOMMU mappings for RMRR regions if shared EPT is not being used
From: Chen, Tiejun Sent: Thursday, August 27, 2015 5:05 PM On 8/27/2015 4:40 PM, Malcolm Crossley wrote: On 27/08/15 03:59, Chen, Tiejun wrote: This kind of issue is already gone. https://www.mail-archive.com/xen-devel@lists.xen.org/msg32464.html There is a bug in the code you refer to above which results in no IOMMU page table mappings being created if the guest domain is not sharing it's EPT page tables with the IOMMU. set_identity_p2m_entry only configures the EPT page tables and does not configure the IOMMU page tables. Okay, I got what you mean. Instead, could you insert iommu_{map,unmap_page() into {set,clear}_identity_p2m_entry()? I think this can make {set,clear}_identity_p2m_entry approachable in all circumstances. Kevin and Jan, Is this fine? Yes that is the right thing to do. It's a bit surprise to me that it's not there yet. :-) Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] VT-d faults with Integrated Intel graphics on 4.6
On 8/27/2015 7:03 PM, Konrad Rzeszutek Wilk wrote: On Thu, Aug 27, 2015 at 11:06:30AM +0800, Chen, Tiejun wrote: On 8/25/2015 10:43 PM, Konrad Rzeszutek Wilk wrote: On Tue, Aug 25, 2015 at 02:55:31PM +0800, Chen, Tiejun wrote: On 8/25/2015 8:19 AM, Tamas K Lengyel wrote: Hi everyone, I saw some people passingly mention this on the list before but just in case it has been missed, my serial is also being spammed with the following printouts with both Xen 4.6 RC1 and the latest staging build: ... (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 2610742000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid ... What's your platform? BDW? And how much memory is set to your guest OS? Is see this as well. But oddly enough - only when I use the AMT feature (normally I just use serial console on the machine). The platform is /DQ67SW, BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012 There is no guest OS - this is initial domain. And I boot with 2GB: Released 0 page(s) Xen: [mem 0x-0x00099fff] usable Xen: [mem 0x0009a800-0x000f] reserved Xen: [mem 0x0010-0x1fff] usable Xen: [mem 0x2000-0x201f] reserved Xen: [mem 0x2020-0x3fff] usable Xen: [mem 0x4000-0x401f] reserved Xen: [mem 0x4020-0x80465fff] usable Xen: [mem 0x80466000-0x9e855fff] unusable Xen: [mem 0x9e856000-0x9e85efff] ACPI data Xen: [mem 0x9e85f000-0x9e8a9fff] ACPI NVS Xen: [mem 0x9e8aa000-0x9e8b1fff] unusable Xen: [mem 0x9e8b2000-0x9e9a4fff] reserved Xen: [mem 0x9e9a5000-0x9e9a6fff] unusable Xen: [mem 0x9e9a7000-0x9ebc5fff] reserved Xen: [mem 0x9ebc6000-0x9ebc6fff] unusable Xen: [mem 0x9ebc7000-0x9ebd6fff] reserved Xen: [mem 0x9ebd7000-0x9ebf4fff] ACPI NVS Xen: [mem 0x9ebf5000-0x9ec18fff] reserved Xen: [mem 0x9ec19000-0x9ec5bfff] ACPI NVS Xen: [mem 0x9ec5c000-0x9ee7bfff] reserved Xen: [mem 0x9ee7c000-0x9eff] unusable Xen: [mem 0x9f80-0xbf9f] reserved Xen: [mem 0xfec0-0xfec00fff] reserved Xen: [mem 0xfed1c000-0xfed3] reserved Xen: [mem 0xfed9-0xfed91fff] reserved Xen: [mem 0xfee0-0xfeef] reserved Xen: [mem 0xff00-0x] reserved Xen: [mem 0x0001-0x00043e5f] unusable Just at first glance to fault address, this seems be issued from some As you see those fault addresses are out of the normal memory range here. known erratas on BDS and SKL. I am runnig v4.2-rc8. So I really doubt this is related to some erratas. Currently the pre-fetch unit of IOMMU unit dedicated to IGD can't work well on some platforms, so you can see these wired faults. Do you have some ideas for a solution/patch? I can't reproduce this on my BDW. And this is my assumption as well. So could anyone provide a log with iommu=debug? Thanks Tiejun Thanks! ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V4 1/4] x86/xsaves: add basic definitions/helpers to support xsaves
On Wed, Aug 26, 2015 at 06:53:06AM -0600, Jan Beulich wrote: On 26.08.15 at 13:41, jbeul...@suse.com wrote: On 26.08.15 at 11:47, andrew.coop...@citrix.com wrote: On 25/08/2015 11:54, Shuai Ruan wrote: --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -153,6 +153,10 @@ #define X86_FEATURE_RDSEED (7*32+18) /* RDSEED instruction */ #define X86_FEATURE_ADX (7*32+19) /* ADCX, ADOX instructions */ #define X86_FEATURE_SMAP (7*32+20) /* Supervisor Mode Access Prevention */ +#define XSAVEOPT (1 0) +#define XSAVEC (1 1) +#define XGETBV1 (1 2) +#define XSAVES (1 3) This absolutely must be X86_FEATURE_* for consistency, and you will need to introduce a new capability word. Or even better re-use word 2. And make sure to replace the then redundant XSTATE_FEATURE_* definitions in xstate.h. Ok.Thanks I will fix these in next version. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Draft B] Boot ABI for HVM guests without a device-model
On 27/08/15 09:04, Jan Beulich wrote: On 26.08.15 at 16:44, roger@citrix.com wrote: El 26/08/15 a les 14.12, Andrew Cooper ha escrit: On 26/08/15 13:00, Jan Beulich wrote: This structure is guaranteed to always be placed in memory after the DYM These structures are ...? loaded kernel and modules. There is no requirement for the command line/module information to be after the loaded kernel. All it needs to do is not overlap. IMHO, this is helpful in order to get last used physical address, after which free memory starts. Current FreeBSD implementation relies on this, if we didn't do it that way I would have to calculate where the symtab + strtab ends, which is more complex. But the statement leaves open whether there is any free memory at all after those structures, or whether instead all free memory lives at lower addresses. Nor do I consider it appropriate to take a present (one might say overly simplistic) implementation as a basis for setting arbitrary restrictions. I agree. This sounds like a FreeBSD bug, and absolutely shouldn't be a written restriction in the boot ABI. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Draft B] Boot ABI for HVM guests without a device-model
El 27/08/15 a les 11.43, Andrew Cooper ha escrit: On 27/08/15 09:04, Jan Beulich wrote: On 26.08.15 at 16:44, roger@citrix.com wrote: El 26/08/15 a les 14.12, Andrew Cooper ha escrit: On 26/08/15 13:00, Jan Beulich wrote: This structure is guaranteed to always be placed in memory after the DYM These structures are ...? loaded kernel and modules. There is no requirement for the command line/module information to be after the loaded kernel. All it needs to do is not overlap. IMHO, this is helpful in order to get last used physical address, after which free memory starts. Current FreeBSD implementation relies on this, if we didn't do it that way I would have to calculate where the symtab + strtab ends, which is more complex. But the statement leaves open whether there is any free memory at all after those structures, or whether instead all free memory lives at lower addresses. Nor do I consider it appropriate to take a present (one might say overly simplistic) implementation as a basis for setting arbitrary restrictions. Can we just state that the hvm_start_info structure and associated metadata is placed after the loaded kernel and modules? Whether there's free memory or not after this is something that the kernel has to figure out by itself, and I wasn't planning to add such a statement to the specification. I agree. This sounds like a FreeBSD bug, and absolutely shouldn't be a written restriction in the boot ABI. Bug? The FreeBSD native loader passes to the FreeBSD kernel the last used address, after which free memory starts. IMHO, it is not a bug, it's just how FreeBSD boots. I understand that Linux might not pass such a parameter, and there are other ways I can use to find this, but they are more complex. We already did something very similar with PV guests, see the comment before the start_info structure: * 3. This the order of bootstrap elements in the initial virtual region: * a. relocated kernel image * b. initial ram disk [mod_start, mod_len] * (may be omitted) * c. list of allocated page frames [mfn_list, nr_pages] * (unless relocated due to XEN_ELFNOTE_INIT_P2M) * d. start_info_t structure[register ESI (x86)] * in case of dom0 this page contains the console info, too * e. unless dom0: xenstore ring page * f. unless dom0: console ring page * g. bootstrap page tables [pt_base and CR3 (x86)] * h. bootstrap stack [register ESP (x86)] IMHO it is important to mention how things are loaded into memory, and placing the hvm_start_info struct after the loaded kernel and modules is also the most natural way to do it, I don't foresee this changing in the future. Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/NUMA: make init_node_heap() respect Xen heap limit
On 27/08/15 09:37, Jan Beulich wrote: On NUMA systems, where we try to use node local memory for the basic control structures of the buddy allocator, this special case needs to take into consideration a possible address width limit placed on the Xen heap. In turn this (but also other, more abstract considerations) requires that xenheap_max_mfn() not be called more than once (at most we might permit it to be called a second time with a larger value than was passed the first time), and be called only before calling end_boot_allocator(). While inspecting all the involved code, a couple of off-by-one issues were found (and are being corrected here at once): - arch_init_memory() cleared one too many page table slots - the highmem_start based invocation of xenheap_max_mfn() passed too big a value - xenheap_max_mfn() calculated the wrong bit count in edge cases Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] tmem vs Xen 4.6
Konrad, I thought I'd remind you of the some progress per release criteria to avoid it becoming subject to removal. The most recent changes date back to spring 2014 afaics... Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 for-4.6 0/2] In-tree feature documentation
On 27/08/15 03:44, Jim Fehlig wrote: On 08/25/2015 04:40 AM, Andrew Cooper wrote: An issue which Xen has is an uncertain support statement for features. Given the success seen with docs/misc/xen-command-line.markdown, and in particular keeping it up to date, introduce a similar system for features. Patch 1 introduces a proposed template (and a makefile tweak to include the new docs/features subdirectory), while patch 2 is a feature document covering the topic of migration. v2 Adds %Revision and #History table, following feedback from v1. This is tagged RFC as I expect people to have different views as to what is useful to include. I would particilarly appreciate feedback on the template before it starts getting used widely. Lars: Does this look like a reasonable counterpart to your formal support statement document? Jim: Per your request at the summit for new information, is patch 2 suitable? Yes. It provides excellent info, with pointers to dig deeper for those interested. Your proposal does raise the bar for feature contribution, but I'm not active enough in the Xen community to know if folks are supportive of the additional overhead. Would new features require a feature doc before committing? Unless there are some screaming objections, I am hoping yes, and that the document becomes the authoritative support statement for the feature. (This is currently a real problem working out the security status of bugs in features lacking a concrete statement of support.) Writing one of these documents is not hard (I hope for there to soon be many examples to refer to), and I would like to see it become common-practice to have as $N/$N on a series. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 6/8] tmem: Remove the old tmem control XSM checks as it is part of sysctl hypercall.
The sysctl is where the tmem control operations are done and the XSM checks are done via there. The old mechanism (to check for control tmem op XSM from do_tmem_op) is not needed anymore. CC: Daniel De Graaf dgde...@tycho.nsa.gov Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- xen/include/xsm/dummy.h | 6 -- xen/include/xsm/xsm.h | 6 -- xen/xsm/dummy.c | 1 - xen/xsm/flask/hooks.c | 6 -- xen/xsm/flask/policy/access_vectors | 2 +- 5 files changed, 1 insertion(+), 20 deletions(-) diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h index bbbfce7..9fe372c 100644 --- a/xen/include/xsm/dummy.h +++ b/xen/include/xsm/dummy.h @@ -427,12 +427,6 @@ static XSM_INLINE int xsm_tmem_op(XSM_DEFAULT_VOID) return xsm_default_action(action, current-domain, NULL); } -static XSM_INLINE int xsm_tmem_control(XSM_DEFAULT_VOID) -{ -XSM_ASSERT_ACTION(XSM_PRIV); -return xsm_default_action(action, current-domain, NULL); -} - static XSM_INLINE long xsm_do_xsm_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) op) { return -ENOSYS; diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h index 3678a93..ba3caed 100644 --- a/xen/include/xsm/xsm.h +++ b/xen/include/xsm/xsm.h @@ -137,7 +137,6 @@ struct xsm_operations { int (*page_offline)(uint32_t cmd); int (*tmem_op)(void); -int (*tmem_control)(void); long (*do_xsm_op) (XEN_GUEST_HANDLE_PARAM(xsm_op_t) op); #ifdef CONFIG_COMPAT @@ -557,11 +556,6 @@ static inline int xsm_tmem_op(xsm_default_t def) return xsm_ops-tmem_op(); } -static inline int xsm_tmem_control(xsm_default_t def) -{ -return xsm_ops-tmem_control(); -} - static inline long xsm_do_xsm_op (XEN_GUEST_HANDLE_PARAM(xsm_op_t) op) { return xsm_ops-do_xsm_op(op); diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c index 21b1bf8..72eba40 100644 --- a/xen/xsm/dummy.c +++ b/xen/xsm/dummy.c @@ -112,7 +112,6 @@ void xsm_fixup_ops (struct xsm_operations *ops) set_to_dummy_if_null(ops, page_offline); set_to_dummy_if_null(ops, tmem_op); -set_to_dummy_if_null(ops, tmem_control); set_to_dummy_if_null(ops, hvm_param); set_to_dummy_if_null(ops, hvm_control); set_to_dummy_if_null(ops, hvm_param_nested); diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c index cfad13c..5f5f181 100644 --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -1132,11 +1132,6 @@ static inline int flask_tmem_op(void) return domain_has_xen(current-domain, XEN__TMEM_OP); } -static inline int flask_tmem_control(void) -{ -return domain_has_xen(current-domain, XEN__TMEM_CONTROL); -} - static int flask_add_to_physmap(struct domain *d1, struct domain *d2) { return domain_has_perm(d1, d2, SECCLASS_MMU, MMU__PHYSMAP); @@ -1696,7 +1691,6 @@ static struct xsm_operations flask_ops = { .page_offline = flask_page_offline, .tmem_op = flask_tmem_op, -.tmem_control = flask_tmem_control, .hvm_param = flask_hvm_param, .hvm_control = flask_hvm_param, .hvm_param_nested = flask_hvm_param_nested, diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors index 71495fd..0aa68f8 100644 --- a/xen/xsm/flask/policy/access_vectors +++ b/xen/xsm/flask/policy/access_vectors @@ -69,7 +69,7 @@ class xen cpupool_op # tmem hypercall (any access) tmem_op -# TMEM_CONTROL command of tmem hypercall +# XEN_SYSCTL_tmem_op command of tmem (part of sysctl) tmem_control # XEN_SYSCTL_scheduler_op with XEN_DOMCTL_SCHEDOP_getinfo, XEN_SYSCTL_sched_id getscheduler -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 7/8] tmem: Remove extra spaces at end and some hard tabbing.
My editor marks these in red glowing red so removing them to make it easier to focus on code. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- xen/common/tmem.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/xen/common/tmem.c b/xen/common/tmem.c index 1c331ac..66d2852 100644 --- a/xen/common/tmem.c +++ b/xen/common/tmem.c @@ -111,7 +111,7 @@ struct tmem_pool { atomic_t pgp_count; int pgp_count_max; long obj_count; /* atomicity depends on pool_rwlock held for write */ -long obj_count_max; +long obj_count_max; unsigned long objnode_count, objnode_count_max; uint64_t sum_life_cycles; uint64_t sum_evicted_cycles; @@ -1092,7 +1092,7 @@ static int shared_pool_quit(struct tmem_pool *pool, domid_t cli_id) ASSERT(is_shared(pool)); ASSERT(pool-client != NULL); - + ASSERT_WRITELOCK(tmem_rwlock); pool_destroy_objs(pool, cli_id); list_for_each_entry(sl,pool-share_list, share_list) @@ -1180,7 +1180,7 @@ static struct client *client_create(domid_t cli_id) } if ( !d-is_dying ) { d-tmem_client = client; - client-domain = d; +client-domain = d; } rcu_unlock_domain(d); @@ -1229,7 +1229,7 @@ static bool_t client_over_quota(struct client *client) int total = _atomic_read(client_weight_total); ASSERT(client != NULL); -if ( (total == 0) || (client-weight == 0) || +if ( (total == 0) || (client-weight == 0) || (client-eph_count == 0) ) return 0; return ( ((global_eph_count*100L) / client-eph_count ) @@ -1414,7 +1414,7 @@ static int do_tmem_put_compress(struct tmem_page_descriptor *pgp, xen_pfn_t cmfn void *dst, *p; size_t size; int ret = 0; - + ASSERT(pgp != NULL); ASSERT(pgp-us.obj != NULL); ASSERT_SPINLOCK(pgp-us.obj-obj_spinlock); @@ -1559,7 +1559,7 @@ refind: { /* no puts allowed into a frozen pool (except dup puts) */ if ( client-frozen ) - goto unlock_obj; + goto unlock_obj; } } else @@ -1572,10 +1572,10 @@ refind: write_lock(pool-pool_rwlock); /* -* Parallel callers may already allocated obj and inserted to obj_rb_root -* before us. -*/ -if (!obj_rb_insert(pool-obj_rb_root[oid_hash(oidp)], obj)) + * Parallel callers may already allocated obj and inserted to obj_rb_root + * before us. + */ +if ( !obj_rb_insert(pool-obj_rb_root[oid_hash(oidp)], obj) ) { tmem_free(obj, pool); write_unlock(pool-pool_rwlock); @@ -1945,7 +1945,7 @@ static int do_tmem_new_pool(domid_t this_cli_id, (client-shared_auth_uuid[i][1] == uuid_hi) ) break; if ( i == MAX_GLOBAL_SHARED_POOLS ) - { + { tmem_client_info(Shared auth failed, create non shared pool instead!\n); pool-shared = 0; goto out; @@ -2107,7 +2107,7 @@ static int tmemc_list_client(struct client *c, tmem_cli_va_param_t buf, p-obj_count, p-obj_count_max, p-objnode_count, p-objnode_count_max, p-good_puts, p-puts,p-dup_puts_flushed, p-dup_puts_replaced, - p-no_mem_puts, + p-no_mem_puts, p-found_gets, p-gets, p-flushs_found, p-flushs, p-flush_objs_found, p-flush_objs); if ( sum + n = len ) @@ -2146,7 +2146,7 @@ static int tmemc_list_shared(tmem_cli_va_param_t buf, int off, uint32_t len, p-obj_count, p-obj_count_max, p-objnode_count, p-objnode_count_max, p-good_puts, p-puts,p-dup_puts_flushed, p-dup_puts_replaced, - p-no_mem_puts, + p-no_mem_puts, p-found_gets, p-gets, p-flushs_found, p-flushs, p-flush_objs_found, p-flush_objs); if ( sum + n = len ) @@ -2456,7 +2456,7 @@ static int tmemc_save_get_next_page(int cli_id, uint32_t pool_id, /* process the first one */ pool-cur_pgp = pgp = list_entry((pool-persistent_page_list)-next, struct tmem_page_descriptor,us.pool_pers_pages); -} else if ( list_is_last(pool-cur_pgp-us.pool_pers_pages, +} else if ( list_is_last(pool-cur_pgp-us.pool_pers_pages, pool-persistent_page_list) ) { /* already processed the last one in the list */ @@ -2504,7 +2504,7 @@ static int tmemc_save_get_next_inv(int cli_id, tmem_cli_va_param_t buf, pgp = list_entry((client-persistent_invalidated_list)-next, struct tmem_page_descriptor,client_inv_pages); client-cur_pgp = pgp; -} else if ( list_is_last(client-cur_pgp-client_inv_pages, +} else if ( list_is_last(client-cur_pgp-client_inv_pages,
[Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.
The operations are to be used by an control domain to set parameters, list pools, clients, and to be used during migration. There is no need to have them in the tmem hypercall path. This patch moves code without adding fixes - and in fact in some cases makes the parameters soo long that they hurt eyes - but that is for another patch. Note that in regards to existing users: - Only the control domain could call it - which meant that if a guest called it would get -EPERM, so we are OK there. In practice no guests called this TMEM_CONTROL command. - The spec: https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.pdf mentions: TBD [Not sure if this is really needed.] which is a carte blanche as any to do this! Note: The XSM check is the same - we just move it from do_tmem_op to do_sysctl. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- tools/libxc/xc_tmem.c | 104 - tools/libxl/libxl.c| 14 ++-- tools/python/xen/lowlevel/xc/xc.c | 20 ++--- tools/xenstat/libxenstat/src/xenstat.c | 4 +- xen/common/sysctl.c| 7 +- xen/common/tmem.c | 136 - xen/include/public/sysctl.h| 44 +++ xen/include/public/tmem.h | 39 +- xen/include/xen/tmem.h | 3 + xen/include/xen/tmem_xen.h | 1 - xen/xsm/flask/hooks.c | 3 + 11 files changed, 195 insertions(+), 180 deletions(-) diff --git a/tools/libxc/xc_tmem.c b/tools/libxc/xc_tmem.c index 467001b..2f6ea17 100644 --- a/tools/libxc/xc_tmem.c +++ b/tools/libxc/xc_tmem.c @@ -47,27 +47,27 @@ static int do_tmem_op(xc_interface *xch, tmem_op_t *op) int xc_tmem_control(xc_interface *xch, int32_t pool_id, -uint32_t subop, +uint32_t cmd, uint32_t cli_id, uint32_t arg1, uint32_t arg2, void *buf) { -tmem_op_t op; +DECLARE_SYSCTL; DECLARE_HYPERCALL_BOUNCE(buf, arg1, XC_HYPERCALL_BUFFER_BOUNCE_OUT); int rc; -op.cmd = TMEM_CONTROL; -op.pool_id = pool_id; -op.u.ctrl.subop = subop; -op.u.ctrl.cli_id = cli_id; -op.u.ctrl.arg1 = arg1; -op.u.ctrl.arg2 = arg2; -op.u.ctrl.oid[0] = 0; -op.u.ctrl.oid[1] = 0; -op.u.ctrl.oid[2] = 0; - -if ( subop == TMEMC_LIST arg1 != 0 ) +sysctl.cmd = XEN_SYSCTL_tmem_op; +sysctl.u.tmem_op.pool_id = pool_id; +sysctl.u.tmem_op.cmd = cmd; +sysctl.u.tmem_op.cli_id = cli_id; +sysctl.u.tmem_op.arg1 = arg1; +sysctl.u.tmem_op.arg2 = arg2; +sysctl.u.tmem_op.oid[0] = 0; +sysctl.u.tmem_op.oid[1] = 0; +sysctl.u.tmem_op.oid[2] = 0; + +if ( cmd == XEN_SYSCTL_TMEM_OP_LIST arg1 != 0 ) { if ( buf == NULL ) { @@ -81,11 +81,11 @@ int xc_tmem_control(xc_interface *xch, } } -set_xen_guest_handle(op.u.ctrl.buf, buf); +set_xen_guest_handle(sysctl.u.tmem_op.buf, buf); -rc = do_tmem_op(xch, op); +rc = do_sysctl(xch, sysctl); -if (subop == TMEMC_LIST arg1 != 0) +if ( cmd == XEN_SYSCTL_TMEM_OP_LIST arg1 != 0 ) xc_hypercall_bounce_post(xch, buf); return rc; @@ -93,28 +93,28 @@ int xc_tmem_control(xc_interface *xch, int xc_tmem_control_oid(xc_interface *xch, int32_t pool_id, -uint32_t subop, +uint32_t cmd, uint32_t cli_id, uint32_t arg1, uint32_t arg2, struct tmem_oid oid, void *buf) { -tmem_op_t op; +DECLARE_SYSCTL; DECLARE_HYPERCALL_BOUNCE(buf, arg1, XC_HYPERCALL_BUFFER_BOUNCE_OUT); int rc; -op.cmd = TMEM_CONTROL; -op.pool_id = pool_id; -op.u.ctrl.subop = subop; -op.u.ctrl.cli_id = cli_id; -op.u.ctrl.arg1 = arg1; -op.u.ctrl.arg2 = arg2; -op.u.ctrl.oid[0] = oid.oid[0]; -op.u.ctrl.oid[1] = oid.oid[1]; -op.u.ctrl.oid[2] = oid.oid[2]; - -if ( subop == TMEMC_LIST arg1 != 0 ) +sysctl.cmd = XEN_SYSCTL_tmem_op; +sysctl.u.tmem_op.pool_id = pool_id; +sysctl.u.tmem_op.cmd = cmd; +sysctl.u.tmem_op.cli_id = cli_id; +sysctl.u.tmem_op.arg1 = arg1; +sysctl.u.tmem_op.arg2 = arg2; +sysctl.u.tmem_op.oid[0] = oid.oid[0]; +sysctl.u.tmem_op.oid[1] = oid.oid[1]; +sysctl.u.tmem_op.oid[2] = oid.oid[2]; + +if ( cmd == XEN_SYSCTL_TMEM_OP_LIST arg1 != 0 ) { if ( buf == NULL ) { @@ -128,11 +128,11 @@ int xc_tmem_control_oid(xc_interface *xch, } } -set_xen_guest_handle(op.u.ctrl.buf, buf); +set_xen_guest_handle(sysctl.u.tmem_op.buf, buf); -rc = do_tmem_op(xch, op); +rc = do_sysctl(xch, sysctl); -if (subop == TMEMC_LIST arg1 != 0) +if (
[Xen-devel] [PATCH v2 3/8] tmem: remove in xc_tmem_control_oid duplicate set_xen_guest_handle call
We are doing another call to set_xen_guest_handle right after the xc_hypercall_bounce_pre (the correct place to do it). Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- tools/libxc/xc_tmem.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/libxc/xc_tmem.c b/tools/libxc/xc_tmem.c index 8352233..0d1bfb4 100644 --- a/tools/libxc/xc_tmem.c +++ b/tools/libxc/xc_tmem.c @@ -110,7 +110,6 @@ int xc_tmem_control_oid(xc_interface *xch, op.pool_id = pool_id; op.u.ctrl.subop = subop; op.u.ctrl.cli_id = cli_id; -set_xen_guest_handle(op.u.ctrl.buf,buf); op.u.ctrl.arg1 = arg1; op.u.ctrl.arg2 = arg2; op.u.ctrl.oid[0] = oid.oid[0]; -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 2/8] tmem: Add ASSERT in obj_rb_insert for pool-rwlock lock.
Manipulating the obj- structures requires us to hold the pool-rwlock lock. Lets make that obvious in this function to catch any errant users (none found, but we may in future). Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- xen/common/tmem.c | 5 + 1 file changed, 5 insertions(+) diff --git a/xen/common/tmem.c b/xen/common/tmem.c index 572944e..567ccc5 100644 --- a/xen/common/tmem.c +++ b/xen/common/tmem.c @@ -915,6 +915,11 @@ static int obj_rb_insert(struct rb_root *root, struct tmem_object_root *obj) { struct rb_node **new, *parent = NULL; struct tmem_object_root *this; +struct tmem_pool *pool; + +pool = obj-pool; +ASSERT(pool != NULL); +ASSERT_WRITELOCK(pool-pool_rwlock); new = (root-rb_node); while ( *new ) -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 8/8] tmem: Spelling mistakes.
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- xen/common/tmem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/common/tmem.c b/xen/common/tmem.c index 66d2852..9bd75d8 100644 --- a/xen/common/tmem.c +++ b/xen/common/tmem.c @@ -2659,7 +2659,7 @@ long do_tmem_op(tmem_cli_op_t uops) return -EFAULT; } -/* Acquire wirte lock for all command at first */ +/* Acquire write lock for all command at first */ write_lock(tmem_rwlock); if ( op.cmd == TMEM_CONTROL_MOVED ) -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 4/8] tmem: Remove xc_tmem_control mystical arg3
It mentions it but it is never used. The hypercall interface knows nothing of this sort of thing either. Lets just remove it. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- tools/libxc/include/xenctrl.h | 2 +- tools/libxc/xc_tmem.c | 38 -- tools/libxl/libxl.c| 10 - tools/python/xen/lowlevel/xc/xc.c | 7 +++ tools/xenstat/libxenstat/src/xenstat.c | 4 ++-- 5 files changed, 29 insertions(+), 32 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index de3c0ad..9e34769 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2304,7 +2304,7 @@ int xc_tmem_control_oid(xc_interface *xch, int32_t pool_id, uint32_t subop, struct tmem_oid oid, void *buf); int xc_tmem_control(xc_interface *xch, int32_t pool_id, uint32_t subop, uint32_t cli_id, -uint32_t arg1, uint32_t arg2, uint64_t arg3, void *buf); +uint32_t arg1, uint32_t arg2, void *buf); int xc_tmem_auth(xc_interface *xch, int cli_id, char *uuid_str, int arg1); int xc_tmem_save(xc_interface *xch, int dom, int live, int fd, int field_marker); int xc_tmem_save_extra(xc_interface *xch, int dom, int fd, int field_marker); diff --git a/tools/libxc/xc_tmem.c b/tools/libxc/xc_tmem.c index 0d1bfb4..467001b 100644 --- a/tools/libxc/xc_tmem.c +++ b/tools/libxc/xc_tmem.c @@ -51,7 +51,6 @@ int xc_tmem_control(xc_interface *xch, uint32_t cli_id, uint32_t arg1, uint32_t arg2, -uint64_t arg3, void *buf) { tmem_op_t op; @@ -64,7 +63,6 @@ int xc_tmem_control(xc_interface *xch, op.u.ctrl.cli_id = cli_id; op.u.ctrl.arg1 = arg1; op.u.ctrl.arg2 = arg2; -/* use xc_tmem_control_oid if arg3 is required */ op.u.ctrl.oid[0] = 0; op.u.ctrl.oid[1] = 0; op.u.ctrl.oid[2] = 0; @@ -220,28 +218,28 @@ int xc_tmem_save(xc_interface *xch, uint32_t minusone = -1; struct tmem_handle *h; -if ( xc_tmem_control(xch,0,TMEMC_SAVE_BEGIN,dom,live,0,0,NULL) = 0 ) +if ( xc_tmem_control(xch,0,TMEMC_SAVE_BEGIN,dom,live,0,NULL) = 0 ) return 0; if ( write_exact(io_fd, marker, sizeof(marker)) ) return -1; -version = xc_tmem_control(xch,0,TMEMC_SAVE_GET_VERSION,0,0,0,0,NULL); +version = xc_tmem_control(xch,0,TMEMC_SAVE_GET_VERSION,0,0,0,NULL); if ( write_exact(io_fd, version, sizeof(version)) ) return -1; -max_pools = xc_tmem_control(xch,0,TMEMC_SAVE_GET_MAXPOOLS,0,0,0,0,NULL); +max_pools = xc_tmem_control(xch,0,TMEMC_SAVE_GET_MAXPOOLS,0,0,0,NULL); if ( write_exact(io_fd, max_pools, sizeof(max_pools)) ) return -1; if ( version == -1 || max_pools == -1 ) return -1; if ( write_exact(io_fd, minusone, sizeof(minusone)) ) return -1; -flags = xc_tmem_control(xch,0,TMEMC_SAVE_GET_CLIENT_FLAGS,dom,0,0,0,NULL); +flags = xc_tmem_control(xch,0,TMEMC_SAVE_GET_CLIENT_FLAGS,dom,0,0,NULL); if ( write_exact(io_fd, flags, sizeof(flags)) ) return -1; -weight = xc_tmem_control(xch,0,TMEMC_SAVE_GET_CLIENT_WEIGHT,dom,0,0,0,NULL); +weight = xc_tmem_control(xch,0,TMEMC_SAVE_GET_CLIENT_WEIGHT,dom,0,0,NULL); if ( write_exact(io_fd, weight, sizeof(weight)) ) return -1; -cap = xc_tmem_control(xch,0,TMEMC_SAVE_GET_CLIENT_CAP,dom,0,0,0,NULL); +cap = xc_tmem_control(xch,0,TMEMC_SAVE_GET_CLIENT_CAP,dom,0,0,NULL); if ( write_exact(io_fd, cap, sizeof(cap)) ) return -1; if ( flags == -1 || weight == -1 || cap == -1 ) @@ -258,14 +256,14 @@ int xc_tmem_save(xc_interface *xch, int checksum = 0; /* get pool id, flags, pagesize, n_pages, uuid */ -flags = xc_tmem_control(xch,i,TMEMC_SAVE_GET_POOL_FLAGS,dom,0,0,0,NULL); +flags = xc_tmem_control(xch,i,TMEMC_SAVE_GET_POOL_FLAGS,dom,0,0,NULL); if ( flags != -1 ) { pool_id = i; -n_pages = xc_tmem_control(xch,i,TMEMC_SAVE_GET_POOL_NPAGES,dom,0,0,0,NULL); +n_pages = xc_tmem_control(xch,i,TMEMC_SAVE_GET_POOL_NPAGES,dom,0,0,NULL); if ( !(flags TMEM_POOL_PERSIST) ) n_pages = 0; - (void)xc_tmem_control(xch,i,TMEMC_SAVE_GET_POOL_UUID,dom,sizeof(uuid),0,0,uuid); + (void)xc_tmem_control(xch,i,TMEMC_SAVE_GET_POOL_UUID,dom,sizeof(uuid),0,uuid); if ( write_exact(io_fd, pool_id, sizeof(pool_id)) ) return -1; if ( write_exact(io_fd, flags, sizeof(flags)) ) @@ -290,7 +288,7 @@ int xc_tmem_save(xc_interface *xch, int ret; if ( (ret = xc_tmem_control(xch, pool_id, TMEMC_SAVE_GET_NEXT_PAGE, dom, -bufsize, 0, 0, buf))
[Xen-devel] [PATCH v2] Tmem bug-fixes and cleanups.
Hey! At the Xenhackathon we spoke that the tmem code needs a bit of cleanups and simplification. One of the things that Andrew mentioned was that the TMEM_CONTROL should really be part of the sysctl hypercall. As I ventured this path I realized there were some other issues that need to be taken care of (like shared pools blowing up). This patchset has been tested with 32/64 guests, with a 64 hypervisor and 32 bit toolstack (and also with 64bit toolstack) with success. For fun I've also created an Linux module: http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/tmem_test/tmem_test.c that I will expand to cover in the future more interesting hypercall uses. Going forward the next step will be to move the 'tmem_control' function to its own file to simplify the code. The patches are also in my git tree: git://xenbits.xen.org/people/konradwilk/xen.git for-4.6/tmem.cleanups.v2 tools/libxc/include/xenctrl.h | 2 +- tools/libxc/xc_tmem.c | 111 ++-- tools/libxl/libxl.c| 22 ++-- tools/python/xen/lowlevel/xc/xc.c | 27 +++-- tools/xenstat/libxenstat/src/xenstat.c | 6 +- xen/common/sysctl.c| 7 +- xen/common/tmem.c | 183 + xen/include/public/sysctl.h| 44 xen/include/public/tmem.h | 39 +-- xen/include/xen/tmem.h | 3 + xen/include/xen/tmem_xen.h | 1 - xen/include/xsm/dummy.h| 6 -- xen/include/xsm/xsm.h | 6 -- xen/xsm/dummy.c| 1 - xen/xsm/flask/hooks.c | 9 +- xen/xsm/flask/policy/access_vectors| 2 +- 16 files changed, 237 insertions(+), 232 deletions(-) Konrad Rzeszutek Wilk (8): tmem: Don't crash/hang/leak hypervisor when using shared pools within an guest. tmem: Add ASSERT in obj_rb_insert for pool-rwlock lock. tmem: remove in xc_tmem_control_oid duplicate set_xen_guest_handle call tmem: Remove xc_tmem_control mystical arg3 tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl. tmem: Remove the old tmem control XSM checks as it is part of sysctl hypercall. tmem: Remove extra spaces at end and some hard tabbing. tmem: Spelling mistakes. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] VT-d faults with Integrated Intel graphics on 4.6
On Thu, Aug 27, 2015 at 11:06:30AM +0800, Chen, Tiejun wrote: On 8/25/2015 10:43 PM, Konrad Rzeszutek Wilk wrote: On Tue, Aug 25, 2015 at 02:55:31PM +0800, Chen, Tiejun wrote: On 8/25/2015 8:19 AM, Tamas K Lengyel wrote: Hi everyone, I saw some people passingly mention this on the list before but just in case it has been missed, my serial is also being spammed with the following printouts with both Xen 4.6 RC1 and the latest staging build: ... (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 2610742000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid ... What's your platform? BDW? And how much memory is set to your guest OS? Is see this as well. But oddly enough - only when I use the AMT feature (normally I just use serial console on the machine). The platform is /DQ67SW, BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012 There is no guest OS - this is initial domain. And I boot with 2GB: Released 0 page(s) Xen: [mem 0x-0x00099fff] usable Xen: [mem 0x0009a800-0x000f] reserved Xen: [mem 0x0010-0x1fff] usable Xen: [mem 0x2000-0x201f] reserved Xen: [mem 0x2020-0x3fff] usable Xen: [mem 0x4000-0x401f] reserved Xen: [mem 0x4020-0x80465fff] usable Xen: [mem 0x80466000-0x9e855fff] unusable Xen: [mem 0x9e856000-0x9e85efff] ACPI data Xen: [mem 0x9e85f000-0x9e8a9fff] ACPI NVS Xen: [mem 0x9e8aa000-0x9e8b1fff] unusable Xen: [mem 0x9e8b2000-0x9e9a4fff] reserved Xen: [mem 0x9e9a5000-0x9e9a6fff] unusable Xen: [mem 0x9e9a7000-0x9ebc5fff] reserved Xen: [mem 0x9ebc6000-0x9ebc6fff] unusable Xen: [mem 0x9ebc7000-0x9ebd6fff] reserved Xen: [mem 0x9ebd7000-0x9ebf4fff] ACPI NVS Xen: [mem 0x9ebf5000-0x9ec18fff] reserved Xen: [mem 0x9ec19000-0x9ec5bfff] ACPI NVS Xen: [mem 0x9ec5c000-0x9ee7bfff] reserved Xen: [mem 0x9ee7c000-0x9eff] unusable Xen: [mem 0x9f80-0xbf9f] reserved Xen: [mem 0xfec0-0xfec00fff] reserved Xen: [mem 0xfed1c000-0xfed3] reserved Xen: [mem 0xfed9-0xfed91fff] reserved Xen: [mem 0xfee0-0xfeef] reserved Xen: [mem 0xff00-0x] reserved Xen: [mem 0x0001-0x00043e5f] unusable Just at first glance to fault address, this seems be issued from some As you see those fault addresses are out of the normal memory range here. known erratas on BDS and SKL. I am runnig v4.2-rc8. So I really doubt this is related to some erratas. Currently the pre-fetch unit of IOMMU unit dedicated to IGD can't work well on some platforms, so you can see these wired faults. Do you have some ideas for a solution/patch? Thanks! ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3] xen/blkfront: convert to blk-mq APIs
On 25/08/15 03:14, Bob Liu wrote: Hi Rafal, Please have a try adding --iodepth_batch=32 --iodepth_batch_complete=32 to the fio command line. I didn't see this issue any more, neither for domU. Thanks, -Bob Hello, Using 4.2-rc8 kernel, I can confirm that merges are happening after adding this option, both in dom0 and in the guest using null_blk device. Using the latest stable kernel 4.1.6 with the same configuration, there are no merges. Do you know which change might have caused this difference? Below are my results for dom0: modprobe null_blk irqmode=2 completion_nsec=100 queue_mode=1 submit_queues=1 READ: io=22111MB, aggrb=754705KB/s, minb=754705KB/s, maxb=754705KB/s, mint=30001msec, maxt=30001msec Disk stats (read/write): nullb0: ios=352340/0, merge=5285340/0, ticks=397340/0, in_queue=396112, util=95.87% modprobe null_blk irqmode=2 completion_nsec=100 queue_mode=2 submit_queues=1 READ: io=22409MB, aggrb=764838KB/s, minb=764838KB/s, maxb=764838KB/s, mint=30002msec, maxt=30002msec Disk stats (read/write): nullb0: ios=357070/0, merge=5356290/0, ticks=420812/0, in_queue=450772, util=99.32% Thanks, Rafal On 08/21/2015 04:46 PM, Rafal Mielniczuk wrote: On 19/08/15 12:12, Bob Liu wrote: Hi Jens Christoph, Rafal reported an issue about this patch, that's after this patch no more merges happen and the performance dropped if modprobe null_blk irqmode=2 completion_nsec=100, but works fine if modprobe null_blk. I'm not sure whether it's as expect or not. Do you have any suggestions? Thank you! Here is the test result: fio --name=test --ioengine=libaio --rw=read --numjobs=8 --iodepth=32 \ --time_based=1 --runtime=30 --bs=4KB --filename=/dev/xvdb \ --direct=1 --group_reporting=1 --iodepth_batch=16 modprobe null_blk *no patch* (avgrq-sz = 8.00 avgqu-sz=5.00) READ: io=10655MB, aggrb=363694KB/s, minb=363694KB/s, maxb=363694KB/s, mint=30001msec, maxt=30001msec Disk stats (read/write): xvdb: ios=2715852/0, merge=1089/0, ticks=126572/0, in_queue=127456, util=100.00% *with patch* (avgrq-sz = 8.00 avgqu-sz=8.00) READ: io=20655MB, aggrb=705010KB/s, minb=705010KB/s, maxb=705010KB/s, mint=30001msec, maxt=30001msec Disk stats (read/write): xvdb: ios=5274633/0, merge=22/0, ticks=243208/0, in_queue=242908, util=99.98% modprobe null_blk irqmode=2 completion_nsec=100 *no patch* (avgrq-sz = 34.00 avgqu-sz=38.00) READ: io=10372MB, aggrb=354008KB/s, minb=354008KB/s, maxb=354008KB/s, mint=30003msec, maxt=30003msec Disk stats (read/write): xvdb: ios=621760/0, *merge=1988170/0*, ticks=1136700/0, in_queue=1146020, util=99.76% *with patch* (avgrq-sz = 8.00 avgqu-sz=28.00) READ: io=2876.8MB, aggrb=98187KB/s, minb=98187KB/s, maxb=98187KB/s, mint=30002msec, maxt=30002msec Disk stats (read/write): xvdb: ios=734048/0, merge=0/0, ticks=843584/0, in_queue=843080, util=99.72% Regards, -Bob Hello, We got a problem with the lack of merges also when we tested on null_blk device in dom0 directly. When we enabled multi queue block-layer we got no merges, even when we set the number of submission queues to 1. If I don't miss anything, that could suggest the problem lays somewhere in the blk-mq layer itself? Please take a look at the results below: fio --name=test --ioengine=libaio --rw=read --numjobs=8 --iodepth=32 \ --time_based=1 --runtime=30 --bs=4KB --filename=/dev/nullb0 \ --direct=1 --group_reporting=1 modprobe null_blk irqmode=2 completion_nsec=100 queue_mode=1 submit_queues=1 READ: io=13692MB, aggrb=467320KB/s, minb=467320KB/s,
Re: [Xen-devel] [PATCH RFC] xen: if on Xen, flatten the scheduling domain hierarchy
On 08/18/2015 04:55 PM, Dario Faggioli wrote: Hey everyone, So, as a followup of what we were discussing in this thread: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg03241.html I started looking in more details at scheduling domains in the Linux kernel. Now, that thread was about CPUID and vNUMA, and their weird way of interacting, while this thing I'm proposing here is completely independent from them both. In fact, no matter whether vNUMA is supported and enabled, and no matter whether CPUID is reporting accurate, random, meaningful or completely misleading information, I think that we should do something about how scheduling domains are build. Fact is, unless we use 1:1, and immutable (across all the guest lifetime) pinning, scheduling domains should not be constructed, in Linux, by looking at *any* topology information, because that just does not make any sense, when vcpus move around. Let me state this again (hoping to make myself as clear as possible): no matter in how much good shape we put CPUID support, no matter how beautifully and consistently that will interact with both vNUMA, licensing requirements and whatever else. It will be always possible for vCPU #0 and vCPU #3 to be scheduled on two SMT threads at time t1, and on two different NUMA nodes at time t2. Hence, the Linux scheduler should really not skew his load balancing logic toward any of those two situations, as neither of them could be considered correct (since nothing is!). For now, this only covers the PV case. HVM case shouldn't be any different, but I haven't looked at how to make the same thing happen in there as well. OVERALL DESCRIPTION === What this RFC patch does is, in the Xen PV case, configure scheduling domains in such a way that there is only one of them, spanning all the pCPUs of the guest. Note that the patch deals directly with scheduling domains, and there is no need to alter the masks that will then be used for building and reporting the topology (via CPUID, /proc/cpuinfo, /sysfs, etc.). That is the main difference between it and the patch proposed by Juergen here: http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg05088.html This means that when, in future, we will fix CPUID handling and make it comply with whatever logic or requirements we want, that won't have any unexpected side effects on scheduling domains. Information about how the scheduling domains are being constructed during boot are available in `dmesg', if the kernel is booted with the 'sched_debug' parameter. It is also possible to look at /proc/sys/kernel/sched_domain/cpu*, and at /proc/schedstat. With the patch applied, only one scheduling domain is created, called the 'VCPU' domain, spanning all the guest's (or Dom0's) vCPUs. You can tell that from the fact that every cpu* folder in /proc/sys/kernel/sched_domain/ only have one subdirectory ('domain0'), with all the tweaks and the tunables for our scheduling domain. EVALUATION == I've tested this with UnixBench, and by looking at Xen build time, on a 16, 24 and 48 pCPUs hosts. I've run the benchmarks in Dom0 only, for now, but I plan to re-run them in DomUs soon (Juergen may be doing something similar to this in DomU already, AFAUI). I've run the benchmarks with and without the patch applied ('patched' and 'vanilla', respectively, in the tables below), and with different number of build jobs (in case of the Xen build) or of parallel copy of the benchmarks (in the case of UnixBench). What I get from the numbers is that the patch almost always brings benefits, in some cases even huge ones. There are a couple of cases where we regress, but always only slightly so, especially if comparing that to the magnitude of some of the improvement that we get. Bear also in mind that these results are gathered from Dom0, and without any overcommitment at the vCPU level (i.e., nr. vCPUs == nr pCPUs). If we move things in DomU and do overcommit at the Xen scheduler level, I am expecting even better results. RESULTS === To have a quick idea of how a benchmark went, look at the '% improvement' row of each table. I'll put these results online, in a googledoc spreadsheet or something like that, to make them easier to read, as soon as possible. *** Intel(R) Xeon(R) E5620 @ 2.40GHz *** pCPUs 16DOM0 vCPUS 16 *** RAM12285 MB DOM0 Memory 9955 MB *** NUMA nodes 2 === MAKE XEN (lower == better)
Re: [Xen-devel] [PATCH v2 for-4.6 2/2] docs: Migration feature document
On 27/08/15 03:15, Jim Fehlig wrote: On 08/25/2015 04:40 AM, Andrew Cooper wrote: Signed-off-by: Andrew Cooper andrew.coop...@citrix.com --- v2: * %Revision and #History, following template review * Grammar fixes --- docs/features/migration.pandoc | 100 1 file changed, 100 insertions(+) create mode 100644 docs/features/migration.pandoc diff --git a/docs/features/migration.pandoc b/docs/features/migration.pandoc new file mode 100644 index 000..0fd227f --- /dev/null +++ b/docs/features/migration.pandoc @@ -0,0 +1,100 @@ +% Migration +% Revision 1 + +\clearpage + +# Basics +--- - +Status: **Supported** + + Architecture: x86 + + Component: Toolstack +--- - + +# Overview + +Migration is a mechanism to move a virtual machine while the VM is +running. Live migration moves a running virtual machine between two +physical servers, but the same mechanism can be used for non-live +migrate (pause and copy) and suspend/resume from disk. s/migrate/migration/ ? Either is valid here, although it would be better to leave the entire sentence in the same tense. + +# User details + +No hardware requirements, although hypervisor logdirty support is +required for live migration. + +From the command line, `xl migrate/save/restore` are the top level +interactions. e.g. + +xl create my-vm.cfg +xl migrate my-vm localhost + +or + +xl create my-vm.cfg +xl save my-vm /path/to/save/file +xl restore /path/to/save/file + +Xen 4.6 sees the instruction of Migration v2. There is no change for Xen 4.6 sees the introduction of Migration v2. Or, Xen 4.6 introduces Migration v2. Oops. I did mean s/instruction/introduction/. +people using `xl`, although the `libxl` API has had an extension. + +# Technical details + +Migration is formed of several layers. `libxc` is responsible for the +contents of the VM (ram, vcpus, etc) and the live migration loop, while +`libxl` is responsible for items such as emulator state. + +The format of the migration v2 stream is specified in two documents, and +is architecture neutral. Compatibility with legacy streams is +maintained via the `convert-legacy-stream` script which transforms a +legacy stream into a migration v2 stream. + +* Documents +* `docs/specs/libxc-migration-stream.pandoc` +* `docs/specs/libxl-migration-stream.pandoc` +* `libxc` +* `tools/libxc/xc_sr_*.[hc]` +* `libxl` +* `tools/libxl/libxl_stream_{read,write}.c` +* Scripts +* `tools/python/xen/migration/*.py` +* `tools/python/scripts/convert-legacy-stream` +* `tools/python/scripts/verify-stream-v2` + +Users of the `libxl` API have a new parameter `stream_version` in +`domain_restore_params` which is used to distinguish between legacy and +v2 migration streams, and hence whether legacy conversion is required. A question better asked when you posted the patches, but is there a way to detect if the receiver is V2 capable? I suspect sending a V2 stream to a receiver capable only of V1 is not advised :-). libxl has no concept of asking the far side for information, and therefore no concept of advertising its features. For querying the local libxl at build time, LIBXL_HAVE_SRM_V2 and LIBXL_HAVE_SRM_V1 from 3a9ace01 Also, commit id introducing the change would be helpful info. Looks like 3a9ace01 in this case. See the reply (I am about to write) on the 0/0 thread, but I hope that putting git sha's in this document is going to be impossible. + +# Limitations + +Hypervisor logdirty support is incompatible with hardware passthrough, +as IOMMU faults cannot be used to track writes. + +While not a bug in migration specifically, VMs are very sensitive to +changes in cpuid information, and cpuid levelling support currently has +its issues. Extreme care should be taken when migrating VMs between +non-identical CPUs until the cpuid levelling improvements are complete. + +# Areas for improvement + +* Arm support +* Linear P2M support for x86 PV +* Live looping parameters * cpuid levelling The topic of cpuid levelling is covered in the paragraph above, and is logically separate from migration. XenServer for examples uses it work around various guest kernel bugs. It is just as broken in the plain boot case as the migrate case, and I don't wish to confuse the issues. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 1/8] tmem: Don't crash/hang/leak hypervisor when using shared pools within an guest.
When we are using shared pools we have an global array (on which we put the pool), and an array of pools per domain. We also have an shared list of clients (guests) _except_ for the very first domain that created the shared pool. To deal with multiple guests using an shared pool we have an ref count and a linked list. Whenever an new user of a the shared pool joins we increase the ref count and add to the linked list. Whenever an user quits the shared pool we decrement and remove from the linked list. Unfortunately this ref counting and linked list never worked properly. There are multiple issues: 1) If we have one shared pool (and only one guest creating it) - we do not add it to the shared list of clients. Which means the logic in 'shared_pool_quit' never removed the pool from global_shared_pools. That meant when the pool was de-allocated - we still had an pointer to the pool which would be accessed by tmemc_list_client (xl tmem-list -a) and hit a NULL page! 2). If we have two shared pools in a domain - it (shared_pool_quit) would remove the domain from the share_list linked list, decrements the refcount to zero - and remove the pool from the global shared pool. When done it would also clear the client-pools[] to NULL for itself. Which is good. However since there are two shared pools in the domain the next entry in the client-pools[] would have a stale pointer to the just de-allocated pool. Accessing it and trying to de-allocate it would lead to crashes or hypervisor hang - depending on the build. Fun times! To trigger this use http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/tmem_test/tmem_test.c This patch fixes it by making the very first domain that created an shared pool to follow the same logic as every domain that is joining a shared pool. That is increment the refcount and also add itself to the shared list of domains using it. We also remove an ASSERT that incorrectly assumed that only one shared pool would exist for a domain. And to mirror the reporting logic in shared_pool_join we also add a printk to advertise inter-domain shared pool joining. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- xen/common/tmem.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/xen/common/tmem.c b/xen/common/tmem.c index f2dc26e..572944e 100644 --- a/xen/common/tmem.c +++ b/xen/common/tmem.c @@ -1037,6 +1037,9 @@ static int shared_pool_join(struct tmem_pool *pool, struct client *new_client) tmem_client_info(adding new %s %d to shared pool owned by %s %d\n, tmem_client_str, new_client-cli_id, tmem_client_str, pool-client-cli_id); +else if (pool-shared_count) +tmem_client_info(inter-guest sharing of shared pool %s by client %d\n, + tmem_client_str, pool-client-cli_id); ++pool-shared_count; return 0; } @@ -1056,7 +1059,10 @@ static void shared_pool_reassign(struct tmem_pool *pool) } old_client-pools[pool-pool_id] = NULL; sl = list_entry(pool-share_list.next, struct share_list, share_list); -ASSERT(sl-client != old_client); +/* + * The sl-client can be old_client if there are multiple shared pools + * within an guest. + */ pool-client = new_client = sl-client; for (poolid = 0; poolid MAX_POOLS_PER_DOMAIN; poolid++) if (new_client-pools[poolid] == pool) @@ -1982,6 +1988,8 @@ static int do_tmem_new_pool(domid_t this_cli_id, { INIT_LIST_HEAD(pool-share_list); pool-shared_count = 0; +if (shared_pool_join(pool, client)) +goto fail; global_shared_pools[first_unused_s_poolid] = pool; } } -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] tmem vs Xen 4.6
On Thu, Aug 27, 2015 at 04:02:08PM +0800, Bob Liu wrote: On 08/27/2015 02:55 PM, Jan Beulich wrote: Konrad, I thought I'd remind you of the some progress per release criteria to avoid it becoming subject to removal. The most recent changes date back to spring 2014 afaics... Jan AFAIR Konrad holds a few patches, I believe he will send out them shortly (Friday or Monday). Did it just now and also pushed them to a git tree. Thanks! -- Regards, -Bob ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Draft B] Boot ABI for HVM guests without a device-model
On 27.08.15 at 11:57, roger@citrix.com wrote: El 27/08/15 a les 11.43, Andrew Cooper ha escrit: On 27/08/15 09:04, Jan Beulich wrote: On 26.08.15 at 16:44, roger@citrix.com wrote: El 26/08/15 a les 14.12, Andrew Cooper ha escrit: On 26/08/15 13:00, Jan Beulich wrote: This structure is guaranteed to always be placed in memory after the DYM These structures are ...? loaded kernel and modules. There is no requirement for the command line/module information to be after the loaded kernel. All it needs to do is not overlap. IMHO, this is helpful in order to get last used physical address, after which free memory starts. Current FreeBSD implementation relies on this, if we didn't do it that way I would have to calculate where the symtab + strtab ends, which is more complex. But the statement leaves open whether there is any free memory at all after those structures, or whether instead all free memory lives at lower addresses. Nor do I consider it appropriate to take a present (one might say overly simplistic) implementation as a basis for setting arbitrary restrictions. Can we just state that the hvm_start_info structure and associated metadata is placed after the loaded kernel and modules? I think we should try to avoid introducing any restrictions that aren't technically warranted: The more restrictions we add now, the less flexible we're going to be when we want/need to change some of the implementation later on. I agree. This sounds like a FreeBSD bug, and absolutely shouldn't be a written restriction in the boot ABI. Bug? The FreeBSD native loader passes to the FreeBSD kernel the last used address, after which free memory starts. IMHO, it is not a bug, it's just how FreeBSD boots. I understand that Linux might not pass such a parameter, and there are other ways I can use to find this, but they are more complex. Considering that you claimed that after that free memory starts, when - as said - there might not be any free memory there, it indeed sounds like a bug to me. We already did something very similar with PV guests, see the comment before the start_info structure: I think we'd better avoid leaving basically no room for alterations. It has been problematic on the PV side in at least one case (the ordering of page table pages for compat guests). IMHO it is important to mention how things are loaded into memory, and placing the hvm_start_info struct after the loaded kernel and modules is also the most natural way to do it, I don't foresee this changing in the future. The only thing we should make sure is that all pieces can be easily found, i.e. the kernel doesn't need to do any guessing. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel