Re: [Xen-devel] [PATCH v2 21/23] x86/boot: implement early command line parser in C

2015-08-27 Thread Jan Beulich
 On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote:
 Current early command line parser implementation in assembler
 is very difficult to change to relocatable stuff using segment
 registers. This requires a lot of changes in very weird and
 fragile code. So, reimplement this functionality in C. This
 way code will be relocatable out of the box and much easier
 to maintain.

All appreciated and nice, but the goal of making the code
relocatable by playing with segment registers sounds fragile:
This breaks assumptions the compiler may validly make.

  xen/arch/x86/boot/cmdline.S|  367 -
  xen/arch/x86/boot/cmdline.c|  396 
 

A fundamental expectation I would have had is for the C file to be
noticeably smaller than the assembly file.

 --- /dev/null
 +++ b/xen/arch/x86/boot/cmdline.c
[...]
 +#define VESA_WIDTH   0
 +#define VESA_HEIGHT  1
 +#define VESA_DEPTH   2
 +
 +#define VESA_SIZE3

These should go away in favor of using individual (sub)structure fields.

 +#define __cdecl  __attribute__((__cdecl__))

???

 +#define __packed __attribute__((__packed__))
 +#define __text   __attribute__((__section__(.text)))
 +#define __used   __attribute__((__used__))

Likely better to include compiler.h instead.

 +#define max(x,y) ({ \
 +const typeof(x) _x = (x);   \
 +const typeof(y) _y = (y);   \
 +(void) (_x == _y);\
 +_x  _y ? _x : _y; })

I also wonder whether -imacros .../xen/kernel.h wouldn't be a better
approach here. Please really think hard on how to avoid duplications
like these.

 +#define strlen_static(s) (sizeof(s) - 1)

What is this good for? A decent compiler should be able to deal with
strlen(...). Plus your macro is longer that what it tries to abbreviate.

 +static const char empty_chars[] __text =  \n\r\t;

What is empty about them? DYM blank or (white) space or separator
or delimiter? I also wonder whether \n and \r are actually usefully here,
as they should (if at all) only end the line.

 +/**
 + * strlen - Find the length of a string
 + * @s: The string to be sized
 + */
 +static size_t strlen(const char *s)

Comments are certainly nice, but in this special case I'd rather suggest
against bloating the code by commenting standard library functions.

 +static int strtoi(const char *s, const char *stop, const char **next)
 +{
 +int base = 10, i, ores = 0, res = 0;
 +
 +if ( *s == '0' )
 +  base = (tolower(*++s) == 'x') ? (++s, 16) : 8;
 +
 +for ( ; *s != '\0'; ++s )
 +{
 +for ( i = 0; stop  stop[i] != '\0'; ++i )
 +if ( *s == stop[i] )
 +goto out;
 +
 +if ( *s  '0' || (*s  '7'  base == 8) )
 +{
 +res = -1;
 +goto out;
 +}
 +
 +if ( *s  '9'  (base != 16 || tolower(*s)  'a' || tolower(*s)  
 'f') )
 +{
 +res = -1;
 +goto out;
 +}
 +
 +res *= base;
 +res += (tolower(*s) = 'a') ? (tolower(*s) - 'a' + 10) : (*s - '0');
 +
 +if ( ores  res )
 +{
 +res = -1;
 +goto out;
 +}
 +
 +ores = res;
 +}
 +
 +out:

C labels intended by at least one space please.

 +static const char *find_opt(const char *cmdline, const char *opt, int arg)
 +{
 +size_t lc, lo;
 +static const char mm[] __text = --;

I'd be surprised if there weren't compiler/assembler versions
complaining about a section type conflict here. I can see why you
want everything in one section, but I'd rather suggest achieving
this at the linking step (which I would suppose to already be taking
care of this).

 +static u8 skip_realmode(const char *cmdline)
 +{
 +static const char nrm[] __text = no-real-mode;
 +static const char tboot[] __text = tboot=;
 +
 +if ( find_opt(cmdline, nrm, 0) || find_opt(cmdline, tboot, 1) )
 +return 1;
 +
 +return 0;

return find_opt(cmdline, nrm, 0) || find_opt(cmdline, tboot, 1);

 +static u8 edd_parse(const char *cmdline)
 +{
 +const char *c;
 +size_t la;
 +static const char edd[] __text = edd=;
 +static const char edd_off[] __text = off;
 +static const char edd_skipmbr[] __text = skipmbr;
 +
 +c = find_opt(cmdline, edd, 1);
 +
 +if ( !c )
 +return 0;
 +
 +c += strlen_static(edd);
 +la = strcspn(c, empty_chars);
 +
 +if ( !strncmp(c, edd_off, max(la, strlen_static(edd_off))) )
 +return 2;
 +else if ( !strncmp(c, edd_skipmbr, max(la, strlen_static(edd_skipmbr))) )

Pointless else.

 +return 1;
 +
 +return 0;

And the last two returns can be folded again anyway.

 +static void __cdecl __used cmdline_parse_early(const char *cmdline, 
 early_boot_opts_t *ebo)

I don't see the point of the __cdecl, and (as said before) dislike the
static __used pair.

 --- a/xen/arch/x86/boot/head.S
 +++ b/xen/arch/x86/boot/head.S
 @@ 

Re: [Xen-devel] [PATCH v2 4/8] tmem: Remove xc_tmem_control mystical arg3

2015-08-27 Thread Andrew Cooper
On 27/08/15 12:01, Konrad Rzeszutek Wilk wrote:
 It mentions it but it is never used. The hypercall interface
 knows nothing of this sort of thing either. Lets just remove it.

 Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com

It would be nice if you could take the opportunity of changing every
caller to fix the style issues in affected code.  There are a huge
number of missing spaces in the parameter lists of calls to
xc_tmem_control()

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] Tmem bug-fixes and cleanups.

2015-08-27 Thread Andrew Cooper
On 27/08/15 12:01, Konrad Rzeszutek Wilk wrote:
 Hey!

 At the Xenhackathon we spoke that the tmem code needs a bit of cleanups
 and simplification. One of the things that Andrew mentioned was that the
 TMEM_CONTROL should really be part of the sysctl hypercall. As I ventured
 this path I realized there were some other issues that need to be taken
 care of (like shared pools blowing up).

 This patchset has been tested with 32/64 guests, with a 64 hypervisor
 and 32 bit toolstack (and also with 64bit toolstack) with success.

 For fun I've also created an Linux module:
 http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/tmem_test/tmem_test.c
 that I will expand to cover in the future more interesting hypercall
 uses.

 Going forward the next step will be to move the 'tmem_control' function to
 its own file to simplify the code.

 The patches are also in my git tree:

Patches 1-4, 6-8 are definitely a net improvement, so Reviewed-by:
Andrew Cooper andrew.coop...@citrix.com, with the proviso that I am
not knowlegable about tmem internals.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.

2015-08-27 Thread Andrew Cooper
On 27/08/15 12:02, Konrad Rzeszutek Wilk wrote:
 --- a/tools/python/xen/lowlevel/xc/xc.c
 +++ b/tools/python/xen/lowlevel/xc/xc.c
 @@ -1808,25 +1808,25 @@ static PyObject *pyxc_tmem_control(XcObject *self,
  pool_id, subop, cli_id, arg1, arg2, buf) )
  return NULL;
  
 -if ( (subop == TMEMC_LIST)  (arg1  32768) )
 +if ( (subop == XEN_SYSCTL_TMEM_OP_LIST)  (arg1  32768) )
  arg1 = 32768;
  
  if ( (rc = xc_tmem_control(self-xc_handle, pool_id, subop, cli_id, 
 arg1, arg2, buffer))  0 )
  return Py_BuildValue(i, rc);
  
  switch (subop) {
 -case TMEMC_LIST:
 +case XEN_SYSCTL_TMEM_OP_LIST:
  return Py_BuildValue(s, buffer);
 -case TMEMC_FLUSH:
 +case XEN_SYSCTL_TMEM_OP_FLUSH:
  return Py_BuildValue(i, rc);
 -case TMEMC_QUERY_FREEABLE_MB:
 +case XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB:
  return Py_BuildValue(i, rc);
 -case TMEMC_THAW:
 -case TMEMC_FREEZE:
 -case TMEMC_DESTROY:
 -case TMEMC_SET_WEIGHT:
 -case TMEMC_SET_CAP:
 -case TMEMC_SET_COMPRESS:
 +case XEN_SYSCTL_TMEM_OP_THAW:
 +case XEN_SYSCTL_TMEM_OP_FREEZE:
 +case XEN_SYSCTL_TMEM_OP_DESTROY:
 +case XEN_SYSCTL_TMEM_OP_SET_WEIGHT:
 +case XEN_SYSCTL_TMEM_OP_SET_CAP:
 +case XEN_SYSCTL_TMEM_OP_SET_COMPRESS:

Any case statements falling through into default like this can simply be
dropped.

 @@ -2555,77 +2556,72 @@ static int tmemc_restore_flush_page(int cli_id, 
 uint32_t pool_id, struct oid *oi
  return do_tmem_flush_page(pool,oidp,index);
  }
  
 -static int do_tmem_control(struct tmem_op *op)
 +int tmem_control(struct xen_sysctl_tmem_op *op)
  {
  int ret;
  uint32_t pool_id = op-pool_id;
 -uint32_t subop = op-u.ctrl.subop;
 -struct oid *oidp = (struct oid *)(op-u.ctrl.oid[0]);
 +uint32_t cmd = op-cmd;
 +struct oid *oidp = (struct oid *)(op-oid[0]);

Please put a struct xen_sysctl_tmem_oid definition in sysctl.h and avoid
this typesystem abuse.

 --- a/xen/include/public/sysctl.h
 +++ b/xen/include/public/sysctl.h
 @@ -710,6 +710,48 @@ struct xen_sysctl_psr_cat_op {
  typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t;
  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t);
  
 +#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xU
 +
 +#define XEN_SYSCTL_TMEM_OP_THAW   0
 +#define XEN_SYSCTL_TMEM_OP_FREEZE 1
 +#define XEN_SYSCTL_TMEM_OP_FLUSH  2
 +#define XEN_SYSCTL_TMEM_OP_DESTROY3
 +#define XEN_SYSCTL_TMEM_OP_LIST   4
 +#define XEN_SYSCTL_TMEM_OP_SET_WEIGHT 5
 +#define XEN_SYSCTL_TMEM_OP_SET_CAP6
 +#define XEN_SYSCTL_TMEM_OP_SET_COMPRESS   7
 +#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB  8
 +#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION   11
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_MAXPOOLS  12
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_WEIGHT 13
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_CAP14
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_FLAGS  15
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_FLAGS16
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_NPAGES   17
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID 18
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV  20
 +#define XEN_SYSCTL_TMEM_OP_SAVE_END   21
 +#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN  30
 +#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE   32
 +#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33
 +
 +struct xen_sysctl_tmem_op {
 +uint32_t cmd;   /* IN: XEN_SYSCTL_TMEM_OP_* . */
 +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/
 +uint32_t cli_id;/* IN: client id, 0 for 
 XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB
 +   for all others can be the domain id or
 +   XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */
 +uint32_t arg1;  /* IN: If not applicable to command use 0. */
 +uint32_t arg2;  /* IN: If not applicable to command use 0. */

Please can this interface be fixed as part of the move, even if it is in
subsequent patches for clarity.

Part of the issue with the XSA-17 security audit was that these args are
completely opaque.

 +uint8_t  pad[4];/* Padding so structure is the same under 32 and 64. 
 */

probably better to use a uint32_t here.

 +uint64_t oid[3];/* IN: If not applicable to command use 0. */
 +XEN_GUEST_HANDLE_64(char) buf; /* IN/OUT: Buffer to save and restore 
 ops. */
 +};
 +typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
 +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
 +
  struct xen_sysctl {
  uint32_t cmd;
  #define XEN_SYSCTL_readconsole1
 @@ -734,6 +776,7 @@ struct xen_sysctl {
  #define 

[Xen-devel] [linux-linus test] 60877: regressions - FAIL

2015-08-27 Thread osstest service owner
flight 60877 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60877/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl   14 guest-saverestore fail REGR. vs. 59254
 test-amd64-i386-xl-xsm   14 guest-saverestore fail REGR. vs. 59254
 test-amd64-i386-pair   21 guest-migrate/src_host/dst_host fail REGR. vs. 59254

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-rumpuserxen-amd64 15 
rumpuserxen-demo-xenstorels/xenstorels.repeat fail REGR. vs. 59254
 test-armhf-armhf-xl-rtds 11 guest-start   fail REGR. vs. 59254
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 16 
guest-start/debianhvm.repeat fail baseline untested
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 59254
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail like 59254
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail 
like 59423-bisect
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 13 guest-localmigrate 
fail like 60772-bisect

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-armhf-armhf-libvirt-vhd  9 debian-di-installfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  14 guest-saverestorefail   never pass
 test-armhf-armhf-xl-raw   9 debian-di-installfail   never pass
 test-armhf-armhf-xl-qcow2 9 debian-di-installfail   never pass
 test-armhf-armhf-xl-vhd   9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt-raw  9 debian-di-installfail   never pass
 test-amd64-amd64-xl-pvh-intel 14 guest-saverestorefail  never pass
 test-amd64-i386-libvirt  14 guest-saverestorefail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2  9 debian-di-installfail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never 
pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail  never pass
 test-amd64-i386-libvirt-vhd  11 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  11 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass

version targeted for testing:
 linuxc13dcf9f2d6f5f06ef1bf79ec456df614c5e058b
baseline version:
 linux45820c294fe1b1a9df495d57f40585ef2d069a39

Last test of basis59254  2015-07-09 04:20:48 Z   49 days
Failing since 59348  2015-07-10 04:24:05 Z   48 days   32 attempts
Testing same since60877  2015-08-25 15:46:48 Z1 days1 attempts


Re: [Xen-devel] [PATCH v2] Tmem bug-fixes and cleanups.

2015-08-27 Thread Wei Liu
On Thu, Aug 27, 2015 at 07:01:55AM -0400, Konrad Rzeszutek Wilk wrote:
 Hey!
 
 At the Xenhackathon we spoke that the tmem code needs a bit of cleanups
 and simplification. One of the things that Andrew mentioned was that the
 TMEM_CONTROL should really be part of the sysctl hypercall. As I ventured
 this path I realized there were some other issues that need to be taken
 care of (like shared pools blowing up).
 
 This patchset has been tested with 32/64 guests, with a 64 hypervisor
 and 32 bit toolstack (and also with 64bit toolstack) with success.
 
 For fun I've also created an Linux module:
 http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/tmem_test/tmem_test.c
 that I will expand to cover in the future more interesting hypercall
 uses.
 
 Going forward the next step will be to move the 'tmem_control' function to
 its own file to simplify the code.
 
 The patches are also in my git tree:
 
 git://xenbits.xen.org/people/konradwilk/xen.git for-4.6/tmem.cleanups.v2
 
  tools/libxc/include/xenctrl.h  |   2 +-
  tools/libxc/xc_tmem.c  | 111 ++--
  tools/libxl/libxl.c|  22 ++--
  tools/python/xen/lowlevel/xc/xc.c  |  27 +++--
  tools/xenstat/libxenstat/src/xenstat.c |   6 +-
  xen/common/sysctl.c|   7 +-
  xen/common/tmem.c  | 183 
 +
  xen/include/public/sysctl.h|  44 
  xen/include/public/tmem.h  |  39 +--
  xen/include/xen/tmem.h |   3 +
  xen/include/xen/tmem_xen.h |   1 -
  xen/include/xsm/dummy.h|   6 --
  xen/include/xsm/xsm.h  |   6 --
  xen/xsm/dummy.c|   1 -
  xen/xsm/flask/hooks.c  |   9 +-
  xen/xsm/flask/policy/access_vectors|   2 +-
  16 files changed, 237 insertions(+), 232 deletions(-)
 Konrad Rzeszutek Wilk (8):
   tmem: Don't crash/hang/leak hypervisor when using shared pools within 
 an guest.
   tmem: Add ASSERT in obj_rb_insert for pool-rwlock lock.
   tmem: remove in xc_tmem_control_oid duplicate set_xen_guest_handle call
   tmem: Remove xc_tmem_control mystical arg3
   tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.
   tmem: Remove the old tmem control XSM checks as it is part of sysctl 
 hypercall.
   tmem: Remove extra spaces at end and some hard tabbing.
   tmem: Spelling mistakes.

This series is pretty isolated to TMEM. I'm in principle happy to let
this series go in:

Release-acked-by: Wei Liu wei.l...@citrix.com

The only one that needs more careful thought is #5, I'm going to leave
that to HV maintainers.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xen 4.6 regression: xl create fails if domU have custom vifname

2015-08-27 Thread Wei Liu
On Thu, Aug 27, 2015 at 04:41:46PM +0200, Fabio Fantoni wrote:
 Today trying xen 4.6.0-rc2 with all things needed for future production
 server I found a regression: xl create fails if domU have custom vifname.
 xl create test.cfg
 Parsing config from test.cfg
 libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus:
 /etc/xen/scripts/vif-bridge add [14581] exited with error status 1
 libxl: error: libxl_device.c:1085:device_hotplug_child_death_cb: script: ip
 link set vif14.0-emu name frc-0-0-emu failed

This is the culprit.

Did you do a clean install?

You need to remove udev files in /etc. I should have mentioned this in
testing instructions.

Wei.

 libxl: error: libxl_create.c:1380:domcreate_attach_vtpms: unable to add nic
 devices
 libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus:
 /etc/xen/scripts/block remove [14615] exited with error status 1
 libxl: error: libxl_device.c:1085:device_hotplug_child_death_cb: script:
 /etc/xen/scripts/block failed; error detected.
 libxl: error: libxl.c:1580:libxl__destroy_domid: non-existant domain 14
 libxl: error: libxl.c:1538:domain_destroy_callback: unable to destroy guest
 with domid 14
 libxl: error: libxl.c:1465:domain_destroy_cb: destruction of domain 14
 failed
 
 From domU xl cfg:
 vif=['bridge=xenbr0,mac=00:16:3e:dd:e3:f1,vifname=frc-0-0']
 
 Same domU removing only vifname is working.
 
 On production server with xen 4.4.2 or 4.5.1 domUs with vifname are working.
 
 I did a fast search without found the exact bug/regression cause.
 
 If you need more informations/tests tell me and I'll post them.
 
 Thanks for any reply and sorry for my bad english.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] IOMMU: skip domains without page tables when dumping

2015-08-27 Thread Jan Beulich
Reported-by: Roger Pau Monné roger@citrix.com
Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -411,7 +411,7 @@ static void iommu_dump_p2m_table(unsigne
 ops = iommu_get_ops();
 for_each_domain(d)
 {
-if ( is_hardware_domain(d) )
+if ( is_hardware_domain(d) || need_iommu(d) = 0 )
 continue;

 if ( iommu_use_hap_pt(d) )


IOMMU: skip domains without page tables when dumping

Reported-by: Roger Pau Monné roger@citrix.com
Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -411,7 +411,7 @@ static void iommu_dump_p2m_table(unsigne
 ops = iommu_get_ops();
 for_each_domain(d)
 {
-if ( is_hardware_domain(d) )
+if ( is_hardware_domain(d) || need_iommu(d) = 0 )
 continue;
 
 if ( iommu_use_hap_pt(d) )
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] IOMMU: skip domains without page tables when dumping

2015-08-27 Thread Roger Pau Monné
Thanks, this solves the problem.

El 27/08/15 a les 16.04, Jan Beulich ha escrit:
 Reported-by: Roger Pau Monné roger@citrix.com
 Signed-off-by: Jan Beulich jbeul...@suse.com

Tested-by: Roger Pau Monné roger@citrix.com
Acked-by: Roger Pau Monné roger@citrix.com

 --- a/xen/drivers/passthrough/iommu.c
 +++ b/xen/drivers/passthrough/iommu.c
 @@ -411,7 +411,7 @@ static void iommu_dump_p2m_table(unsigne
  ops = iommu_get_ops();
  for_each_domain(d)
  {
 -if ( is_hardware_domain(d) )
 +if ( is_hardware_domain(d) || need_iommu(d) = 0 )
  continue;
  
  if ( iommu_use_hap_pt(d) )
 
 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [URGENT RFC] Branching and reopening -unstable

2015-08-27 Thread George Dunlap
On Tue, Aug 11, 2015 at 12:13 PM, Ian Jackson ian.jack...@eu.citrix.com wrote:
 B 1(c)(maintainer)/2(a)/3(a)

   Branch.

   Maintainers may choose to defer patch series based on risk of
   conflicts with bugfixes required for 4.6.  Clear communication with
   submitters is required.

   Bugfixes for bugs in 4.6 will be accepted onto the 4.6 branch.
   Maintainers are required to cherry pick them onto unstable.

   Bugfixes will not be accepted for unstable unless it is clear that
   the bug was introduced in unstable since 4.6 branched.

 I am happy with B because it gives the relevant maintainers the
 option.

Did we ever come to a conclusion on this?

FWIW I think Ian's proposal B here is the most sensible: the
maintainer evaluates each patch for how much work it will create them,
and decides whether they want to accept that work or not.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/NUMA: make init_node_heap() respect Xen heap limit

2015-08-27 Thread Wei Liu
On Thu, Aug 27, 2015 at 02:37:17AM -0600, Jan Beulich wrote:
 On NUMA systems, where we try to use node local memory for the basic
 control structures of the buddy allocator, this special case needs to
 take into consideration a possible address width limit placed on the
 Xen heap. In turn this (but also other, more abstract considerations)
 requires that xenheap_max_mfn() not be called more than once (at most
 we might permit it to be called a second time with a larger value than
 was passed the first time), and be called only before calling
 end_boot_allocator().
 
 While inspecting all the involved code, a couple of off-by-one issues
 were found (and are being corrected here at once):
 - arch_init_memory() cleared one too many page table slots
 - the highmem_start based invocation of xenheap_max_mfn() passed too
   big a value
 - xenheap_max_mfn() calculated the wrong bit count in edge cases
 
 Signed-off-by: Jan Beulich jbeul...@suse.com

Release-acked-by: Wei Liu wei.l...@citrix.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 7/8] tmem: Remove extra spaces at end and some hard tabbing.

2015-08-27 Thread Andrew Cooper
On 27/08/15 12:02, Konrad Rzeszutek Wilk wrote:
 @@ -1559,7 +1559,7 @@ refind:
  {
  /* no puts allowed into a frozen pool (except dup puts) */
  if ( client-frozen )
 - goto unlock_obj;
 + goto unlock_obj;

Need to lose 3 spaces here.

  }
  }
  else
 @@ -1572,10 +1572,10 @@ refind:
  
  write_lock(pool-pool_rwlock);
  /*
 -  * Parallel callers may already allocated obj and inserted to 
 obj_rb_root
 -  * before us.
 -  */
 -if (!obj_rb_insert(pool-obj_rb_root[oid_hash(oidp)], obj))
 + * Parallel callers may already allocated obj and inserted to 
 obj_rb_root
 + * before us.
 + */
 +if ( !obj_rb_insert(pool-obj_rb_root[oid_hash(oidp)], obj) )
  {
  tmem_free(obj, pool);
  write_unlock(pool-pool_rwlock);
 @@ -1945,7 +1945,7 @@ static int do_tmem_new_pool(domid_t this_cli_id,
   (client-shared_auth_uuid[i][1] == uuid_hi) )
  break;
  if ( i == MAX_GLOBAL_SHARED_POOLS )
 - {
 + {

And here.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG] 'o' debug key panics the hypervisor

2015-08-27 Thread Roger Pau Monné
El 27/08/15 a les 14.02, Andrew Cooper ha escrit:
 On 27/08/15 12:29, Roger Pau Monné wrote:
 Hello,

 When using Intel hardware without shared page tables between the IOMMU 
 and EPT (I have not tried if the same happens when sharing the page 
 tables), the following crash happens if I press the 'o' debug key with 
 a HVM guest running. The guest doesn't have any device passed-through.

 (XEN) domain1 IOMMU p2m table:
 (XEN) p2m table has 4 levels
 (XEN) Assertion 'pfn_to_pdx(ma  PAGE_SHIFT)  (DIRECTMAP_SIZE  
 PAGE_SHIFT)' failed at /root/src/xen/xen/include/asm/x86_64/page.h:80
 (XEN) [ Xen-4.6.0-rc  x86_64  debug=y  Tainted:C ]
 (XEN) CPU:0
 (XEN) RIP:e008:[82d080165f20] map_domain_page+0x1ef/0x5e6
 (XEN) RFLAGS: 00010216   CONTEXT: hypervisor
 
 The use of paddr_to_pfn() in map_vtd_domain_page() is incorrect.
 
 Does this fix the issue?

Sadly no. Here is the panic trace with your patch applied:

(XEN) domain1 IOMMU p2m table:
(XEN) p2m table has 4 levels
(XEN) Assertion 'pfn_to_pdx(ma  PAGE_SHIFT)  (DIRECTMAP_SIZE  PAGE_SHIFT)' 
failed at /root/src/xen/xen/include/asm/x86_64/page.h:80
(XEN) [ Xen-4.6.0-rc  x86_64  debug=y  Tainted:C ]
(XEN) CPU:0
(XEN) RIP:e008:[82d080165815] map_domain_page+0x1ef/0x5e6
(XEN) RFLAGS: 00010216   CONTEXT: hypervisor
(XEN) rax: 000a8bcf000e   rbx: a8bcf000e000   rcx: 
(XEN) rdx: 0007c7ff   rsi: dfa9f000   rdi: 000a8bcf000e
(XEN) rbp: 8300dfaf7d60   rsp: 8300dfaf7d20   r8:  
(XEN) r9:  0001   r10: 0004   r11: 0001
(XEN) r12: 83019d263000   r13: 0003   r14: 8300dfdf2000
(XEN) r15: 000ff000   cr0: 8005003b   cr4: 26e0
(XEN) cr3: dfa9f000   cr2: 880012b18c90
(XEN) ds:    es:    fs:    gs:    ss: e010   cs: e008
(XEN) Xen stack trace from rsp=8300dfaf7d20:
(XEN)82d08012cfda 82d0802a1bc0 8300dfaf7d40 a8bcf000e000
(XEN)8300 0003 0080 000ff000
(XEN)8300dfaf7d70 82d080152998 8300dfaf7dc0 82d08014c389
(XEN)0001dfaf7db0 82d08012c71a  0001
(XEN)8300 0003  000ff000
(XEN)8300dfaf7e10 82d08014c402 dfaf7e20 00270001
(XEN)8300dfdf2000 83019ff11000 82d080291240 82d0802688f8
(XEN)82d080339488  8300dfaf7e30 82d08014c4a6
(XEN) 83019ff11000 8300dfaf7e60 82d08014786a
(XEN) 82d0802a1d60 006f 
(XEN)8300dfaf7e90 82d080114906 8300dfaf7e90 82d0802a15c0
(XEN) 82d080339560 8300dfaf7ea0 82d080114940
(XEN)8300dfaf7ec0 82d08012f47d 8300dfaf7ec0 82d080339570
(XEN)8300dfaf7ef0 82d08012f7b3 054782099709 8300dfaf
(XEN)054782099709 8300dfdf2000 8300dfaf7f10 82d0801607c3
(XEN)82d08012c7c4 8300dfdf2000 8300dfaf7e10 
(XEN)  88001faa7e80 8168f160
(XEN)8160 0246 0020 88001f60da80
(XEN)  810013aa 
(XEN)deadbeef deadbeef 0100 810013aa
(XEN) Xen call trace:
(XEN)[82d080165815] map_domain_page+0x1ef/0x5e6
(XEN)[82d080152998] map_vtd_domain_page+0x4a/0x4c
(XEN)[82d08014c389] vtd_dump_p2m_table_level+0x2a/0xf6
(XEN)[82d08014c402] vtd_dump_p2m_table_level+0xa3/0xf6
(XEN)[82d08014c4a6] vtd_dump_p2m_table+0x51/0x59
(XEN)[82d08014786a] iommu_dump_p2m_table+0xaa/0xbe
(XEN)[82d080114906] handle_keypress+0x68/0x8d
(XEN)[82d080114940] keypress_action+0x15/0x17
(XEN)[82d08012f47d] do_tasklet_work+0x78/0xab
(XEN)[82d08012f7b3] do_tasklet+0x5e/0x8a
(XEN)[82d0801607c3] idle_loop+0x56/0x6b
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 0:
(XEN) Assertion 'pfn_to_pdx(ma  PAGE_SHIFT)  (DIRECTMAP_SIZE  PAGE_SHIFT)' 
failed at /root/src/xen/xen/include/asm/x86_64/page.h
(XEN) 
(XEN)
(XEN) Reboot in five seconds...

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 4

2015-08-27 Thread Shannon Zhao



On 2015/8/27 15:52, Jan Beulich wrote:

On 27.08.15 at 02:37, julien.gr...@citrix.com wrote:

On 20/08/2015 19:25, Shannon Zhao wrote:

On 2015/8/20 22:06, Jan Beulich wrote:

So can the two of you please sort out whether these are Linux
internal tags (which Xen has no business generating, or even
knowing of) or some form of publicly documented interface?


They are Linux internal tags. But to support Xen ACPI on ARM, we just
reuse the existing mechanism to pass the information to Dom0.


My previous mail was unclear, sorry. I was explaining the current usage
of those properties because you seemed to be concern about the backward
compatibility.

We'd like to formalize those properties in order to use them to pass
information between Xen and the kernel image.


Ah, that clarifies things. Thanks.


This would avoid us to create a Xen specific entry path in the kernel
and use the non-EFI one (also used by the EFI stub) and make easier to
support new DOM0 OS (for instance FreeBSD).

So I totally agree with Shannon mails.


One other aspect completely left off so far is that of proper isolation:
What x86 exposes to Dom0 is specifically limited to what Dom0 is
supposed to know. I'm getting the impression that by exposing more
EFI tables this is being violated just for the purpose of avoiding any
changes to Linux. But maybe I'm misremembering, and all the extra
tables exposed are actually fake ones rather than cloned host ones.



Currently we create EFI system table and EFI memory descriptor table as 
Linux requires. I think the EFI memory descriptor table is necessary. 
What we didn't reach an agreement is only the EFI system table. Yes, we 
only use the Configure table of the EFI system table to get the ACPI 
root address. As you mentioned before, it could pass only the Configure 
table to Dom0, but we should change the process of parsing the DT and 
consider the backwards compatibility.


On the other hand, as the RUNTIME service is not supported, it could 
assign the runtime service members of EFI system table invalid values 
and let Dom0 not initialize RUNTIME service(This could be done by making 
the memory attribute not be EFI_MEMORY_RUNTIME when we create the EFI 
memory descriptor table). If the RUNTIME service is supported in the 
future, it doesn't need to change the Linux again. So it could avoid 
changing back.


Jan, how do you think about this?

Thanks,
--
Shannon

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG] 'o' debug key panics the hypervisor

2015-08-27 Thread Jan Beulich
 On 27.08.15 at 13:29, roger@citrix.com wrote:
 When using Intel hardware without shared page tables between the IOMMU 
 and EPT (I have not tried if the same happens when sharing the page 
 tables), the following crash happens if I press the 'o' debug key with 
 a HVM guest running. The guest doesn't have any device passed-through.

This last thing is the relevant one: No-one ever cared to use 'o' without
also passing through a device.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 4

2015-08-27 Thread Shannon Zhao



On 2015/8/27 22:13, Jan Beulich wrote:

On 27.08.15 at 15:50, shannon.z...@linaro.org wrote:

On 2015/8/27 15:52, Jan Beulich wrote:

One other aspect completely left off so far is that of proper isolation:
What x86 exposes to Dom0 is specifically limited to what Dom0 is
supposed to know. I'm getting the impression that by exposing more
EFI tables this is being violated just for the purpose of avoiding any
changes to Linux. But maybe I'm misremembering, and all the extra
tables exposed are actually fake ones rather than cloned host ones.


Currently we create EFI system table and EFI memory descriptor table as
Linux requires. I think the EFI memory descriptor table is necessary.
What we didn't reach an agreement is only the EFI system table. Yes, we
only use the Configure table of the EFI system table to get the ACPI
root address. As you mentioned before, it could pass only the Configure
table to Dom0, but we should change the process of parsing the DT and
consider the backwards compatibility.


A made up system table would (as said before) be fine with me too.
Just not a clone of the host one.



Yeah, it's a made up one.


On the other hand, as the RUNTIME service is not supported, it could
assign the runtime service members of EFI system table invalid values
and let Dom0 not initialize RUNTIME service(This could be done by making
the memory attribute not be EFI_MEMORY_RUNTIME when we create the EFI
memory descriptor table). If the RUNTIME service is supported in the
future, it doesn't need to change the Linux again. So it could avoid
changing back.


I'd strongly advise against such hackery - it will get you (and Xen)
into the bad firmware corner. EFI without runtime services doesn't
exist. And runtime services code/data not marked as such are a
firmware bug (sadly existing in reality on the x86 side). But remember
that under Xen the Dom0 kernel mustn't care about runtime services
(other than wanting to be able to invoke them through hypercalls).



Oh, I see. Thanks for your explanation.

--
Shannon

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] xen 4.6 regression: xl create fails if domU have custom vifname

2015-08-27 Thread Fabio Fantoni
Today trying xen 4.6.0-rc2 with all things needed for future production 
server I found a regression: xl create fails if domU have custom vifname.

xl create test.cfg
Parsing config from test.cfg
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: 
/etc/xen/scripts/vif-bridge add [14581] exited with error status 1
libxl: error: libxl_device.c:1085:device_hotplug_child_death_cb: script: 
ip link set vif14.0-emu name frc-0-0-emu failed
libxl: error: libxl_create.c:1380:domcreate_attach_vtpms: unable to add 
nic devices
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: 
/etc/xen/scripts/block remove [14615] exited with error status 1
libxl: error: libxl_device.c:1085:device_hotplug_child_death_cb: script: 
/etc/xen/scripts/block failed; error detected.

libxl: error: libxl.c:1580:libxl__destroy_domid: non-existant domain 14
libxl: error: libxl.c:1538:domain_destroy_callback: unable to destroy 
guest with domid 14
libxl: error: libxl.c:1465:domain_destroy_cb: destruction of domain 14 
failed


From domU xl cfg:
vif=['bridge=xenbr0,mac=00:16:3e:dd:e3:f1,vifname=frc-0-0']

Same domU removing only vifname is working.

On production server with xen 4.4.2 or 4.5.1 domUs with vifname are working.

I did a fast search without found the exact bug/regression cause.

If you need more informations/tests tell me and I'll post them.

Thanks for any reply and sorry for my bad english.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] libxenstore: prefer using the character device

2015-08-27 Thread Jonathan Creekmore
With the addition of FMODE_ATOMIC_POS in the Linux 3.14 kernel,
concurrent blocking file accesses to a single open file descriptor can
cause a deadlock trying to grab the file position lock. If a watch has
been set up, causing a read_thread to blocking read on the file
descriptor, then future writes that would cause the background read to
complete will block waiting on the file position lock before they can
execute. This race condition only occurs when libxenstore is accessing
the xenstore daemon through the /proc/xen/xenbus file and not through
the unix domain socket, which is the case when the xenstore daemon is
running as a stub domain or when oxenstored is passed
--disable-socket. Accessing the daemon from the true character device
also does not exhibit this problem.

On Linux, prefer using the character device file over the proc file if
the character device exists.

Signed-off-by: Jonathan Creekmore jonathan.creekm...@gmail.com
---
 tools/xenstore/xs_lib.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/xenstore/xs_lib.c b/tools/xenstore/xs_lib.c
index af4f75a..0c7744e 100644
--- a/tools/xenstore/xs_lib.c
+++ b/tools/xenstore/xs_lib.c
@@ -81,6 +81,8 @@ const char *xs_domain_dev(void)
 #if defined(__RUMPUSER_XEN__) || defined(__RUMPRUN__)
return /dev/xen/xenbus;
 #elif defined(__linux__)
+   if (access(/dev/xen/xenbus, F_OK) == 0)
+   return /dev/xen/xenbus;
return /proc/xen/xenbus;
 #elif defined(__NetBSD__)
return /kern/xen/xenbus;
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 4

2015-08-27 Thread Jan Beulich
 On 27.08.15 at 15:50, shannon.z...@linaro.org wrote:
 On 2015/8/27 15:52, Jan Beulich wrote:
 One other aspect completely left off so far is that of proper isolation:
 What x86 exposes to Dom0 is specifically limited to what Dom0 is
 supposed to know. I'm getting the impression that by exposing more
 EFI tables this is being violated just for the purpose of avoiding any
 changes to Linux. But maybe I'm misremembering, and all the extra
 tables exposed are actually fake ones rather than cloned host ones.
 
 Currently we create EFI system table and EFI memory descriptor table as 
 Linux requires. I think the EFI memory descriptor table is necessary. 
 What we didn't reach an agreement is only the EFI system table. Yes, we 
 only use the Configure table of the EFI system table to get the ACPI 
 root address. As you mentioned before, it could pass only the Configure 
 table to Dom0, but we should change the process of parsing the DT and 
 consider the backwards compatibility.

A made up system table would (as said before) be fine with me too.
Just not a clone of the host one.

 On the other hand, as the RUNTIME service is not supported, it could 
 assign the runtime service members of EFI system table invalid values 
 and let Dom0 not initialize RUNTIME service(This could be done by making 
 the memory attribute not be EFI_MEMORY_RUNTIME when we create the EFI 
 memory descriptor table). If the RUNTIME service is supported in the 
 future, it doesn't need to change the Linux again. So it could avoid 
 changing back.

I'd strongly advise against such hackery - it will get you (and Xen)
into the bad firmware corner. EFI without runtime services doesn't
exist. And runtime services code/data not marked as such are a
firmware bug (sadly existing in reality on the x86 side). But remember
that under Xen the Dom0 kernel mustn't care about runtime services
(other than wanting to be able to invoke them through hypercalls).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 22/23] x86: make Xen early boot code relocatable

2015-08-27 Thread Jan Beulich
 On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote:
   - %fs register is filled with segment descriptor which describes memory 
 region
 with Xen image (it could be relocated or not);

This is too fuzzy. Please be very precise which region it is that %fs
is supposed to point to (so that reviewers have a chance to validate
uses).

 --- a/xen/arch/x86/Makefile
 +++ b/xen/arch/x86/Makefile
 @@ -72,8 +72,10 @@ efi-$(x86_64) := $(shell if [ ! -r 
 $(BASEDIR)/include/xen/compile.h -o \
   echo '$(TARGET).efi'; fi)
  
  $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
 - ./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x10 \
 - `$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'`
 +#THIS IS UGLY HACK! PLEASE DO NOT COMPLAIN. I WILL FIX IT IN NEXT 
 RELEASE.
 + ./boot/mkelf32 $(TARGET)-syms $(TARGET) $(XEN_IMG_PHYS_START) 
 0x82d08100
 +#./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x10 \
 +#`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'`

I do complain nevertheless: Such a patch should be tagged RFC. And
with such a hack in place I'm not even sure it's worth reviewing.

 --- a/xen/arch/x86/boot/head.S
 +++ b/xen/arch/x86/boot/head.S
 @@ -12,13 +12,15 @@
  .text
  .code32
  
 -#define sym_phys(sym) ((sym) - __XEN_VIRT_START)
 +#define sym_phys(sym) ((sym) - __XEN_VIRT_START + XEN_IMG_PHYS_START - 
 XEN_IMG_OFFSET)
 +#define sym_offset(sym)   ((sym) - __XEN_VIRT_START)

With XEN_IMG_PHYS_START == XEN_IMG_OFFSET - what's the point?
(If there is a point, then there's obviously a comment missing here
explaining things, including when to use which.)

 @@ -105,12 +107,13 @@ multiboot1_header_end:
  
  .word   0
  gdt_boot_descr:
 -.word   6*8-1
 -.long   sym_phys(trampoline_gdt)
 +.word   7*8-1
 +gdt_boot_descr_addr:

gdt_boot_base would seem a better name; with C in mind what
you use currently could be easily mistaken for a variable holding
gdt_boot_descr.

 @@ -263,12 +271,8 @@ __start:
  cld
  cli
  
 -/* Initialise GDT and basic data segments. */
 -lgdt%cs:sym_phys(gdt_boot_descr)
 -mov $BOOT_DS,%ecx
 -mov %ecx,%ds
 -mov %ecx,%es
 -mov %ecx,%ss
 +/* Load default Xen image base address. */
 +mov $sym_phys(__image_base__),%ebp

Isn't this always zero? I.e. wouldn't you better use xor %ebp,%ebp?

  /* Save the Multiboot info struct (after relocation) for later use. 
 */
 -mov $sym_phys(cpu0_stack)+1024,%esp
 +lea (sym_offset(cpu0_stack)+1024)(%ebp),%esp

Please avoid obfuscating the code by adding pointless parentheses.

 @@ -390,72 +432,111 @@ trampoline_setup:
  
  /* Stash TSC to calculate a good approximation of time-since-boot */
  rdtsc
 -mov %eax,sym_phys(boot_tsc_stamp)
 -mov %edx,sym_phys(boot_tsc_stamp+4)
 +mov %eax,%fs:sym_offset(boot_tsc_stamp)
 +mov %edx,%fs:sym_offset(boot_tsc_stamp+4)
  
 -/* Initialise L2 boot-map page table entries (16MB). */
 -mov $sym_phys(l2_bootmap),%edx
 -mov $PAGE_HYPERVISOR|_PAGE_PSE,%eax
 -mov $8,%ecx
 +/* Update frame addreses in page tables. */
 +lea sym_offset(__page_tables_start)(%ebp),%edx
 +mov $((__page_tables_end-__page_tables_start)/8),%ecx
 +1:  testl   $_PAGE_PRESENT,(%edx)
 +jz  2f
 +add %ebp,(%edx)
 +2:  add $8,%edx
 +loop1b
 +
 +/* Initialise L2 boot-map page table entries (14MB). */
 +lea sym_offset(l2_bootmap)(%ebp),%edx
 +lea sym_offset(start)(%ebp),%eax
 +and $~((1L2_PAGETABLE_SHIFT)-1),%eax
 +mov %eax,%ebx
 +shr $(L2_PAGETABLE_SHIFT-3),%ebx
 +and $(L2_PAGETABLE_ENTRIES*4*8-1),%ebx
 +add %ebx,%edx
 +add $(PAGE_HYPERVISOR|_PAGE_PSE),%eax
 +mov $7,%ecx
  1:  mov %eax,(%edx)
  add $8,%edx
  add $(1L2_PAGETABLE_SHIFT),%eax
  loop1b
 +
  /* Initialise L3 boot-map page directory entry. */
 -mov $sym_phys(l2_bootmap)+__PAGE_HYPERVISOR,%eax
 -mov %eax,sym_phys(l3_bootmap) + 0*8
 +lea (sym_offset(l2_bootmap)+__PAGE_HYPERVISOR)(%ebp),%eax
 +lea sym_offset(l3_bootmap)(%ebp),%ebx
 +mov $4,%ecx
 +1:  mov %eax,(%ebx)
 +add $8,%ebx
 +add $(L2_PAGETABLE_ENTRIES*8),%eax
 +loop1b
 +
 +/* Initialise L2 direct map page table entries (14MB). */
 +lea sym_offset(l2_identmap)(%ebp),%edx
 +lea sym_offset(start)(%ebp),%eax
 +and $~((1L2_PAGETABLE_SHIFT)-1),%eax
 +mov %eax,%ebx
 +shr $(L2_PAGETABLE_SHIFT-3),%ebx
 +and $(L2_PAGETABLE_ENTRIES*4*8-1),%ebx
 +add 

Re: [Xen-devel] [BUG] 'o' debug key panics the hypervisor

2015-08-27 Thread Roger Pau Monné
El 27/08/15 a les 16.01, Jan Beulich ha escrit:
 On 27.08.15 at 13:29, roger@citrix.com wrote:
 When using Intel hardware without shared page tables between the IOMMU 
 and EPT (I have not tried if the same happens when sharing the page 
 tables), the following crash happens if I press the 'o' debug key with 
 a HVM guest running. The guest doesn't have any device passed-through.
 
 This last thing is the relevant one: No-one ever cared to use 'o' without
 also passing through a device.

I have to admit I discovered this when I thought I was typing at the
Dom0 console but it turns out I had it switched to the Xen console...
Anyway, it shouldn't crash Xen.

Roger.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 22/23] x86: make Xen early boot code relocatable

2015-08-27 Thread Ben Hildred
On Thu, Aug 27, 2015 at 9:29 AM, Jan Beulich jbeul...@suse.com wrote:

  On 27.08.15 at 17:10, daniel.ki...@oracle.com wrote:
  On Thu, Aug 27, 2015 at 07:12:38AM -0600, Jan Beulich wrote:
   On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote:
/* Copy bootstrap trampoline to low memory, below 1MB. */
   -mov $sym_phys(trampoline_start),%esi
   +lea sym_offset(trampoline_start)(%ebp),%esi
mov $trampoline_end - trampoline_start,%ecx
rep movsb
  
 
  The latest at this final hunk I'm getting tired (and upset). I'd much
  rather not touch all this code in these fragile ways, and instead
  require Xen to be booted without grub2 on badly written firmware
  like the one on the machine you quote in the description.
 
  Let's start discussion from here. My further work on this patch series
  is pointless until we do not agree details in this case.
 
  So, I agree that any software (including firmware) should not use
  precious resources (i.e. low memory in that case) if it is not strictly
  required. And I do not think so that latest UEFI implementations
  on new hardware really need this low memory to survive (at least page
  tables could be put anywhere above low memory). However, in case of
  UEFI it is a wish of smart people not requirement set by spec. I was
  checking UEFI docs a few times but I was not able to find anything
  which says that e.g. ...developer MUST allocate memory outside of low
  memory  Even I have not found any suggestion about that. However,
  I must admit that I do not know whole UEFI spec, so, if you know
 something
  which is similar to above please tell me where it is (title, revision,
  page, paragraph, etc.). Hence, if there is not strict requirement in
  regards to memory allocation in specs we are lost. Of course we can
  encourage people to not do nasty things but I do not believe that many
  will listen. So, sadly, I suppose that we will see more and more machines
  in the wild which are broken (well, let's say do not align to good
  practices).
 
  On the other hand I think that we should not assume that a given memory
  region (in our case starting at 1 MiB) is (or will be) available in every
  machine. I have not seen anything which says that it is true. We should
  stop doing that even if it works right now. I think that it works by
  chance and there is a chance that it will stop working one day because
  somebody will discover that he or she can put there e.g. device or hole.
 
  Last but not least. I suppose that Xen users and especially distros will
  complain when they are not able to use GRUB2 with Xen on every platform
  just because somebody (i.e. platform designers, developers, or whatever)
  do not accept our beliefs or we require that platform must obey rules
  (i.e. memory map requirements) which are specified nowhere.

 You're right, there's no such requirement on memory use in the spec.
 But you're missing the point. Supporting grub2 on UEFI is already a
 hack (ignoring all intentions EFI had from its first days). And now
 you've found an environment where that hack needs another hack
 (in Xen) to actually work. That's too much hackery for my taste, the
 more that things on this system can (afaict) work quite okay (without
 grub2, or with using its chainloader mechanism).

 If you advocate direct booting ( no boot loader)  on production machines I
wont argue much, as long as there is good recovery tools to deal with
failed boots (grub does this very well, I am not aware of anything
comparable that is pure efi). however the other use case which care more
about is dual booting. I am a novice when it comes to Xen, although
otherwise competent. The test machines I have for playing with Xen are also
used for other tests, some of which require raw hardware support, so I want
the ability to use a boot menu. I am aware of refit, but it does not have
serial support which makes automating testing more difficult. and I have
not yet got ipxe to successfully boot Xen (although this is purely because
I have not yet devoted enough time to that project and I am not yet
familiar with how Xen boots).

So no, I'm still not convinced of the need for this patch.


I am at a loss for alternatives. Yes grub on efi violates the spirit of
efi. Propose a better way forward rather than deriding those who have found
a successful, portable way around the limitations of efi implementations.


 Jan


 ___
 Grub-devel mailing list
 grub-de...@gnu.org
 https://lists.gnu.org/mailman/listinfo/grub-devel




-- 
--
Ben Hildred
Automation Support Services
303 815 6721
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC

2015-08-27 Thread Jan Beulich
 On 27.08.15 at 17:22, andrew.coop...@citrix.com wrote:
 On 27/08/15 15:47, Jonathan Creekmore wrote:
 @@ -812,7 +816,11 @@ ENTRY(hypercall_args_table)
  .byte 2 /* do_hvm_op*/
  .byte 1 /* do_sysctl*/  /* 35 */
  .byte 1 /* do_domctl*/
 +#ifdef CONFIG_KEXEC
  .byte 2 /* do_kexec */
 +#else
 +.byte 0 /* do_ni_hypercall  */
 +#endif
 
 These changes will corrupt guest registers in debug builds.

It's quite the other way around: Other than now (where two
registers get clobbered), no registers would be clobbered
anymore. Of course that's not to say the argument count
tables shouldn't be left alone.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC] xen: if on Xen, flatten the scheduling domain hierarchy

2015-08-27 Thread George Dunlap
On Thu, Aug 27, 2015 at 11:24 AM, George Dunlap
george.dun...@citrix.com wrote:
 On 08/18/2015 04:55 PM, Dario Faggioli wrote:
 Hey everyone,

 So, as a followup of what we were discussing in this thread:

  [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest
  http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg03241.html

 I started looking in more details at scheduling domains in the Linux
 kernel. Now, that thread was about CPUID and vNUMA, and their weird way
 of interacting, while this thing I'm proposing here is completely
 independent from them both.

 In fact, no matter whether vNUMA is supported and enabled, and no matter
 whether CPUID is reporting accurate, random, meaningful or completely
 misleading information, I think that we should do something about how
 scheduling domains are build.

 Fact is, unless we use 1:1, and immutable (across all the guest
 lifetime) pinning, scheduling domains should not be constructed, in
 Linux, by looking at *any* topology information, because that just does
 not make any sense, when vcpus move around.

 Let me state this again (hoping to make myself as clear as possible): no
 matter in  how much good shape we put CPUID support, no matter how
 beautifully and consistently that will interact with both vNUMA,
 licensing requirements and whatever else. It will be always possible for
 vCPU #0 and vCPU #3 to be scheduled on two SMT threads at time t1, and
 on two different NUMA nodes at time t2. Hence, the Linux scheduler
 should really not skew his load balancing logic toward any of those two
 situations, as neither of them could be considered correct (since
 nothing is!).

 For now, this only covers the PV case. HVM case shouldn't be any
 different, but I haven't looked at how to make the same thing happen in
 there as well.

 OVERALL DESCRIPTION
 ===
 What this RFC patch does is, in the Xen PV case, configure scheduling
 domains in such a way that there is only one of them, spanning all the
 pCPUs of the guest.

 Note that the patch deals directly with scheduling domains, and there is
 no need to alter the masks that will then be used for building and
 reporting the topology (via CPUID, /proc/cpuinfo, /sysfs, etc.). That is
 the main difference between it and the patch proposed by Juergen here:
 http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg05088.html

 This means that when, in future, we will fix CPUID handling and make it
 comply with whatever logic or requirements we want, that won't have  any
 unexpected side effects on scheduling domains.

 Information about how the scheduling domains are being constructed
 during boot are available in `dmesg', if the kernel is booted with the
 'sched_debug' parameter. It is also possible to look
 at /proc/sys/kernel/sched_domain/cpu*, and at /proc/schedstat.

 With the patch applied, only one scheduling domain is created, called
 the 'VCPU' domain, spanning all the guest's (or Dom0's) vCPUs. You can
 tell that from the fact that every cpu* folder
 in /proc/sys/kernel/sched_domain/ only have one subdirectory
 ('domain0'), with all the tweaks and the tunables for our scheduling
 domain.

 EVALUATION
 ==
 I've tested this with UnixBench, and by looking at Xen build time, on a
 16, 24 and 48 pCPUs hosts. I've run the benchmarks in Dom0 only, for
 now, but I plan to re-run them in DomUs soon (Juergen may be doing
 something similar to this in DomU already, AFAUI).

 I've run the benchmarks with and without the patch applied ('patched'
 and 'vanilla', respectively, in the tables below), and with different
 number of build jobs (in case of the Xen build) or of parallel copy of
 the benchmarks (in the case of UnixBench).

 What I get from the numbers is that the patch almost always brings
 benefits, in some cases even huge ones. There are a couple of cases
 where we regress, but always only slightly so, especially if comparing
 that to the magnitude of some of the improvement that we get.

 Bear also in mind that these results are gathered from Dom0, and without
 any overcommitment at the vCPU level (i.e., nr. vCPUs == nr pCPUs). If
 we move things in DomU and do overcommit at the Xen scheduler level, I
 am expecting even better results.

 RESULTS
 ===
 To have a quick idea of how a benchmark went, look at the '%
 improvement' row of each table.

 I'll put these results online, in a googledoc spreadsheet or something
 like that, to make them easier to read, as soon as possible.

 *** Intel(R) Xeon(R) E5620 @ 2.40GHz
 *** pCPUs  16DOM0 vCPUS  16
 *** RAM12285 MB  DOM0 Memory 9955 MB
 *** NUMA nodes 2
 ===
 MAKE XEN (lower == better)
 ===
 # of build jobs -j1

Re: [Xen-devel] [RFC v2 for-4.6 0/2] In-tree feature documentation

2015-08-27 Thread Ian Jackson
Andrew Cooper writes (Re: [RFC v2 for-4.6 0/2] In-tree feature documentation):
 On 27/08/15 15:52, Ian Jackson wrote:
  I do wonder whether cross-referencing all the issues is a good idea.
  It seems like it might be a lot of work to keep them in step.
 
 I don't expect all the issues to be enumerated (generally, they would be
 found the first time someone falls over the issue), but where known
 interaction issues exist, we need to have some place to leave them.

I was prompted to ask this because it seemed to me that some of the
issues were discussed in other parts of the text or in other patches
as well as in `issues'.

 There are plenty of examples where known issues are documented somewhere
 in the xen-devel archives, or in an individuals head, and neither of
 these are useful places for the information to exist.

I agree with this.  I think things should be in the tree, but once.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] libxenstore: prefer using the character device

2015-08-27 Thread Wei Liu
On Thu, Aug 27, 2015 at 09:04:38AM -0500, Jonathan Creekmore wrote:
 With the addition of FMODE_ATOMIC_POS in the Linux 3.14 kernel,
 concurrent blocking file accesses to a single open file descriptor can
 cause a deadlock trying to grab the file position lock. If a watch has
 been set up, causing a read_thread to blocking read on the file
 descriptor, then future writes that would cause the background read to
 complete will block waiting on the file position lock before they can
 execute. This race condition only occurs when libxenstore is accessing
 the xenstore daemon through the /proc/xen/xenbus file and not through
 the unix domain socket, which is the case when the xenstore daemon is
 running as a stub domain or when oxenstored is passed
 --disable-socket. Accessing the daemon from the true character device
 also does not exhibit this problem.
 
 On Linux, prefer using the character device file over the proc file if
 the character device exists.
 
 Signed-off-by: Jonathan Creekmore jonathan.creekm...@gmail.com

Acked-by: Wei Liu wei.l...@citrix.com

I think this is a candidate for 4.6.

 ---
  tools/xenstore/xs_lib.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/tools/xenstore/xs_lib.c b/tools/xenstore/xs_lib.c
 index af4f75a..0c7744e 100644
 --- a/tools/xenstore/xs_lib.c
 +++ b/tools/xenstore/xs_lib.c
 @@ -81,6 +81,8 @@ const char *xs_domain_dev(void)
  #if defined(__RUMPUSER_XEN__) || defined(__RUMPRUN__)
   return /dev/xen/xenbus;
  #elif defined(__linux__)
 + if (access(/dev/xen/xenbus, F_OK) == 0)
 + return /dev/xen/xenbus;
   return /proc/xen/xenbus;
  #elif defined(__NetBSD__)
   return /kern/xen/xenbus;
 -- 
 2.1.4
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] PAGE_SIZE (64KB), while block driver 'struct request' deals with PAGE_SIZE (up to 44Kb). Was:Re: [RFC] Support of non-indirect grant backend on 64KB guest

2015-08-27 Thread Julien Grall
Hi,

On 21/08/15 18:10, Konrad Rzeszutek Wilk wrote:
 On Fri, Aug 21, 2015 at 05:08:35PM +0100, David Vrabel wrote:
 On 21/08/15 17:05, Konrad Rzeszutek Wilk wrote:

 I have to concur with that. We can't mandate that ARM 64k page MUST use
 indirect descriptors.

 Then it has to be fixed in the block layer to allow  PAGE_SIZE segments
 and to get the block layer to split requests for blkfront.
 
 Hey Jens,
 
 I am hoping you can help us figure this problem out.
 
 The Linux ARM is capable of using 4KB pages and 64KB pages. Our block
 driver (xen-blkfront) was built with 4KB pages in mind and without using
 any fancy flags (which some backends lack) the maximum amount of I/O it can
 fit on a ring is 44KB.
 
 This has the unfortunate effect that when the xen-blkfront
 gets an 'struct request' it can have on page (64KB) and it can't actually
 fit it on the ring! And the lowest segment size it advertises is PAGE_SIZE
 (64KB). I believe Julien (who found this) tried initially advertising
 smaller segment size than PAGE_SIZE (32KB). However looking at
 __blk_segment_map_sg it looks to assume smallest size is PAGE_SIZE so
 that would explain why it did not work.

To be honest, I haven't tried to see how the block layer will act if I
dropped those checks in blk-settings.c until today.

I don't see any assumption about PAGE_SIZE in __blk_segment_map_sg.
Although dropping the checks in blk-settings (see quick patch [1]),
I got the following error in the frontend:

bio too big device xvda (128  88)
Buffer I/O error on dev xvda, logical block 0, async page read
bio too big device xvda (128  88)
Buffer I/O error on dev xvda, logical block 0, async page read

The bio too big device ... comes from generic_make_request_checks
(linux/block/blk-core.c) and the stack trace is:

[fe096c7c] dump_backtrace+0x0/0x124
[fe096db0] show_stack+0x10/0x1c
[fe5885e8] dump_stack+0x78/0xbc
[fe0bf7f8] warn_slowpath_common+0x98/0xd0
[fe0bf8f0] warn_slowpath_null+0x14/0x20
[fe2df304] generic_make_request_checks+0x114/0x230
[fe2e0580] generic_make_request+0x10/0x108
[fe2e070c] submit_bio+0x94/0x1e0
[fe1d573c] submit_bh_wbc.isra.36+0x100/0x1a8
[fe1d5bf8] block_read_full_page+0x320/0x3e8
[fe1d877c] blkdev_readpage+0x14/0x20
[fe14582c] do_read_cache_page+0x16c/0x1a0
[fe145870] read_cache_page+0x10/0x1c
[fe2f2908] read_dev_sector+0x30/0x9c
[fe2f3d84] msdos_partition+0x84/0x554
[fe2f38e4] check_partition+0xf8/0x21c
[fe2f2f28] rescan_partitions+0xb0/0x2a4
[fe1d98b0] __blkdev_get+0x228/0x34c
[fe1d9a14] blkdev_get+0x40/0x364
[fe2f0b6c] add_disk+0x398/0x424
[fe3d8500] blkback_changed+0x1200/0x152c
[fe36a954] xenbus_otherend_changed+0x9c/0xa4
[fe36c984] backend_changed+0xc/0x18
[fe36a088] xenwatch_thread+0xa0/0x13c
[fe0d98d0] kthread+0xd8/0xf0

The fs buffer code seems to assume that the block driver will always support
at least a bio of PAGE_SIZE.

 One wya to make this work is for the driver (xen-blkfront) to split
 the 'struct request' I/O in two internal requests.
 
 But this has to be a normal problem. Surely there are other drivers
 (MMC?) that can't handle PAGE_SIZE and have to break their I/Os.
 Would it make sense for the common block code to be able to deal
 with this?

It will take me a bit of time to fully understand the block layer
as the changes doesn't seem trivial from POV (I don't have any
knowledge in it). So I will wait a feedback from Jens before
going further on this approach.

Regards,

[1] patch

diff --git a/block/blk-settings.c b/block/blk-settings.c
index e0057d0..ac024e7 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -251,12 +251,15 @@ EXPORT_SYMBOL(blk_queue_bounce_limit);
  **/
 void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int 
max_hw_sectors)
 {
+#if 0
if ((max_hw_sectors  9)  PAGE_CACHE_SIZE) {
max_hw_sectors = 1  (PAGE_CACHE_SHIFT - 9);
printk(KERN_INFO %s: set to minimum %d\n,
   __func__, max_hw_sectors);
}
 
+#endif
+
limits-max_sectors = limits-max_hw_sectors = max_hw_sectors;
 }
 EXPORT_SYMBOL(blk_limits_max_hw_sectors);
@@ -351,11 +354,14 @@ EXPORT_SYMBOL(blk_queue_max_segments);
  **/
 void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
 {
+#if 0
if (max_size  PAGE_CACHE_SIZE) {
max_size = PAGE_CACHE_SIZE;
printk(KERN_INFO %s: set to minimum %d\n,
   __func__, max_size);
}
+#endif
+
 
q-limits.max_segment_size = max_size;
 }
@@ -777,11 +783,14 @@ EXPORT_SYMBOL_GPL(blk_queue_dma_drain);
  **/
 void blk_queue_segment_boundary(struct request_queue *q, unsigned long mask)
 {
+#if 0
if (mask  PAGE_CACHE_SIZE - 1) {
mask = PAGE_CACHE_SIZE - 1;
printk(KERN_INFO %s: set 

Re: [Xen-devel] [PATCH for 4.6] build: fix tarball stubdom build

2015-08-27 Thread Ian Jackson
Wei Liu writes ([PATCH for 4.6] build: fix tarball stubdom build):
 When we create a source code tarball, mini-os is extracted to
 extras/mini-os directory. When building a source code tarball, we
 shouldn't clone mini-os again.
 
 Only clone mini-os when that directory doesn't exist. This fixes tarball
 build and doesn't affect non-tarball build.

Acked-by: Ian Jackson ian.jack...@eu.citrix.com

I am about to commit this.

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] libxenstore: prefer using the character device

2015-08-27 Thread Ian Jackson
Wei Liu writes (Re: [Xen-devel] [PATCH v2] libxenstore: prefer using the 
character device):
 On Thu, Aug 27, 2015 at 09:04:38AM -0500, Jonathan Creekmore wrote:
  With the addition of FMODE_ATOMIC_POS in the Linux 3.14 kernel,
  concurrent blocking file accesses to a single open file descriptor can
  cause a deadlock trying to grab the file position lock. If a watch has
  been set up, causing a read_thread to blocking read on the file
  descriptor, then future writes that would cause the background read to
  complete will block waiting on the file position lock before they can
  execute. This race condition only occurs when libxenstore is accessing
  the xenstore daemon through the /proc/xen/xenbus file and not through
  the unix domain socket, which is the case when the xenstore daemon is
  running as a stub domain or when oxenstored is passed
  --disable-socket. Accessing the daemon from the true character device
  also does not exhibit this problem.
  
  On Linux, prefer using the character device file over the proc file if
  the character device exists.

I confess I still see this as working around a kernel bug.  Only this
time we are switching from a buggy to non-buggy kernel interface.

Why don't we have the kernel provide only non-buggy interfaces ?

  diff --git a/tools/xenstore/xs_lib.c b/tools/xenstore/xs_lib.c
  index af4f75a..0c7744e 100644
  --- a/tools/xenstore/xs_lib.c
  +++ b/tools/xenstore/xs_lib.c
  @@ -81,6 +81,8 @@ const char *xs_domain_dev(void)
   #if defined(__RUMPUSER_XEN__) || defined(__RUMPRUN__)
  return /dev/xen/xenbus;
   #elif defined(__linux__)
  +   if (access(/dev/xen/xenbus, F_OK) == 0)
  +   return /dev/xen/xenbus;

Also, previously xs_domain_dev was a function which simply returned a
static value.  I feel vaguely uneasy at putting this kind of
autodetection logic here.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 22/23] x86: make Xen early boot code relocatable

2015-08-27 Thread Andrew Cooper
On 27/08/15 16:29, Jan Beulich wrote:
 On 27.08.15 at 17:10, daniel.ki...@oracle.com wrote:
 On Thu, Aug 27, 2015 at 07:12:38AM -0600, Jan Beulich wrote:
 On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote:
  /* Copy bootstrap trampoline to low memory, below 1MB. */
 -mov $sym_phys(trampoline_start),%esi
 +lea sym_offset(trampoline_start)(%ebp),%esi
  mov $trampoline_end - trampoline_start,%ecx
  rep movsb

 The latest at this final hunk I'm getting tired (and upset). I'd much
 rather not touch all this code in these fragile ways, and instead
 require Xen to be booted without grub2 on badly written firmware
 like the one on the machine you quote in the description.
 Let's start discussion from here. My further work on this patch series
 is pointless until we do not agree details in this case.

 So, I agree that any software (including firmware) should not use
 precious resources (i.e. low memory in that case) if it is not strictly
 required. And I do not think so that latest UEFI implementations
 on new hardware really need this low memory to survive (at least page
 tables could be put anywhere above low memory). However, in case of
 UEFI it is a wish of smart people not requirement set by spec. I was
 checking UEFI docs a few times but I was not able to find anything
 which says that e.g. ...developer MUST allocate memory outside of low
 memory  Even I have not found any suggestion about that. However,
 I must admit that I do not know whole UEFI spec, so, if you know something
 which is similar to above please tell me where it is (title, revision,
 page, paragraph, etc.). Hence, if there is not strict requirement in
 regards to memory allocation in specs we are lost. Of course we can
 encourage people to not do nasty things but I do not believe that many
 will listen. So, sadly, I suppose that we will see more and more machines
 in the wild which are broken (well, let's say do not align to good 
 practices).

 On the other hand I think that we should not assume that a given memory
 region (in our case starting at 1 MiB) is (or will be) available in every
 machine. I have not seen anything which says that it is true. We should
 stop doing that even if it works right now. I think that it works by
 chance and there is a chance that it will stop working one day because
 somebody will discover that he or she can put there e.g. device or hole.

 Last but not least. I suppose that Xen users and especially distros will
 complain when they are not able to use GRUB2 with Xen on every platform
 just because somebody (i.e. platform designers, developers, or whatever)
 do not accept our beliefs or we require that platform must obey rules
 (i.e. memory map requirements) which are specified nowhere.
 You're right, there's no such requirement on memory use in the spec.
 But you're missing the point. Supporting grub2 on UEFI is already a
 hack (ignoring all intentions EFI had from its first days). And now
 you've found an environment where that hack needs another hack
 (in Xen) to actually work. That's too much hackery for my taste, the
 more that things on this system can (afaict) work quite okay (without
 grub2, or with using its chainloader mechanism).

 So no, I'm still not convinced of the need for this patch.

I disagree.  There are many things a grub environment gives us which
plain EFI doesn't, most notably a familiar booting environment and
recovery facilities for users (which vastly trumps how EFI expects to
run things).

Furthermore, it offers common management of /boot in the dom0 filesystem
between legacy and EFI boots.

Therefore I am very much +1 get grub working.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] build: use correct qemu emulator binary

2015-08-27 Thread Ian Jackson
Wei Liu writes (Re: [Xen-devel] [PATCH] build: use correct qemu emulator 
binary):
 On Tue, Aug 25, 2015 at 08:45:07AM -0500, Doug Goldstein wrote:
  Per http://wiki.qemu.org/ChangeLog/1.0 and the fact that no currently
  supported distro ships the x86 system emulator binary as 'qemu', this
  changes the default when a user specifies --with-system-qemu without a
  PATH to 'qemu-system-i386', otherwise the default results in a
  non-functional setup.
  
  Signed-off-by: Doug Goldstein car...@cardoe.com
 
 Acked-by: Wei Liu wei.l...@citrix.com

Committed-by: Ian Jackson ian.jack...@eu.citrix.com

 Ian, please rerun autogen.sh when applying this patch.

Done.

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [DOCS DAY] [PATCH for-4.6 0/5] Cleanup of docs spread throughout the tree

2015-08-27 Thread Ian Jackson
Wei Liu writes (Re: [Xen-devel] [DOCS DAY] [PATCH for-4.6 0/5] Cleanup of docs 
spread throughout the tree):
 On Wed, Aug 26, 2015 at 10:09:46AM +0100, Andrew Cooper wrote:
 Whole series:
 
 Acked-by: Wei Liu wei.l...@citrix.com

Committed-by: Ian Jackson ian.jack...@eu.citrix.com

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 for-4.6 0/2] In-tree feature documentation

2015-08-27 Thread Andrew Cooper
On 27/08/15 18:58, Ian Jackson wrote:
 Andrew Cooper writes (Re: [RFC v2 for-4.6 0/2] In-tree feature 
 documentation):
 On 27/08/15 15:52, Ian Jackson wrote:
 I do wonder whether cross-referencing all the issues is a good idea.
 It seems like it might be a lot of work to keep them in step.
 I don't expect all the issues to be enumerated (generally, they would be
 found the first time someone falls over the issue), but where known
 interaction issues exist, we need to have some place to leave them.
 I was prompted to ask this because it seemed to me that some of the
 issues were discussed in other parts of the text or in other patches
 as well as in `issues'.

Which issues are you concerned about?


 There are plenty of examples where known issues are documented somewhere
 in the xen-devel archives, or in an individuals head, and neither of
 these are useful places for the information to exist.
 I agree with this.  I think things should be in the tree, but once.

I am certainly not advocating needless repetition, but I don't see
anything in the submitted text which would quality.  I will happily
correct the text if I am mistaken.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 2/4] x86/compat: Test both PV and PVH guests for compat mode

2015-08-27 Thread Jan Beulich
 On 13.08.15 at 20:12, boris.ostrov...@oracle.com wrote:
 @@ -777,7 +777,7 @@ int arch_set_info_guest(
  
  /* The context is a compat-mode one if the target domain is compat-mode;
   * we expect the tools to DTRT even in compat-mode callers. */
 -compat = is_pv_32bit_domain(d);
 +compat = is_pv_32bit_domain(d) || is_pvh_32bit_domain(d);

I continue to think that this should include a v-domain ==
current-domain check (to match behavior for HVM guests
from the tool stack perspective). Having looked at patch 4, I also
can't see how the tool stack is being made expect a non-native
guest context record in the 32-bit PVH case (i.e. I'd appreciate
if you could point out where that hides).

 @@ -1203,7 +1204,7 @@ void arch_get_info_guest(struct vcpu *v, 
 vcpu_guest_context_u c)
  {
  unsigned int i;
  const struct domain *d = v-domain;
 -bool_t compat = is_pv_32bit_domain(d);
 +bool_t compat = is_pv_32bit_domain(d) || is_pvh_32bit_domain(d);

Same here then naturally.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6] build: fix tarball stubdom build

2015-08-27 Thread Wei Liu
On Thu, Aug 27, 2015 at 10:05:56AM -0600, Jan Beulich wrote:
  On 27.08.15 at 17:54, wei.l...@citrix.com wrote:
  When we create a source code tarball, mini-os is extracted to
  extras/mini-os directory. When building a source code tarball, we
  shouldn't clone mini-os again.
  
  Only clone mini-os when that directory doesn't exist. This fixes tarball
  build and doesn't affect non-tarball build.
  
  Signed-off-by: Wei Liu wei.l...@citrix.com
  Cc: Ian Campbell ian.campb...@citrix.com
  Cc: Ian Jackson ian.jack...@eu.citrix.com
  ---
   Makefile | 10 ++
   1 file changed, 6 insertions(+), 4 deletions(-)
  
  diff --git a/Makefile b/Makefile
  index e8a75ff..ba0df70 100644
  --- a/Makefile
  +++ b/Makefile
  @@ -19,10 +19,12 @@ include Config.mk
   
   .PHONY: mini-os-dir
   mini-os-dir:
  -   GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh \
  -   $(MINIOS_UPSTREAM_URL) \
  -   $(MINIOS_UPSTREAM_REVISION) \
  -   $(XEN_ROOT)/extras/mini-os
  +   if [ ! -d $(XEN_ROOT)/extras/mini-os ]; then \
  +   GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh \
  +   $(MINIOS_UPSTREAM_URL) \
  +   $(MINIOS_UPSTREAM_REVISION) \
  +   $(XEN_ROOT)/extras/mini-os ; \
  +   fi
 
 Wouldn't his better be done (avoiding the need for the shell
 conditional) by simply renaming the make target from
 mini-os-dir to extras/mini-os (and dropping the .PHONY)?
 

All targets for external trees follow some conventions defined in
tools/Makefile. Although I didn't strictly follow the same rules in
mini-os's case, I don't want it to deviate from the original rules too
much.

Wei.

 Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 22/23] x86: make Xen early boot code relocatable

2015-08-27 Thread Daniel Kiper
On Thu, Aug 27, 2015 at 07:12:38AM -0600, Jan Beulich wrote:
  On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote:

[...]

   /* Copy bootstrap trampoline to low memory, below 1MB. */
  -mov $sym_phys(trampoline_start),%esi
  +lea sym_offset(trampoline_start)(%ebp),%esi
   mov $trampoline_end - trampoline_start,%ecx
   rep movsb
 

 The latest at this final hunk I'm getting tired (and upset). I'd much
 rather not touch all this code in these fragile ways, and instead
 require Xen to be booted without grub2 on badly written firmware
 like the one on the machine you quote in the description.

Let's start discussion from here. My further work on this patch series
is pointless until we do not agree details in this case.

So, I agree that any software (including firmware) should not use
precious resources (i.e. low memory in that case) if it is not strictly
required. And I do not think so that latest UEFI implementations
on new hardware really need this low memory to survive (at least page
tables could be put anywhere above low memory). However, in case of
UEFI it is a wish of smart people not requirement set by spec. I was
checking UEFI docs a few times but I was not able to find anything
which says that e.g. ...developer MUST allocate memory outside of low
memory  Even I have not found any suggestion about that. However,
I must admit that I do not know whole UEFI spec, so, if you know something
which is similar to above please tell me where it is (title, revision,
page, paragraph, etc.). Hence, if there is not strict requirement in
regards to memory allocation in specs we are lost. Of course we can
encourage people to not do nasty things but I do not believe that many
will listen. So, sadly, I suppose that we will see more and more machines
in the wild which are broken (well, let's say do not align to good practices).

On the other hand I think that we should not assume that a given memory
region (in our case starting at 1 MiB) is (or will be) available in every
machine. I have not seen anything which says that it is true. We should
stop doing that even if it works right now. I think that it works by
chance and there is a chance that it will stop working one day because
somebody will discover that he or she can put there e.g. device or hole.

Last but not least. I suppose that Xen users and especially distros will
complain when they are not able to use GRUB2 with Xen on every platform
just because somebody (i.e. platform designers, developers, or whatever)
do not accept our beliefs or we require that platform must obey rules
(i.e. memory map requirements) which are specified nowhere.

So, I think that early Xen boot code should be relocatable. However, I agree
that there is a chance that may solution is not perfect. So, I suppose,
taking into account above, we should discuss how to improve it instead
of discussing whether we should do it.

Daniel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.

2015-08-27 Thread Jan Beulich
 On 27.08.15 at 13:02, konrad.w...@oracle.com wrote:
 @@ -2666,9 +2662,9 @@ long do_tmem_op(tmem_cli_op_t uops)
  /* Acquire wirte lock for all command at first */
  write_lock(tmem_rwlock);
  
 -if ( op.cmd == TMEM_CONTROL )
 +if ( op.cmd == TMEM_CONTROL_MOVED )
  {
 -rc = do_tmem_control(op);
 +rc = -ENOSYS;

As in other similar cases I'd prefer -EOPNOTSUPP here to prevent
callers from implying the tmem hypercall as a whole is unimplemented.

 --- a/xen/include/public/sysctl.h
 +++ b/xen/include/public/sysctl.h
 @@ -710,6 +710,48 @@ struct xen_sysctl_psr_cat_op {
  typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t;
  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t);
  
 +#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xU
 +
 +#define XEN_SYSCTL_TMEM_OP_THAW   0
 +#define XEN_SYSCTL_TMEM_OP_FREEZE 1
 +#define XEN_SYSCTL_TMEM_OP_FLUSH  2
 +#define XEN_SYSCTL_TMEM_OP_DESTROY3
 +#define XEN_SYSCTL_TMEM_OP_LIST   4
 +#define XEN_SYSCTL_TMEM_OP_SET_WEIGHT 5
 +#define XEN_SYSCTL_TMEM_OP_SET_CAP6
 +#define XEN_SYSCTL_TMEM_OP_SET_COMPRESS   7
 +#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB  8
 +#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION   11
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_MAXPOOLS  12
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_WEIGHT 13
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_CAP14
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_FLAGS  15
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_FLAGS16
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_NPAGES   17
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID 18
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV  20
 +#define XEN_SYSCTL_TMEM_OP_SAVE_END   21
 +#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN  30
 +#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE   32
 +#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33

Perhaps better to have all these in tmem.h, to not clutter this
header?

 +struct xen_sysctl_tmem_op {
 +uint32_t cmd;   /* IN: XEN_SYSCTL_TMEM_OP_* . */
 +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/
 +uint32_t cli_id;/* IN: client id, 0 for 
 XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB
 +   for all others can be the domain id or
 +   XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */
 +uint32_t arg1;  /* IN: If not applicable to command use 0. */
 +uint32_t arg2;  /* IN: If not applicable to command use 0. */
 +uint8_t  pad[4];/* Padding so structure is the same under 32 and 64. 
 */

uint32_t please. And despite this being an (easily changeable) sysctl,
verifying that it's zero on input would be nice.

 --- a/xen/include/public/tmem.h
 +++ b/xen/include/public/tmem.h
 @@ -33,7 +33,7 @@
  #define TMEM_SPEC_VERSION  1
  
  /* Commands to HYPERVISOR_tmem_op() */
 -#define TMEM_CONTROL   0
 +#define TMEM_CONTROL_MOVED 0

Perhaps say where it moved in a brief comment?

 --- a/xen/xsm/flask/hooks.c
 +++ b/xen/xsm/flask/hooks.c
 @@ -761,6 +761,9 @@ static int flask_sysctl(int cmd)
  case XEN_SYSCTL_tbuf_op:
  return domain_has_xen(current-domain, XEN__TBUFCONTROL);
  
 +case XEN_SYSCTL_tmem_op:
 +return domain_has_xen(current-domain, XEN__TMEM_CONTROL);
 +
  case XEN_SYSCTL_sched_id:
  return domain_has_xen(current-domain, XEN__GETSCHEDULER);

Hmm, these cases appear to be roughly sorted numerically, i.e.
yours would normally go at the end.

Despite the comments I see no reason for this (once adjusted where
needed) not to go in for 4.6.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 for-4.6 0/2] In-tree feature documentation

2015-08-27 Thread Andrew Cooper
On 27/08/15 15:52, Ian Jackson wrote:
 Andrew Cooper writes ([RFC v2 for-4.6 0/2] In-tree feature documentation):
 An issue which Xen has is an uncertain support statement for features.
 Given the success seen with docs/misc/xen-command-line.markdown, and in
 particular keeping it up to date, introduce a similar system for
 features.

 Patch 1 introduces a proposed template (and a makefile tweak to include
 the new docs/features subdirectory), while patch 2 is a feature document
 covering the topic of migration.

 v2 Adds %Revision and #History table, following feedback from v1.

 This is tagged RFC as I expect people to have different views as to what
 is useful to include.  I would particilarly appreciate feedback on the
 template before it starts getting used widely.

 Lars: Does this look like a reasonable counterpart to your formal
 support statement document?

 Jim: Per your request at the summit for new information, is patch 2
 suitable?
 I have read both patches.

 I do wonder whether cross-referencing all the issues is a good idea.
 It seems like it might be a lot of work to keep them in step.

I don't expect all the issues to be enumerated (generally, they would be
found the first time someone falls over the issue), but where known
interaction issues exist, we need to have some place to leave them.

There are plenty of examples where known issues are documented somewhere
in the xen-devel archives, or in an individuals head, and neither of
these are useful places for the information to exist.

I would hope that over time the number of issues decreases to 0 (that
would be the day!), but a good second is at least to have a formal
acknowledge that there are issues, which can be located by the next
unfortunate sole to hit the issue.


 I don't have anything else to add to the comments that others have
 made.

 Overall I think this is a good template.  The extra overhead may even
 be negative.  The work of writing up a feature in the style of this
 document may obviate the need to put much of the same information in a
 0/N or a design document, and the existing template is quite
 lightweight.

I hadn't considered this side benefit when designing the template.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC

2015-08-27 Thread Andrew Cooper
On 27/08/15 16:34, Jan Beulich wrote:
 On 27.08.15 at 17:22, andrew.coop...@citrix.com wrote:
 On 27/08/15 15:47, Jonathan Creekmore wrote:
 @@ -812,7 +816,11 @@ ENTRY(hypercall_args_table)
  .byte 2 /* do_hvm_op*/
  .byte 1 /* do_sysctl*/  /* 35 */
  .byte 1 /* do_domctl*/
 +#ifdef CONFIG_KEXEC
  .byte 2 /* do_kexec */
 +#else
 +.byte 0 /* do_ni_hypercall  */
 +#endif
 These changes will corrupt guest registers in debug builds.
 It's quite the other way around:

Ah yes - of course.  Sorry for the noise.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC

2015-08-27 Thread David Vrabel
On 27/08/15 15:47, Jonathan Creekmore wrote:
 Add the appropriate #if checks around the kexec code in the x86 codebase
 so that the feature can actually be turned off by the flag instead of
 always required to be enabled on x86.

What's your use case for this?

I think you should consider providing empty stub functions for !KEXEC
instead of all the #ifdefs.

 --- a/xen/drivers/passthrough/vtd/dmar.h
 +++ b/xen/drivers/passthrough/vtd/dmar.h
 @@ -108,6 +108,7 @@ struct acpi_atsr_unit *acpi_find_matched_atsr_unit(const 
 struct pci_dev *);
  
  #define DMAR_OPERATION_TIMEOUT MILLISECS(1000)
  
 +#ifdef CONFIG_KEXEC
  #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \
  do {\
  s_time_t start_time = NOW();\
 @@ -125,6 +126,22 @@ do {\
  cpu_relax();\
  }   \
  } while (0)
 +#else
 +#define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \
 +do {\
 +s_time_t start_time = NOW();\
 +while (1) { \
 +sts = op(iommu-reg, offset);   \
 +if ( cond ) \
 +break;  \
 +if ( NOW()  start_time + DMAR_OPERATION_TIMEOUT ) {\
 +   panic(%s:%d:%s: DMAR hardware is malfunctional, \
 + __FILE__, __LINE__, __func__);  \
 +}   \
 +cpu_relax();\
 +}   \
 +} while (0)
 +#endif

This is particular might be best done by making kexecing a static const
variable equal to 0 in the !KEXEC case.

 --- a/xen/include/asm-x86/config.h
 +++ b/xen/include/asm-x86/config.h
 @@ -1,6 +1,6 @@
  
 /**
   * config.h
 - * 
 + *
   * A Linux-style configuration list.
   */

Stray whitespace change.

David


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [added to the 3.18 stable tree] x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

2015-08-27 Thread Sasha Levin
From: Andy Lutomirski l...@kernel.org

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===

[ Upstream commit aa1acff356bbedfd03b544051f5b371746735d89 ]

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski l...@kernel.org
Cc: Andrew Cooper andrew.coop...@citrix.com
Cc: Andy Lutomirski l...@amacapital.net
Cc: Boris Ostrovsky boris.ostrov...@oracle.com
Cc: Borislav Petkov b...@alien8.de
Cc: Brian Gerst brge...@gmail.com
Cc: David Vrabel dvra...@cantab.net
Cc: Denys Vlasenko dvlas...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Sasha Levin sasha.le...@oracle.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Thomas Gleixner t...@linutronix.de
Cc: secur...@kernel.org secur...@kernel.org
Cc: sta...@vger.kernel.org
Cc: xen-devel xen-devel@lists.xen.org
Link: 
http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.l...@kernel.org
Signed-off-by: Ingo Molnar mi...@kernel.org
Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 arch/x86/xen/enlighten.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 7d67146..e180e09 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -481,6 +481,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
pte_t pte;
unsigned long pfn;
struct page *page;
+   unsigned char dummy;
 
ptep = lookup_address((unsigned long)v, level);
BUG_ON(ptep == NULL);
@@ -490,6 +491,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 
pte = pfn_pte(pfn, prot);
 
+   /*
+* Careful: update_va_mapping() will fail if the virtual address
+* we're poking isn't populated in the page tables.  We don't
+* need to worry about the direct map (that's always in the page
+* tables), but we need to be careful about vmap space.  In
+* particular, the top level page table can lazily propagate
+* entries between processes, so if we've switched mms since we
+* vmapped the target in the first place, we might not have the
+* top-level page table entry populated.
+*
+* We disable preemption because we want the same mm active when
+* we probe the target and when we issue the hypercall.  We'll
+* have the same nominal mm, but if we're a kernel thread, lazy
+* mm dropping could change our pgd.
+*
+* Out of an abundance of caution, this uses __get_user() to fault
+* in the target address just in case there's some obscure case
+* in which the target address isn't readable.
+*/
+
+   preempt_disable();
+
+   pagefault_disable();/* Avoid warnings due to being atomic. */
+   __get_user(dummy, (unsigned char __user __force *)v);
+   pagefault_enable();
+
if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
BUG();
 
@@ -501,6 +528,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
BUG();
} else
kmap_flush_unused();
+
+   preempt_enable();
 }
 
 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -508,6 +537,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, 
unsigned entries)
const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
int i;
 
+   /*
+* We need to mark the all aliases of the LDT pages RO.  We
+* don't need to call vm_flush_aliases(), though, since that's
+* only responsible for flushing aliases out the TLBs, not the
+* page tables, and Xen will flush the TLB for us if needed.
+*
+* To avoid confusing future readers: none of this is necessary
+* to load the LDT.  The hypervisor only checks this when the
+* LDT is faulted in due to subsequent descriptor access.
+*/
+
for(i = 0; i  entries; i += entries_per_page)
set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
 }
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC

2015-08-27 Thread Jonathan Creekmore

 On Aug 27, 2015, at 10:27 AM, David Vrabel david.vra...@citrix.com wrote:
 
 On 27/08/15 15:47, Jonathan Creekmore wrote:
 Add the appropriate #if checks around the kexec code in the x86 codebase
 so that the feature can actually be turned off by the flag instead of
 always required to be enabled on x86.
 
 What's your use case for this?
 

The use case is for a slimmed down version of the hypervisor that can be used 
as a security hypervisor, exposing as little extra functionality as possible. 
When looking for features to trim out to reduce the attack surface, I saw the 
flag for KEXEC and wanted to disable that, then ran into compile problems.

 I think you should consider providing empty stub functions for !KEXEC
 instead of all the #ifdefs.
 

Good points, all, who mentioned the empty stub functions. I am revising it now 
to use stubs instead of all of the #ifdefs.

 
 --- a/xen/include/asm-x86/config.h
 +++ b/xen/include/asm-x86/config.h
 @@ -1,6 +1,6 @@
 /**
  * config.h
 - * 
 + *
  * A Linux-style configuration list.
  */
 
 Stray whitespace change.

I guess I really need to just turn off that feature (stripping trailing 
whitespace) in my editor. So useful when writing new code, not as helpful when 
trying to modify existing code.
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6] build: fix tarball stubdom build

2015-08-27 Thread Jan Beulich
 On 27.08.15 at 17:54, wei.l...@citrix.com wrote:
 When we create a source code tarball, mini-os is extracted to
 extras/mini-os directory. When building a source code tarball, we
 shouldn't clone mini-os again.
 
 Only clone mini-os when that directory doesn't exist. This fixes tarball
 build and doesn't affect non-tarball build.
 
 Signed-off-by: Wei Liu wei.l...@citrix.com
 Cc: Ian Campbell ian.campb...@citrix.com
 Cc: Ian Jackson ian.jack...@eu.citrix.com
 ---
  Makefile | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)
 
 diff --git a/Makefile b/Makefile
 index e8a75ff..ba0df70 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -19,10 +19,12 @@ include Config.mk
  
  .PHONY: mini-os-dir
  mini-os-dir:
 - GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh \
 - $(MINIOS_UPSTREAM_URL) \
 - $(MINIOS_UPSTREAM_REVISION) \
 - $(XEN_ROOT)/extras/mini-os
 + if [ ! -d $(XEN_ROOT)/extras/mini-os ]; then \
 + GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh \
 + $(MINIOS_UPSTREAM_URL) \
 + $(MINIOS_UPSTREAM_REVISION) \
 + $(XEN_ROOT)/extras/mini-os ; \
 + fi

Wouldn't his better be done (avoiding the need for the shell
conditional) by simply renaming the make target from
mini-os-dir to extras/mini-os (and dropping the .PHONY)?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 for-4.6 0/2] In-tree feature documentation

2015-08-27 Thread Ian Jackson
Andrew Cooper writes ([RFC v2 for-4.6 0/2] In-tree feature documentation):
 An issue which Xen has is an uncertain support statement for features.
 Given the success seen with docs/misc/xen-command-line.markdown, and in
 particular keeping it up to date, introduce a similar system for
 features.
 
 Patch 1 introduces a proposed template (and a makefile tweak to include
 the new docs/features subdirectory), while patch 2 is a feature document
 covering the topic of migration.
 
 v2 Adds %Revision and #History table, following feedback from v1.
 
 This is tagged RFC as I expect people to have different views as to what
 is useful to include.  I would particilarly appreciate feedback on the
 template before it starts getting used widely.
 
 Lars: Does this look like a reasonable counterpart to your formal
 support statement document?
 
 Jim: Per your request at the summit for new information, is patch 2
 suitable?

I have read both patches.

I do wonder whether cross-referencing all the issues is a good idea.
It seems like it might be a lot of work to keep them in step.

I don't have anything else to add to the comments that others have
made.

Overall I think this is a good template.  The extra overhead may even
be negative.  The work of writing up a feature in the style of this
document may obviate the need to put much of the same information in a
0/N or a design document, and the existing template is quite
lightweight.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.

2015-08-27 Thread Jan Beulich
 On 27.08.15 at 15:12, andrew.coop...@citrix.com wrote:
 On 27/08/15 12:02, Konrad Rzeszutek Wilk wrote:
 --- a/xen/include/xen/tmem.h
 +++ b/xen/include/xen/tmem.h
 @@ -9,6 +9,9 @@
  #ifndef __XEN_TMEM_H__
  #define __XEN_TMEM_H__
  
 +struct xen_sysctl_tmem_op;
 
 #includepublic/sysctl.h please.

Why? Forward declarations like this help not including all headers
everywhere.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 8/8] tmem: Spelling mistakes.

2015-08-27 Thread Jan Beulich
 On 27.08.15 at 13:02, konrad.w...@oracle.com wrote:
 Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 ---
  xen/common/tmem.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/xen/common/tmem.c b/xen/common/tmem.c
 index 66d2852..9bd75d8 100644
 --- a/xen/common/tmem.c
 +++ b/xen/common/tmem.c
 @@ -2659,7 +2659,7 @@ long do_tmem_op(tmem_cli_op_t uops)
  return -EFAULT;
  }
  
 -/* Acquire wirte lock for all command at first */
 +/* Acquire write lock for all command at first */

Mind adding the missing full stop at once?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC

2015-08-27 Thread Jan Beulich
 On 27.08.15 at 16:47, jonathan.creekm...@gmail.com wrote:
 Add the appropriate #if checks around the kexec code in the x86 codebase
 so that the feature can actually be turned off by the flag instead of
 always required to be enabled on x86.

But you realize that these HAVE_* variables aren't meant to be used
for disabling features? If you really wanted something like that, you'd
need to introduce kexec=y in the top level Makefile, overridable by
kexec=n on the make command line.

 --- a/xen/arch/x86/Makefile
 +++ b/xen/arch/x86/Makefile
 @@ -56,8 +56,8 @@ obj-y += trace.o
  obj-y += traps.o
  obj-y += usercopy.o
  obj-y += x86_emulate.o
 -obj-y += machine_kexec.o
 -obj-y += crash.o
 +obj-$(HAS_KEXEC) += machine_kexec.o
 +obj-$(HAS_KEXEC) += crash.o
  obj-y += tboot.o
  obj-y += hpet.o
  obj-y += vm_event.o

If you already touch this, please move the two altered lines to their
proper (alphabetical) slot.

 --- a/xen/arch/x86/crash.c
 +++ b/xen/arch/x86/crash.c
 @@ -64,7 +64,9 @@ static void noreturn do_nmi_crash(const struct 
 cpu_user_regs *regs)
   */
  set_ist(idt_tables[cpu][TRAP_machine_check], IST_NONE);
  
 +#ifdef CONFIG_KEXEC
  kexec_crash_save_cpu();
 +#endif

In cases like this code will look much better if you introduced a no-op
inline function for the !CONFIG_KEXEC case in the appropriate header.

 --- a/xen/arch/x86/x86_64/compat/entry.S
 +++ b/xen/arch/x86/x86_64/compat/entry.S

Similarly at least the changes here ...

 --- a/xen/arch/x86/x86_64/entry.S
 +++ b/xen/arch/x86/x86_64/entry.S

.. and here will want better abstraction, to not further uglify the code.

 --- a/xen/drivers/passthrough/vtd/dmar.h
 +++ b/xen/drivers/passthrough/vtd/dmar.h
 @@ -108,6 +108,7 @@ struct acpi_atsr_unit *acpi_find_matched_atsr_unit(const 
 struct pci_dev *);
  
  #define DMAR_OPERATION_TIMEOUT MILLISECS(1000)
  
 +#ifdef CONFIG_KEXEC
  #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \
  do {\
  s_time_t start_time = NOW();\
 @@ -125,6 +126,22 @@ do {\
  cpu_relax();\
  }   \
  } while (0)
 +#else
 +#define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \
 +do {\
 +s_time_t start_time = NOW();\
 +while (1) { \
 +sts = op(iommu-reg, offset);   \
 +if ( cond ) \
 +break;  \
 +if ( NOW()  start_time + DMAR_OPERATION_TIMEOUT ) {\
 +   panic(%s:%d:%s: DMAR hardware is malfunctional, \
 + __FILE__, __LINE__, __func__);  \
 +}   \
 +cpu_relax();\
 +}   \
 +} while (0)
 +#endif

Along the same lines here - no way for such a macro to be duplicated
for a case likely no-one but you will ever build-test.

 --- a/xen/include/asm-x86/config.h
 +++ b/xen/include/asm-x86/config.h
 @@ -1,6 +1,6 @@
  
 /
 **
   * config.h
 - * 
 + *
   * A Linux-style configuration list.
   */
  

Please leave out this unrelated white space change.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC

2015-08-27 Thread Andrew Cooper
On 27/08/15 15:47, Jonathan Creekmore wrote:
 Add the appropriate #if checks around the kexec code in the x86 codebase
 so that the feature can actually be turned off by the flag instead of
 always required to be enabled on x86.

 Signed-off-by: Jonathan Creekmore jonathan.creekm...@gmail.com

In principle, this is a good change, but is definitely aimed at 4.7 now.

 diff --git a/xen/arch/x86/x86_64/compat/entry.S 
 b/xen/arch/x86/x86_64/compat/entry.S
 index 1521779..e2fe49c 100644
 --- a/xen/arch/x86/x86_64/compat/entry.S
 +++ b/xen/arch/x86/x86_64/compat/entry.S
 @@ -428,7 +428,11 @@ ENTRY(compat_hypercall_table)
  .quad do_hvm_op
  .quad do_sysctl /* 35 */
  .quad do_domctl
 +#if CONFIG_KEXEC
  .quad compat_kexec_op
 +#else
 +.quad do_ni_hypercall
 +#endif
  .quad do_tmem_op
  .quad do_ni_hypercall   /* reserved for XenClient */
  .quad do_xenpmu_op  /* 40 */
 @@ -479,6 +483,11 @@ ENTRY(compat_hypercall_args_table)
  .byte 2 /* do_hvm_op*/
  .byte 1 /* do_sysctl*/  /* 35 */
  .byte 1 /* do_domctl*/
 +#if CONFIG_KEXEC
 +.byte 2 /* compat_kexec_op  */
 +#else
 + .byte 0 /* do_ni_hypercall  */
 +#endif

You have a hard tab here.

  .byte 2 /* compat_kexec_op  */
  .byte 1 /* do_tmem_op   */
  .byte 0 /* reserved for XenClient   */
 diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
 index 74677a2..357f3d5 100644
 --- a/xen/arch/x86/x86_64/entry.S
 +++ b/xen/arch/x86/x86_64/entry.S
 @@ -761,7 +761,11 @@ ENTRY(hypercall_table)
  .quad do_hvm_op
  .quad do_sysctl /* 35 */
  .quad do_domctl
 +#ifdef CONFIG_KEXEC
  .quad do_kexec_op
 +#else
 + .quad do_ni_hypercall
 +#endif
  .quad do_tmem_op
  .quad do_ni_hypercall   /* reserved for XenClient */
  .quad do_xenpmu_op  /* 40 */
 @@ -812,7 +816,11 @@ ENTRY(hypercall_args_table)
  .byte 2 /* do_hvm_op*/
  .byte 1 /* do_sysctl*/  /* 35 */
  .byte 1 /* do_domctl*/
 +#ifdef CONFIG_KEXEC
  .byte 2 /* do_kexec */
 +#else
 +.byte 0 /* do_ni_hypercall  */
 +#endif

These changes will corrupt guest registers in debug builds.

Don't alter the args table at all, and use

#ifndef CONFIG_KEXEC
#define do_kexec_op do_ni_hypercall
#define compat_kexec_op do_ni_hypercall
#endif

rather than patching the hypercall table itself.

~Andrew

  .byte 1 /* do_tmem_op   */
  .byte 0 /* reserved for XenClient */
  .byte 2 /* do_xenpmu_op */  /* 40 */
 diff --git a/xen/drivers/passthrough/vtd/dmar.h 
 b/xen/drivers/passthrough/vtd/dmar.h
 index 729b603..697e5e5 100644
 --- a/xen/drivers/passthrough/vtd/dmar.h
 +++ b/xen/drivers/passthrough/vtd/dmar.h
 @@ -108,6 +108,7 @@ struct acpi_atsr_unit *acpi_find_matched_atsr_unit(const 
 struct pci_dev *);
  
  #define DMAR_OPERATION_TIMEOUT MILLISECS(1000)
  
 +#ifdef CONFIG_KEXEC
  #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \
  do {\
  s_time_t start_time = NOW();\
 @@ -125,6 +126,22 @@ do {\
  cpu_relax();\
  }   \
  } while (0)
 +#else
 +#define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \
 +do {\
 +s_time_t start_time = NOW();\
 +while (1) { \
 +sts = op(iommu-reg, offset);   \
 +if ( cond ) \
 +break;  \
 +if ( NOW()  start_time + DMAR_OPERATION_TIMEOUT ) {\
 +   panic(%s:%d:%s: DMAR hardware is malfunctional, \
 + __FILE__, __LINE__, __func__);  \
 +}   \
 +cpu_relax();\
 +}   \
 +} while (0)
 +#endif
  
  int vtd_hw_check(void);
  void disable_pmr(struct iommu *iommu);
 diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h
 index 3e9be83..e84e489 100644
 --- a/xen/include/asm-x86/config.h
 +++ b/xen/include/asm-x86/config.h
 @@ -1,6 +1,6 @@
  
 /**
   * config.h
 - * 
 + *
   * A Linux-style configuration list.
   */
  

This is an unrelated hunk and should be dropped.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: wrap kexec feature with CONFIG_KEXEC

2015-08-27 Thread Jan Beulich
 On 27.08.15 at 17:27, david.vra...@citrix.com wrote:
 On 27/08/15 15:47, Jonathan Creekmore wrote:
 @@ -125,6 +126,22 @@ do {\
  cpu_relax();\
  }   \
  } while (0)
 +#else
 +#define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \
 +do {\
 +s_time_t start_time = NOW();\
 +while (1) { \
 +sts = op(iommu-reg, offset);   \
 +if ( cond ) \
 +break;  \
 +if ( NOW()  start_time + DMAR_OPERATION_TIMEOUT ) {\
 +  panic(%s:%d:%s: DMAR hardware is malfunctional, \
 +__FILE__, __LINE__, __func__);  \
 +}   \
 +cpu_relax();\
 +}   \
 +} while (0)
 +#endif
 
 This is particular might be best done by making kexecing a static const
 variable equal to 0 in the !KEXEC case.

Or even a #define (unless there's some use somewhere preventing
that).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 3/4] x86/pvh: Handle hypercalls for 32b PVH guests

2015-08-27 Thread Jan Beulich
 On 13.08.15 at 20:12, boris.ostrov...@oracle.com wrote:
 Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com

Reviewed-by: Jan Beulich jbeul...@suse.com


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 22/23] x86: make Xen early boot code relocatable

2015-08-27 Thread Jan Beulich
 On 27.08.15 at 17:10, daniel.ki...@oracle.com wrote:
 On Thu, Aug 27, 2015 at 07:12:38AM -0600, Jan Beulich wrote:
  On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote:
   /* Copy bootstrap trampoline to low memory, below 1MB. */
  -mov $sym_phys(trampoline_start),%esi
  +lea sym_offset(trampoline_start)(%ebp),%esi
   mov $trampoline_end - trampoline_start,%ecx
   rep movsb
 

 The latest at this final hunk I'm getting tired (and upset). I'd much
 rather not touch all this code in these fragile ways, and instead
 require Xen to be booted without grub2 on badly written firmware
 like the one on the machine you quote in the description.
 
 Let's start discussion from here. My further work on this patch series
 is pointless until we do not agree details in this case.
 
 So, I agree that any software (including firmware) should not use
 precious resources (i.e. low memory in that case) if it is not strictly
 required. And I do not think so that latest UEFI implementations
 on new hardware really need this low memory to survive (at least page
 tables could be put anywhere above low memory). However, in case of
 UEFI it is a wish of smart people not requirement set by spec. I was
 checking UEFI docs a few times but I was not able to find anything
 which says that e.g. ...developer MUST allocate memory outside of low
 memory  Even I have not found any suggestion about that. However,
 I must admit that I do not know whole UEFI spec, so, if you know something
 which is similar to above please tell me where it is (title, revision,
 page, paragraph, etc.). Hence, if there is not strict requirement in
 regards to memory allocation in specs we are lost. Of course we can
 encourage people to not do nasty things but I do not believe that many
 will listen. So, sadly, I suppose that we will see more and more machines
 in the wild which are broken (well, let's say do not align to good 
 practices).
 
 On the other hand I think that we should not assume that a given memory
 region (in our case starting at 1 MiB) is (or will be) available in every
 machine. I have not seen anything which says that it is true. We should
 stop doing that even if it works right now. I think that it works by
 chance and there is a chance that it will stop working one day because
 somebody will discover that he or she can put there e.g. device or hole.
 
 Last but not least. I suppose that Xen users and especially distros will
 complain when they are not able to use GRUB2 with Xen on every platform
 just because somebody (i.e. platform designers, developers, or whatever)
 do not accept our beliefs or we require that platform must obey rules
 (i.e. memory map requirements) which are specified nowhere.

You're right, there's no such requirement on memory use in the spec.
But you're missing the point. Supporting grub2 on UEFI is already a
hack (ignoring all intentions EFI had from its first days). And now
you've found an environment where that hack needs another hack
(in Xen) to actually work. That's too much hackery for my taste, the
more that things on this system can (afaict) work quite okay (without
grub2, or with using its chainloader mechanism).

So no, I'm still not convinced of the need for this patch.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/8] tmem: Add ASSERT in obj_rb_insert for pool-rwlock lock.

2015-08-27 Thread Jan Beulich
 On 27.08.15 at 13:01, konrad.w...@oracle.com wrote:
 --- a/xen/common/tmem.c
 +++ b/xen/common/tmem.c
 @@ -915,6 +915,11 @@ static int obj_rb_insert(struct rb_root *root, struct 
 tmem_object_root *obj)
  {
  struct rb_node **new, *parent = NULL;
  struct tmem_object_root *this;
 +struct tmem_pool *pool;
 +
 +pool = obj-pool;
 +ASSERT(pool != NULL);
 +ASSERT_WRITELOCK(pool-pool_rwlock);

Considering the new variable is used only in ASSERT()s, I'd
recommend against its introduction. But it's your code ...

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH for 4.6] build: fix tarball stubdom build

2015-08-27 Thread Wei Liu
When we create a source code tarball, mini-os is extracted to
extras/mini-os directory. When building a source code tarball, we
shouldn't clone mini-os again.

Only clone mini-os when that directory doesn't exist. This fixes tarball
build and doesn't affect non-tarball build.

Signed-off-by: Wei Liu wei.l...@citrix.com
Cc: Ian Campbell ian.campb...@citrix.com
Cc: Ian Jackson ian.jack...@eu.citrix.com
---
 Makefile | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index e8a75ff..ba0df70 100644
--- a/Makefile
+++ b/Makefile
@@ -19,10 +19,12 @@ include Config.mk
 
 .PHONY: mini-os-dir
 mini-os-dir:
-   GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh \
-   $(MINIOS_UPSTREAM_URL) \
-   $(MINIOS_UPSTREAM_REVISION) \
-   $(XEN_ROOT)/extras/mini-os
+   if [ ! -d $(XEN_ROOT)/extras/mini-os ]; then \
+   GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh \
+   $(MINIOS_UPSTREAM_URL) \
+   $(MINIOS_UPSTREAM_REVISION) \
+   $(XEN_ROOT)/extras/mini-os ; \
+   fi
 
 .PHONY: mini-os-dir-force-update
 mini-os-dir-force-update: mini-os-dir
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.

2015-08-27 Thread Konrad Rzeszutek Wilk
On Thu, Aug 27, 2015 at 09:11:35AM -0600, Jan Beulich wrote:
  On 27.08.15 at 13:02, konrad.w...@oracle.com wrote:
  @@ -2666,9 +2662,9 @@ long do_tmem_op(tmem_cli_op_t uops)
   /* Acquire wirte lock for all command at first */
   write_lock(tmem_rwlock);
   
  -if ( op.cmd == TMEM_CONTROL )
  +if ( op.cmd == TMEM_CONTROL_MOVED )
   {
  -rc = do_tmem_control(op);
  +rc = -ENOSYS;
 
 As in other similar cases I'd prefer -EOPNOTSUPP here to prevent
 callers from implying the tmem hypercall as a whole is unimplemented.
 
  --- a/xen/include/public/sysctl.h
  +++ b/xen/include/public/sysctl.h
  @@ -710,6 +710,48 @@ struct xen_sysctl_psr_cat_op {
   typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t;
   DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t);
   
  +#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xU
  +
  +#define XEN_SYSCTL_TMEM_OP_THAW   0
  +#define XEN_SYSCTL_TMEM_OP_FREEZE 1
  +#define XEN_SYSCTL_TMEM_OP_FLUSH  2
  +#define XEN_SYSCTL_TMEM_OP_DESTROY3
  +#define XEN_SYSCTL_TMEM_OP_LIST   4
  +#define XEN_SYSCTL_TMEM_OP_SET_WEIGHT 5
  +#define XEN_SYSCTL_TMEM_OP_SET_CAP6
  +#define XEN_SYSCTL_TMEM_OP_SET_COMPRESS   7
  +#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB  8
  +#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION   11
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_MAXPOOLS  12
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_WEIGHT 13
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_CAP14
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_FLAGS  15
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_FLAGS16
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_NPAGES   17
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID 18
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV  20
  +#define XEN_SYSCTL_TMEM_OP_SAVE_END   21
  +#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN  30
  +#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE   32
  +#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33
 
 Perhaps better to have all these in tmem.h, to not clutter this
 header?

Yes and no. The other sysctl had their #defines for commands here so I figured I
would follow that rule. But I am OK keeping it in tmem.h and just do

/* For the 'cmd' values consule tmem.h. Look for XEN_SYSCTL_TMEM_OP_* */
 
  +struct xen_sysctl_tmem_op {
  +uint32_t cmd;   /* IN: XEN_SYSCTL_TMEM_OP_* . */
  +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/
  +uint32_t cli_id;/* IN: client id, 0 for 
  XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB
  +   for all others can be the domain id or
  +   XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */
  +uint32_t arg1;  /* IN: If not applicable to command use 0. */
  +uint32_t arg2;  /* IN: If not applicable to command use 0. */
  +uint8_t  pad[4];/* Padding so structure is the same under 32 and 
  64. */
 
 uint32_t please. And despite this being an (easily changeable) sysctl,
 verifying that it's zero on input would be nice.

Yes! That was what I forgotten!
 
  --- a/xen/include/public/tmem.h
  +++ b/xen/include/public/tmem.h
  @@ -33,7 +33,7 @@
   #define TMEM_SPEC_VERSION  1
   
   /* Commands to HYPERVISOR_tmem_op() */
  -#define TMEM_CONTROL   0
  +#define TMEM_CONTROL_MOVED 0
 
 Perhaps say where it moved in a brief comment?

Yes, good idea.

/* Deprecated. These operations are done now via XEN_SYSCTL_tmem_op
 * using the sysctl hypercall. */

 
  --- a/xen/xsm/flask/hooks.c
  +++ b/xen/xsm/flask/hooks.c
  @@ -761,6 +761,9 @@ static int flask_sysctl(int cmd)
   case XEN_SYSCTL_tbuf_op:
   return domain_has_xen(current-domain, XEN__TBUFCONTROL);
   
  +case XEN_SYSCTL_tmem_op:
  +return domain_has_xen(current-domain, XEN__TMEM_CONTROL);
  +
   case XEN_SYSCTL_sched_id:
   return domain_has_xen(current-domain, XEN__GETSCHEDULER);
 
 Hmm, these cases appear to be roughly sorted numerically, i.e.
 yours would normally go at the end.

I recall a comment from Andrew asking the newly introduced commands to
be in alphabetical order. But perhaps that was for domctl which is more
.. ah.. random?

Anyhow will make it in numerical order.
 
 Despite the comments I see no reason for this (once adjusted where
 needed) not to go in for 4.6.

Thank you.
 
 Jan
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.

2015-08-27 Thread Konrad Rzeszutek Wilk
On Thu, Aug 27, 2015 at 02:12:35PM +0100, Andrew Cooper wrote:
 On 27/08/15 12:02, Konrad Rzeszutek Wilk wrote:
  --- a/tools/python/xen/lowlevel/xc/xc.c
  +++ b/tools/python/xen/lowlevel/xc/xc.c
  @@ -1808,25 +1808,25 @@ static PyObject *pyxc_tmem_control(XcObject *self,
   pool_id, subop, cli_id, arg1, arg2, buf) )
   return NULL;
   
  -if ( (subop == TMEMC_LIST)  (arg1  32768) )
  +if ( (subop == XEN_SYSCTL_TMEM_OP_LIST)  (arg1  32768) )
   arg1 = 32768;
   
   if ( (rc = xc_tmem_control(self-xc_handle, pool_id, subop, cli_id, 
  arg1, arg2, buffer))  0 )
   return Py_BuildValue(i, rc);
   
   switch (subop) {
  -case TMEMC_LIST:
  +case XEN_SYSCTL_TMEM_OP_LIST:
   return Py_BuildValue(s, buffer);
  -case TMEMC_FLUSH:
  +case XEN_SYSCTL_TMEM_OP_FLUSH:
   return Py_BuildValue(i, rc);
  -case TMEMC_QUERY_FREEABLE_MB:
  +case XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB:
   return Py_BuildValue(i, rc);
  -case TMEMC_THAW:
  -case TMEMC_FREEZE:
  -case TMEMC_DESTROY:
  -case TMEMC_SET_WEIGHT:
  -case TMEMC_SET_CAP:
  -case TMEMC_SET_COMPRESS:
  +case XEN_SYSCTL_TMEM_OP_THAW:
  +case XEN_SYSCTL_TMEM_OP_FREEZE:
  +case XEN_SYSCTL_TMEM_OP_DESTROY:
  +case XEN_SYSCTL_TMEM_OP_SET_WEIGHT:
  +case XEN_SYSCTL_TMEM_OP_SET_CAP:
  +case XEN_SYSCTL_TMEM_OP_SET_COMPRESS:
 
 Any case statements falling through into default like this can simply be
 dropped.

Lets do that in another patch. This patch is just to move the operation
from one hypercall to another - with the minimum amount of changes.

I will gladly modify the case statements in subsequent patches.

 
  @@ -2555,77 +2556,72 @@ static int tmemc_restore_flush_page(int cli_id, 
  uint32_t pool_id, struct oid *oi
   return do_tmem_flush_page(pool,oidp,index);
   }
   
  -static int do_tmem_control(struct tmem_op *op)
  +int tmem_control(struct xen_sysctl_tmem_op *op)
   {
   int ret;
   uint32_t pool_id = op-pool_id;
  -uint32_t subop = op-u.ctrl.subop;
  -struct oid *oidp = (struct oid *)(op-u.ctrl.oid[0]);
  +uint32_t cmd = op-cmd;
  +struct oid *oidp = (struct oid *)(op-oid[0]);
 
 Please put a struct xen_sysctl_tmem_oid definition in sysctl.h and avoid
 this typesystem abuse.

Again, I tried to make this move as minimal as possible and not
introduce other sensible changes - and defer them to later patches.


 
  --- a/xen/include/public/sysctl.h
  +++ b/xen/include/public/sysctl.h
  @@ -710,6 +710,48 @@ struct xen_sysctl_psr_cat_op {
   typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t;
   DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t);
   
  +#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xU
  +
  +#define XEN_SYSCTL_TMEM_OP_THAW   0
  +#define XEN_SYSCTL_TMEM_OP_FREEZE 1
  +#define XEN_SYSCTL_TMEM_OP_FLUSH  2
  +#define XEN_SYSCTL_TMEM_OP_DESTROY3
  +#define XEN_SYSCTL_TMEM_OP_LIST   4
  +#define XEN_SYSCTL_TMEM_OP_SET_WEIGHT 5
  +#define XEN_SYSCTL_TMEM_OP_SET_CAP6
  +#define XEN_SYSCTL_TMEM_OP_SET_COMPRESS   7
  +#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB  8
  +#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION   11
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_MAXPOOLS  12
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_WEIGHT 13
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_CAP14
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_FLAGS  15
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_FLAGS16
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_NPAGES   17
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID 18
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19
  +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV  20
  +#define XEN_SYSCTL_TMEM_OP_SAVE_END   21
  +#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN  30
  +#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE   32
  +#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33
  +
  +struct xen_sysctl_tmem_op {
  +uint32_t cmd;   /* IN: XEN_SYSCTL_TMEM_OP_* . */
  +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/
  +uint32_t cli_id;/* IN: client id, 0 for 
  XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB
  +   for all others can be the domain id or
  +   XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */
  +uint32_t arg1;  /* IN: If not applicable to command use 0. */
  +uint32_t arg2;  /* IN: If not applicable to command use 0. */
 
 Please can this interface be fixed as part of the move, even if it is in
 subsequent patches for clarity.

I will gladly fix this interface in further patches. By all means!
 
 Part of the issue with the XSA-17 security audit was that these args are

Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.

2015-08-27 Thread Andrew Cooper
On 27/08/15 19:43, Konrad Rzeszutek Wilk wrote:

 --- a/xen/include/public/sysctl.h
 +++ b/xen/include/public/sysctl.h
 @@ -710,6 +710,48 @@ struct xen_sysctl_psr_cat_op {
  typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t;
  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t);
  
 +#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xU
 +
 +#define XEN_SYSCTL_TMEM_OP_THAW   0
 +#define XEN_SYSCTL_TMEM_OP_FREEZE 1
 +#define XEN_SYSCTL_TMEM_OP_FLUSH  2
 +#define XEN_SYSCTL_TMEM_OP_DESTROY3
 +#define XEN_SYSCTL_TMEM_OP_LIST   4
 +#define XEN_SYSCTL_TMEM_OP_SET_WEIGHT 5
 +#define XEN_SYSCTL_TMEM_OP_SET_CAP6
 +#define XEN_SYSCTL_TMEM_OP_SET_COMPRESS   7
 +#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB  8
 +#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION   11
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_MAXPOOLS  12
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_WEIGHT 13
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_CAP14
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_FLAGS  15
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_FLAGS16
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_NPAGES   17
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID 18
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV  20
 +#define XEN_SYSCTL_TMEM_OP_SAVE_END   21
 +#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN  30
 +#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE   32
 +#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33
 Perhaps better to have all these in tmem.h, to not clutter this
 header?
 Yes and no. The other sysctl had their #defines for commands here so I 
 figured I
 would follow that rule. But I am OK keeping it in tmem.h and just do

 /* For the 'cmd' values consule tmem.h. Look for XEN_SYSCTL_TMEM_OP_* */

Better to stay consistent with sysctl.h.   Separating the structure and
the subops is unhelpful to people reading the code.

 +struct xen_sysctl_tmem_op {
 +uint32_t cmd;   /* IN: XEN_SYSCTL_TMEM_OP_* . */
 +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/
 +uint32_t cli_id;/* IN: client id, 0 for 
 XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB
 +   for all others can be the domain id or
 +   XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */
 +uint32_t arg1;  /* IN: If not applicable to command use 0. */
 +uint32_t arg2;  /* IN: If not applicable to command use 0. */
 +uint8_t  pad[4];/* Padding so structure is the same under 32 and 
 64. */
 uint32_t please. And despite this being an (easily changeable) sysctl,
 verifying that it's zero on input would be nice.
 Yes! That was what I forgotten!
 --- a/xen/include/public/tmem.h
 +++ b/xen/include/public/tmem.h
 @@ -33,7 +33,7 @@
  #define TMEM_SPEC_VERSION  1
  
  /* Commands to HYPERVISOR_tmem_op() */
 -#define TMEM_CONTROL   0
 +#define TMEM_CONTROL_MOVED 0
 Perhaps say where it moved in a brief comment?
 Yes, good idea.

 /* Deprecated. These operations are done now via XEN_SYSCTL_tmem_op
  * using the sysctl hypercall. */

I hope XEN_SYSCTL_tmem_op is sufficient to imply using the sysctl
hypercall ;)


 --- a/xen/xsm/flask/hooks.c
 +++ b/xen/xsm/flask/hooks.c
 @@ -761,6 +761,9 @@ static int flask_sysctl(int cmd)
  case XEN_SYSCTL_tbuf_op:
  return domain_has_xen(current-domain, XEN__TBUFCONTROL);
  
 +case XEN_SYSCTL_tmem_op:
 +return domain_has_xen(current-domain, XEN__TMEM_CONTROL);
 +
  case XEN_SYSCTL_sched_id:
  return domain_has_xen(current-domain, XEN__GETSCHEDULER);
 Hmm, these cases appear to be roughly sorted numerically, i.e.
 yours would normally go at the end.
 I recall a comment from Andrew asking the newly introduced commands to
 be in alphabetical order. But perhaps that was for domctl which is more
 .. ah.. random?

Really? that doesn't sound like me.  Apologies if it was.  (Alphabetic
for obj-y in a Makefile perhaps?)

Numeric would be far better here.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG] 'o' debug key panics the hypervisor

2015-08-27 Thread Konrad Rzeszutek Wilk
On Thu, Aug 27, 2015 at 04:07:36PM +0200, Roger Pau Monné wrote:
 El 27/08/15 a les 16.01, Jan Beulich ha escrit:
  On 27.08.15 at 13:29, roger@citrix.com wrote:
  When using Intel hardware without shared page tables between the IOMMU 
  and EPT (I have not tried if the same happens when sharing the page 
  tables), the following crash happens if I press the 'o' debug key with 
  a HVM guest running. The guest doesn't have any device passed-through.
  
  This last thing is the relevant one: No-one ever cared to use 'o' without
  also passing through a device.
 
 I have to admit I discovered this when I thought I was typing at the
 Dom0 console but it turns out I had it switched to the Xen console...
 Anyway, it shouldn't crash Xen.

And it would be called by '*' which is often used in the field to
collect data.

I concur on the not-crashing Xen part.
 
 Roger.
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 4/8] tmem: Remove xc_tmem_control mystical arg3

2015-08-27 Thread Konrad Rzeszutek Wilk
On Thu, Aug 27, 2015 at 01:53:17PM +0100, Andrew Cooper wrote:
 On 27/08/15 12:01, Konrad Rzeszutek Wilk wrote:
  It mentions it but it is never used. The hypercall interface
  knows nothing of this sort of thing either. Lets just remove it.
 
  Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
 It would be nice if you could take the opportunity of changing every
 caller to fix the style issues in affected code.  There are a huge
 number of missing spaces in the parameter lists of calls to
 xc_tmem_control()

I was soo tempted to do it - but I've been hammered in the rule of
'one logical change per patch' so abstained from mission creep.

Do not fear - I will send out another patch that will fix this up :-)
 
 ~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [DOCSDAY PATCH for-4.6] docs: Fix installation of man8 pages

2015-08-27 Thread Andrew Cooper
c/s a430436 docs: Support for generating man(8) pages accidentally
failed to update to the install and clean rules for man8 pages, meaning
that c/s 7b21214 docs: Move xentrace.8 to docs/man/xentrace.pod.8
caused a packaging regression when it came to xentop.8

To avoid similar bugs in the future, move the generation of the build,
install and clean rules into the manpage metarule.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
---
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 docs/Makefile |   35 +++
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/docs/Makefile b/docs/Makefile
index 3d77913..d25e6c7 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -58,20 +58,15 @@ else
@echo fig2dev (transfig) not installed; skipping figs.
 endif
 
-.PHONY: man-pages
-man-pages: $(DOC_MAN1) $(DOC_MAN5) $(DOC_MAN8)
-
 .PHONY: pdf
 pdf: $(DOC_PDF)
 
 .PHONY: clean
-clean:
+clean: clean-man-pages
$(MAKE) -C figs clean
rm -rf .word_count *.aux *.dvi *.bbl *.blg *.glo *.idx *~
rm -rf *.ilg *.log *.ind *.toc *.bak *.tmp core
rm -rf html txt pdf
-   rm -rf man5
-   rm -rf man1
 
 .PHONY: distclean
 distclean: clean
@@ -80,6 +75,8 @@ distclean: clean
 
 # Top level install targets
 
+.PHONY: man-pages install-man-pages clean-man-pages
+
 # Metarules for generating manpages.  Run with $(1) substitued for section
 define GENERATE_MANPAGE_RULES
 
@@ -110,17 +107,31 @@ else
@echo pod2text not installed; skipping $$@
 endif
 
+# Build
+.PHONY: man$(1)-pages
+man$(1)-pages: $$(DOC_MAN$(1))
+
+# Install
+.PHONY: install-man$(1)-pages
+install-man$(1)-pages: man$(1)-pages
+   $(INSTALL_DIR) $(DESTDIR)$(mandir)
+   cp -r man$(1) $(DESTDIR)$(mandir)
+
+# Clean
+.PHONY: clean-man$(1)-pages
+clean-man$(1)-pages:
+   rm -rf man$(1)
+
+# Link buld/install/clean to toplevel rules
+man-pages: man$(1)-pages
+install-man-pages: install-man$(1)-pages
+clean-man-pages: clean-man$(1)-pages
+
 endef
 
 # Generate manpage rules for each section
 $(foreach i,1 5 8,$(eval $(call GENERATE_MANPAGE_RULES,$(i
 
-.PHONY: install-man-pages
-install-man-pages: man-pages
-   $(INSTALL_DIR) $(DESTDIR)$(mandir)
-   cp -R man1 $(DESTDIR)$(mandir)
-   cp -R man5 $(DESTDIR)$(mandir)
-
 .PHONY: install-html
 install-html: html txt figs
$(INSTALL_DIR) $(DESTDIR)$(docdir)
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [xen-4.4-testing test] 60873: regressions - FAIL

2015-08-27 Thread osstest service owner
flight 60873 xen-4.4-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60873/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-debianhvm-amd64 19 guest-start/debianhvm.repeat fail 
REGR. vs. 60727
 test-armhf-armhf-xl-credit2  16 guest-start/debian.repeat fail REGR. vs. 60727

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-libvirt-raw  9 debian-di-install fail REGR. vs. 60727
 test-amd64-i386-libvirt-vhd   9 debian-di-install fail REGR. vs. 60727
 test-armhf-armhf-xl-multivcpu 16 guest-start/debian.repeatfail  like 60696
 test-amd64-i386-libvirt-qcow2  9 debian-di-installfail  like 60727
 test-amd64-amd64-xl-vhd   9 debian-di-installfail   like 60727
 test-amd64-i386-libvirt  11 guest-start  fail   like 60727
 test-amd64-amd64-libvirt 11 guest-start  fail   like 60727
 test-amd64-i386-xl-vhd9 debian-di-installfail   like 60727
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 60727

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-migrupgrade  1 build-check(1)   blocked  n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-migrupgrade   1 build-check(1)   blocked  n/a
 build-amd64-rumpuserxen   6 xen-buildfail   never pass
 test-armhf-armhf-xl-vhd   9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt-vhd  9 debian-di-installfail   never pass
 test-amd64-amd64-xl-qcow2 9 debian-di-installfail   never pass
 test-armhf-armhf-xl-raw   9 debian-di-installfail   never pass
 build-amd64-prev  5 xen-buildfail   never pass
 build-i386-prev   5 xen-buildfail   never pass
 test-armhf-armhf-xl-qcow2 9 debian-di-installfail   never pass
 build-i386-rumpuserxen6 xen-buildfail   never pass
 test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
 test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never 
pass
 test-armhf-armhf-libvirt 11 guest-start  fail   never pass
 test-armhf-armhf-libvirt-qcow2  9 debian-di-installfail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-amd64-i386-libvirt-raw  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-armhf-armhf-libvirt-raw  9 debian-di-installfail   never pass
 test-amd64-i386-xend-qemut-winxpsp3 21 leak-check/checkfail never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass

version targeted for testing:
 xen  27b82b08b17f589f638f3d5be8dcea42b5e73330
baseline version:
 xen  3646b134c1673f09c0a239de10b0da4c9265c8e8

Last test of basis60727  2015-08-16 16:15:09 Z   11 days
Testing same since60802  2015-08-20 14:41:37 Z7 days3 attempts


People who touched revisions under test:
  Jan Beulich jbeul...@suse.com

jobs:
 build-amd64-xend pass
 build-i386-xend  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt  

Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.

2015-08-27 Thread Andrew Cooper
On 27/08/15 19:47, Konrad Rzeszutek Wilk wrote:

 --- a/xen/include/public/sysctl.h
 +++ b/xen/include/public/sysctl.h
 @@ -710,6 +710,48 @@ struct xen_sysctl_psr_cat_op {
  typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t;
  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t);
  
 +#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xU
 +
 +#define XEN_SYSCTL_TMEM_OP_THAW   0
 +#define XEN_SYSCTL_TMEM_OP_FREEZE 1
 +#define XEN_SYSCTL_TMEM_OP_FLUSH  2
 +#define XEN_SYSCTL_TMEM_OP_DESTROY3
 +#define XEN_SYSCTL_TMEM_OP_LIST   4
 +#define XEN_SYSCTL_TMEM_OP_SET_WEIGHT 5
 +#define XEN_SYSCTL_TMEM_OP_SET_CAP6
 +#define XEN_SYSCTL_TMEM_OP_SET_COMPRESS   7
 +#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB  8
 +#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION   11
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_MAXPOOLS  12
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_WEIGHT 13
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_CAP14
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_CLIENT_FLAGS  15
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_FLAGS16
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_NPAGES   17
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID 18
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19
 +#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV  20
 +#define XEN_SYSCTL_TMEM_OP_SAVE_END   21
 +#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN  30
 +#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE   32
 +#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33
 +
 +struct xen_sysctl_tmem_op {
 +uint32_t cmd;   /* IN: XEN_SYSCTL_TMEM_OP_* . */
 +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/
 +uint32_t cli_id;/* IN: client id, 0 for 
 XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB
 +   for all others can be the domain id or
 +   XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */
 +uint32_t arg1;  /* IN: If not applicable to command use 0. */
 +uint32_t arg2;  /* IN: If not applicable to command use 0. */
 Please can this interface be fixed as part of the move, even if it is in
 subsequent patches for clarity.
 I will gladly fix this interface in further patches. By all means!

What I wish to avoid is 4.6 releasing with the API in this state, as
adjusting valgrind and strace to compensate will be miserable.

The best solution would be to have this patch and the fixups adjacent in
the series, at which point the valgrind/strace adjustments can start
with the clean API for 4.6

 +uint64_t oid[3];/* IN: If not applicable to command use 0. */
 +XEN_GUEST_HANDLE_64(char) buf; /* IN/OUT: Buffer to save and restore 
 ops. */
 +};
 +typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
 +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
 +
  struct xen_sysctl {
  uint32_t cmd;
  #define XEN_SYSCTL_readconsole1
 @@ -734,6 +776,7 @@ struct xen_sysctl {
  #define XEN_SYSCTL_psr_cmt_op21
  #define XEN_SYSCTL_pcitopoinfo   22
  #define XEN_SYSCTL_psr_cat_op23
 +#define XEN_SYSCTL_tmem_op   24
  uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
  union {
  struct xen_sysctl_readconsole   readconsole;
 @@ -758,6 +801,7 @@ struct xen_sysctl {
  struct xen_sysctl_coverage_op   coverage_op;
  struct xen_sysctl_psr_cmt_oppsr_cmt_op;
  struct xen_sysctl_psr_cat_oppsr_cat_op;
 +struct xen_sysctl_tmem_op   tmem_op;
  uint8_t pad[128];
  } u;
  };
 diff --git a/xen/include/public/tmem.h b/xen/include/public/tmem.h
 index 4fd2fc6..208f5a6 100644
 --- a/xen/include/public/tmem.h
 +++ b/xen/include/public/tmem.h
 @@ -33,7 +33,7 @@
  #define TMEM_SPEC_VERSION  1
  
  /* Commands to HYPERVISOR_tmem_op() */
 -#define TMEM_CONTROL   0
 +#define TMEM_CONTROL_MOVED 0
 If anything, this should be deleted as opposed to being renamed.  The
 _MOVED suffix is confusing, as TMEM_CONTROL_MOVED sounds like it could
 be a plausible TMEM operation.
 Ah, will do what Jan asked and just put a big comment. And keep the old
 name (or maybe add 'DEPRECATED'?)

If you are going to the effort of renaming, then just delete.  Either
way will break the build of unsuspecting code, and the latter leaves us
with less junk.

A comment however will be useful in all cases.

  #define TMEM_NEW_POOL  1
  #define TMEM_DESTROY_POOL  2
  #define TMEM_PUT_PAGE  4
 @@ -48,35 +48,9 @@
  #endif
  
  /* Privileged commands to HYPERVISOR_tmem_op() */
 -#define TMEM_AUTH 101 
 +#define TMEM_AUTH 101
  #define TMEM_RESTORE_NEW  102
  
 -/* Subops for HYPERVISOR_tmem_op(TMEM_CONTROL) */
 

Re: [Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.

2015-08-27 Thread Konrad Rzeszutek Wilk
  +struct xen_sysctl_tmem_op {
  +uint32_t cmd;   /* IN: XEN_SYSCTL_TMEM_OP_* . */
  +int32_t pool_id;/* IN: 0 by default unless _SAVE_*, RESTORE_* .*/
  +uint32_t cli_id;/* IN: client id, 0 for 
  XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB
  +   for all others can be the domain id or
  +   XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */
  +uint32_t arg1;  /* IN: If not applicable to command use 0. */
  +uint32_t arg2;  /* IN: If not applicable to command use 0. */
  Please can this interface be fixed as part of the move, even if it is in
  subsequent patches for clarity.
  I will gladly fix this interface in further patches. By all means!
 
 What I wish to avoid is 4.6 releasing with the API in this state, as
 adjusting valgrind and strace to compensate will be miserable.

Right.
 
 The best solution would be to have this patch and the fixups adjacent in
 the series, at which point the valgrind/strace adjustments can start
 with the clean API for 4.6

Yes. I shall post this patchset plus extra patches next week.

 
  +uint64_t oid[3];/* IN: If not applicable to command use 0. */
  +XEN_GUEST_HANDLE_64(char) buf; /* IN/OUT: Buffer to save and restore 
  ops. */
  +};
  +typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
  +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
  +
   struct xen_sysctl {
   uint32_t cmd;
   #define XEN_SYSCTL_readconsole1
  @@ -734,6 +776,7 @@ struct xen_sysctl {
   #define XEN_SYSCTL_psr_cmt_op21
   #define XEN_SYSCTL_pcitopoinfo   22
   #define XEN_SYSCTL_psr_cat_op23
  +#define XEN_SYSCTL_tmem_op   24
   uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
   union {
   struct xen_sysctl_readconsole   readconsole;
  @@ -758,6 +801,7 @@ struct xen_sysctl {
   struct xen_sysctl_coverage_op   coverage_op;
   struct xen_sysctl_psr_cmt_oppsr_cmt_op;
   struct xen_sysctl_psr_cat_oppsr_cat_op;
  +struct xen_sysctl_tmem_op   tmem_op;
   uint8_t pad[128];
   } u;
   };
  diff --git a/xen/include/public/tmem.h b/xen/include/public/tmem.h
  index 4fd2fc6..208f5a6 100644
  --- a/xen/include/public/tmem.h
  +++ b/xen/include/public/tmem.h
  @@ -33,7 +33,7 @@
   #define TMEM_SPEC_VERSION  1
   
   /* Commands to HYPERVISOR_tmem_op() */
  -#define TMEM_CONTROL   0
  +#define TMEM_CONTROL_MOVED 0
  If anything, this should be deleted as opposed to being renamed.  The
  _MOVED suffix is confusing, as TMEM_CONTROL_MOVED sounds like it could
  be a plausible TMEM operation.
  Ah, will do what Jan asked and just put a big comment. And keep the old
  name (or maybe add 'DEPRECATED'?)
 
 If you are going to the effort of renaming, then just delete.  Either
 way will break the build of unsuspecting code, and the latter leaves us
 with less junk.

It will break the tmem hypervisor code as it has a check for
TMEM_CONTROL (which per Jan suggestion should return -E.. something).

 
 A comment however will be useful in all cases.

Aye.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable baseline test] 60878: regressions - FAIL

2015-08-27 Thread osstest service owner
Old tested version had not actually been tested; therefore in this
flight we test it, rather than a new candidate.  The baseline, if
any, is the most recent actually tested revision.

flight 60878 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60878/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-debianhvm-amd64 19 guest-start/debianhvm.repeat fail 
REGR. vs. 60775
 test-armhf-armhf-xl-cubietruck  6 xen-bootfail REGR. vs. 60775
 test-amd64-amd64-xl-qemuu-ovmf-amd64 19 guest-start/debianhvm.repeat fail 
REGR. vs. 60775
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate.2 
fail REGR. vs. 60775

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt-qcow2  5 xen-install fail REGR. vs. 60775
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 16 
guest-start/debianhvm.repeat fail REGR. vs. 60775
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail 
like 60775
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail like 60775
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 60775

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-qcow2 9 debian-di-installfail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-armhf-armhf-libvirt-raw  9 debian-di-installfail   never pass
 test-armhf-armhf-xl-raw   9 debian-di-installfail   never pass
 test-armhf-armhf-xl-vhd   9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt-vhd  9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never 
pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 guest-start.2fail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  11 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-vhd  11 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail  never pass

version targeted for testing:
 xen  7b99717f62caeac08eea224a177cd28f047ac4b5
baseline version:
 xen  743289d0296268fe6bad64531a24d8053afeb062

Last test of basis60810  2015-08-20 23:37:03 Z6 days
Testing same since60841  2015-08-23 06:47:14 Z4 days2 attempts


People who touched revisions under test:
  Boris Ostrovsky boris.ostrov...@oracle.com
  Ian 

Re: [Xen-devel] [PATCH v2 6/8] tmem: Remove the old tmem control XSM checks as it is part of sysctl hypercall.

2015-08-27 Thread Daniel De Graaf

On 27/08/15 07:02, Konrad Rzeszutek Wilk wrote:

The sysctl is where the tmem control operations are done and the
XSM checks are done via there. The old mechanism (to check
for control tmem op XSM from do_tmem_op) is not needed anymore.

CC:  Daniel De Graaf dgde...@tycho.nsa.gov
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com


Both this and patch 5 (which adds the sysctl check):
Acked-by: Daniel De Graaf dgde...@tycho.nsa.gov

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [qemu-mainline test] 60879: regressions - FAIL

2015-08-27 Thread osstest service owner
flight 60879 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60879/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-multivcpu 11 guest-start  fail REGR. vs. 60846

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 11 guest-start  fail   like 60846

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-qcow2 9 debian-di-installfail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-armhf-armhf-libvirt-qcow2  9 debian-di-installfail never pass
 test-armhf-armhf-libvirt-vhd  9 debian-di-installfail   never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-armhf-armhf-libvirt-raw  9 debian-di-installfail   never pass
 test-armhf-armhf-xl-vhd   9 debian-di-installfail   never pass
 test-armhf-armhf-xl-raw   9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never 
pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail  never pass
 test-amd64-i386-libvirt-vhd   9 debian-di-installfail   never pass
 test-amd64-i386-libvirt-raw  11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd  9 debian-di-installfail   never pass
 test-amd64-amd64-xl-vhd   9 debian-di-installfail   never pass
 test-amd64-i386-xl-vhd9 debian-di-installfail   never pass

version targeted for testing:
 qemuu7df9671989c1cfa693764f9ae6349324b2ada02a
baseline version:
 qemuua30878e708c2149ce07d709a8b62edd944628449

Last test of basis60846  2015-08-23 16:48:08 Z4 days
Testing same since60879  2015-08-25 22:41:03 Z1 days1 attempts


People who touched revisions under test:
  Alistair Francis alistair.fran...@xilinx.com
  Aurelien Jarno aurel...@aurel32.net
  Benjamin Herrenschmidt b...@kernel.crashing.org
  Claudio Fontana claudio.font...@huawei.com
  Laurent Vivier laur...@vivier.eu
  Peter Maydell peter.mayd...@linaro.org
  Richard Henderson r...@twiddle.net

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 

Re: [Xen-devel] [PATCH v2] libxenstore: prefer using the character device

2015-08-27 Thread Jonathan Creekmore

Ian Jackson ian.jack...@eu.citrix.com writes:

Wei Liu writes (Re: [Xen-devel] [PATCH v2] libxenstore: prefer 
using the character device): 
On Thu, Aug 27, 2015 at 09:04:38AM -0500, Jonathan Creekmore 
wrote: 
 With the addition of FMODE_ATOMIC_POS in the Linux 3.14 
 kernel, concurrent blocking file accesses to a single open 
 file descriptor can cause a deadlock trying to grab the file 
 position lock. If a watch has been set up, causing a 
 read_thread to blocking read on the file descriptor, then 
 future writes that would cause the background read to 
 complete will block waiting on the file position lock before 
 they can execute. This race condition only occurs when 
 libxenstore is accessing the xenstore daemon through the 
 /proc/xen/xenbus file and not through the unix domain socket, 
 which is the case when the xenstore daemon is running as a 
 stub domain or when oxenstored is passed 
 --disable-socket. Accessing the daemon from the true 
 character device also does not exhibit this problem.   On 
 Linux, prefer using the character device file over the proc 
 file if the character device exists. 


I confess I still see this as working around a kernel bug.  Only 
this time we are switching from a buggy to non-buggy kernel 
interface. 


Why don't we have the kernel provide only non-buggy interfaces ?


I was just trying to implement what David suggested; CC'ing him on 
this as well.




 diff --git a/tools/xenstore/xs_lib.c 
 b/tools/xenstore/xs_lib.c index af4f75a..0c7744e 100644 --- 
 a/tools/xenstore/xs_lib.c +++ b/tools/xenstore/xs_lib.c @@ 
 -81,6 +81,8 @@ const char *xs_domain_dev(void) 
  #if defined(__RUMPUSER_XEN__) || defined(__RUMPRUN__) return 
  /dev/xen/xenbus; #elif defined(__linux__) 
 +	if (access(/dev/xen/xenbus, F_OK) == 0) + 
 return /dev/xen/xenbus; 


Also, previously xs_domain_dev was a function which simply 
returned a static value.  I feel vaguely uneasy at putting this 
kind of autodetection logic here. 


Not entirely; the existing code queried an environment variable first
and, only if that was not set, did it return a static value. I added the
autodetection logic to fall back in the case where, for some reason,
/dev/xen/xenbus did not exist. Handling that kind of a fallback at a
higher layer would be a larger refactor to probe for the existence of
the character device first.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [3.19.y-ckt stable] Patch x86/xen: Probe target addresses in set_aliased_prot() before the hypercall has been added to staging queue

2015-08-27 Thread Kamal Mostafa
This is a note to let you know that I have just added a patch titled

x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

to the linux-3.19.y-queue branch of the 3.19.y-ckt extended stable tree 
which can be found at:

http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.19.y-queue

This patch is scheduled to be released in version 3.19.8-ckt6.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.19.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

--

From 922c946e2cad7cdaec82a9ae963c5eb47f7992b0 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski l...@kernel.org
Date: Thu, 30 Jul 2015 14:31:31 -0700
Subject: x86/xen: Probe target addresses in set_aliased_prot() before the
 hypercall

commit aa1acff356bbedfd03b544051f5b371746735d89 upstream.

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski l...@kernel.org
Cc: Andrew Cooper andrew.coop...@citrix.com
Cc: Andy Lutomirski l...@amacapital.net
Cc: Boris Ostrovsky boris.ostrov...@oracle.com
Cc: Borislav Petkov b...@alien8.de
Cc: Brian Gerst brge...@gmail.com
Cc: David Vrabel dvra...@cantab.net
Cc: Denys Vlasenko dvlas...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Sasha Levin sasha.le...@oracle.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Thomas Gleixner t...@linutronix.de
Cc: secur...@kernel.org secur...@kernel.org
Cc: xen-devel xen-devel@lists.xen.org
Link: 
http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.l...@kernel.org
Signed-off-by: Ingo Molnar mi...@kernel.org
Signed-off-by: Kamal Mostafa ka...@canonical.com
---
 arch/x86/xen/enlighten.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 78a881b..f94ad30 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -483,6 +483,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
pte_t pte;
unsigned long pfn;
struct page *page;
+   unsigned char dummy;

ptep = lookup_address((unsigned long)v, level);
BUG_ON(ptep == NULL);
@@ -492,6 +493,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)

pte = pfn_pte(pfn, prot);

+   /*
+* Careful: update_va_mapping() will fail if the virtual address
+* we're poking isn't populated in the page tables.  We don't
+* need to worry about the direct map (that's always in the page
+* tables), but we need to be careful about vmap space.  In
+* particular, the top level page table can lazily propagate
+* entries between processes, so if we've switched mms since we
+* vmapped the target in the first place, we might not have the
+* top-level page table entry populated.
+*
+* We disable preemption because we want the same mm active when
+* we probe the target and when we issue the hypercall.  We'll
+* have the same nominal mm, but if we're a kernel thread, lazy
+* mm dropping could change our pgd.
+*
+* Out of an abundance of caution, this uses __get_user() to fault
+* in the target address just in case there's some obscure case
+* in which the target address isn't readable.
+*/
+
+   preempt_disable();
+
+   pagefault_disable();/* Avoid warnings due to being atomic. */
+   __get_user(dummy, (unsigned char __user __force *)v);
+   pagefault_enable();
+
if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
BUG();

@@ -503,6 +530,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
BUG();
} else
kmap_flush_unused();
+
+   preempt_enable();
 }

 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -510,6 +539,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, 
unsigned entries)
const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
int i;

+   /*
+* We need to mark the all aliases of the LDT pages RO.  We
+* don't need to call vm_flush_aliases(), though, since that's
+* only responsible for flushing aliases out the TLBs, not the
+* page tables, and Xen will flush the TLB for us if needed.
+*
+* To avoid confusing future readers: none of this is 

[Xen-devel] [PATCH 3.19.y-ckt 111/130] x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

2015-08-27 Thread Kamal Mostafa
3.19.8-ckt6 -stable review patch.  If anyone has any objections, please let me 
know.

--

From: Andy Lutomirski l...@kernel.org

commit aa1acff356bbedfd03b544051f5b371746735d89 upstream.

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski l...@kernel.org
Cc: Andrew Cooper andrew.coop...@citrix.com
Cc: Andy Lutomirski l...@amacapital.net
Cc: Boris Ostrovsky boris.ostrov...@oracle.com
Cc: Borislav Petkov b...@alien8.de
Cc: Brian Gerst brge...@gmail.com
Cc: David Vrabel dvra...@cantab.net
Cc: Denys Vlasenko dvlas...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Sasha Levin sasha.le...@oracle.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Thomas Gleixner t...@linutronix.de
Cc: secur...@kernel.org secur...@kernel.org
Cc: xen-devel xen-devel@lists.xen.org
Link: 
http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.l...@kernel.org
Signed-off-by: Ingo Molnar mi...@kernel.org
Signed-off-by: Kamal Mostafa ka...@canonical.com
---
 arch/x86/xen/enlighten.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 78a881b..f94ad30 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -483,6 +483,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
pte_t pte;
unsigned long pfn;
struct page *page;
+   unsigned char dummy;
 
ptep = lookup_address((unsigned long)v, level);
BUG_ON(ptep == NULL);
@@ -492,6 +493,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 
pte = pfn_pte(pfn, prot);
 
+   /*
+* Careful: update_va_mapping() will fail if the virtual address
+* we're poking isn't populated in the page tables.  We don't
+* need to worry about the direct map (that's always in the page
+* tables), but we need to be careful about vmap space.  In
+* particular, the top level page table can lazily propagate
+* entries between processes, so if we've switched mms since we
+* vmapped the target in the first place, we might not have the
+* top-level page table entry populated.
+*
+* We disable preemption because we want the same mm active when
+* we probe the target and when we issue the hypercall.  We'll
+* have the same nominal mm, but if we're a kernel thread, lazy
+* mm dropping could change our pgd.
+*
+* Out of an abundance of caution, this uses __get_user() to fault
+* in the target address just in case there's some obscure case
+* in which the target address isn't readable.
+*/
+
+   preempt_disable();
+
+   pagefault_disable();/* Avoid warnings due to being atomic. */
+   __get_user(dummy, (unsigned char __user __force *)v);
+   pagefault_enable();
+
if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
BUG();
 
@@ -503,6 +530,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
BUG();
} else
kmap_flush_unused();
+
+   preempt_enable();
 }
 
 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -510,6 +539,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, 
unsigned entries)
const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
int i;
 
+   /*
+* We need to mark the all aliases of the LDT pages RO.  We
+* don't need to call vm_flush_aliases(), though, since that's
+* only responsible for flushing aliases out the TLBs, not the
+* page tables, and Xen will flush the TLB for us if needed.
+*
+* To avoid confusing future readers: none of this is necessary
+* to load the LDT.  The hypervisor only checks this when the
+* LDT is faulted in due to subsequent descriptor access.
+*/
+
for(i = 0; i  entries; i += entries_per_page)
set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
 }
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [libvirt test] 60880: regressions - FAIL

2015-08-27 Thread osstest service owner
flight 60880 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60880/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 16 
guest-start/debianhvm.repeat fail REGR. vs. 60848

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-vhd  9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt-raw  9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt-qcow2  9 debian-di-installfail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never 
pass
 test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
 test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-vhd  11 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail  never pass
 test-amd64-i386-libvirt-raw  11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  bf2788218ad29719467260aa4ecad6dc31c97046
baseline version:
 libvirt  7dc52241b314adb19ca4f0af156d772e3d5a29a5

Last test of basis60848  2015-08-23 18:44:07 Z4 days
Testing same since60880  2015-08-25 23:18:39 Z1 days1 attempts


People who touched revisions under test:
  Erik Skultety eskul...@redhat.com
  Guido Günther a...@sigxcpu.org
  intrigeri intrig...@debian.org
  Laine Stump la...@laine.org
  Luyao Huang lhu...@redhat.com
  Martin Kletzander mklet...@redhat.com
  Sergey Bronnikov serg...@openvz.org
  Tomas Meszaros e...@tty.sk
  Vasiliy Tolstov v.tols...@selfip.ru

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmfail
 test-amd64-amd64-libvirt-xsm pass
 test-armhf-armhf-libvirt-xsm fail
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-armhf-armhf-libvirt fail
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairfail
 test-amd64-i386-libvirt-pair fail
 test-amd64-amd64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   fail
 test-amd64-i386-libvirt-qcow2pass
 test-amd64-amd64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw fail
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass
 test-armhf-armhf-libvirt-vhd fail
 test-amd64-i386-libvirt-vhd  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: 

Re: [Xen-devel] [PATCH V4 2/4] x86/xsaves: enable xsaves/xrstors in xen

2015-08-27 Thread Shuai Ruan
On Wed, Aug 26, 2015 at 06:35:51AM -0600, Jan Beulich wrote:
  On 26.08.15 at 14:05, andrew.coop...@citrix.com wrote:
  On 26/08/15 12:50, Jan Beulich wrote:
  On 26.08.15 at 12:12, andrew.coop...@citrix.com wrote:
  On 25/08/15 11:54, Shuai Ruan wrote:
  --- a/xen/arch/x86/traps.c
  +++ b/xen/arch/x86/traps.c
  @@ -936,9 +936,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
   if ( regs-_ecx == 1 )
   {
   a = XSTATE_FEATURE_XSAVEOPT |
  - XSTATE_FEATURE_XSAVEC |
  - (cpu_has_xgetbv1 ? XSTATE_FEATURE_XGETBV1 : 0) |
  - (cpu_has_xsaves ? XSTATE_FEATURE_XSAVES : 0);
  + XSTATE_FEATURE_XSAVEC;
  + /* PV guest will not support xsaves. */
  + /* (cpu_has_xgetbv1 ? XSTATE_FEATURE_XGETBV1 : 0) |
  + (cpu_has_xsaves ? XSTATE_FEATURE_XSAVES : 0); */
  Don't leave this code commented out like this.  Just delete it.
  Agreed, but - mind reminding me again why supporting them for
  PV guests isn't going to work?
  
  xsaves is a cpl0 instruction used to manage state which userspace can't
  be trusted to handle alone.  Xen therefore can't trust PV guests to use
  it either.
  
  The first of these features is Processor Trace.  A PV guest able to use
  xsaves/xrstors would be able to gather trace data of hypervisor
  execution, or cause the trace buffers to clobber arbitrary physical memory.
  
  The features covered by MSR_IA32_XSS can safely be exposed to PV guests,
  but via a hypercall interface.
 
 That covers xsaves, but not xgetbv1 (which also covers XSAVEOPT).
 
That is my error. I will fix it in next version.
 Jan
 
 
Thans for your review,Jan
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG] 'o' debug key panics the hypervisor

2015-08-27 Thread Tian, Kevin
 From: Roger Pau Monné [mailto:roger@citrix.com]
 Sent: Thursday, August 27, 2015 10:08 PM
 
 El 27/08/15 a les 16.01, Jan Beulich ha escrit:
  On 27.08.15 at 13:29, roger@citrix.com wrote:
  When using Intel hardware without shared page tables between the IOMMU
  and EPT (I have not tried if the same happens when sharing the page
  tables), the following crash happens if I press the 'o' debug key with
  a HVM guest running. The guest doesn't have any device passed-through.
 
  This last thing is the relevant one: No-one ever cared to use 'o' without
  also passing through a device.
 
 I have to admit I discovered this when I thought I was typing at the
 Dom0 console but it turns out I had it switched to the Xen console...
 Anyway, it shouldn't crash Xen.
 
 Roger.

yes, it should be fixed anyway.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4 2/4] x86/xsaves: enable xsaves/xrstors in xen

2015-08-27 Thread Shuai Ruan
On Wed, Aug 26, 2015 at 11:12:02AM +0100, Andrew Cooper wrote:
 On 25/08/15 11:54, Shuai Ruan wrote:
  This patch uses xsaves/xrstors instead of xsaveopt/xrstor
  to perform the xsave_area switching so that xen itself
  can benefit from them when available.
 
  For xsaves/xrstors only use compact format. Add format conversion
  support when perform guest os migration.
 
  Signed-off-by: Shuai Ruan shuai.r...@linux.intel.com
  ---
   xen/arch/x86/domain.c  |   2 +
   xen/arch/x86/domctl.c  |  34 ---
   xen/arch/x86/hvm/hvm.c |  19 ++---
   xen/arch/x86/i387.c|   4 ++
   xen/arch/x86/traps.c   |   7 ++--
   xen/arch/x86/xstate.c  | 112 
  ++---
   6 files changed, 132 insertions(+), 46 deletions(-)
 
  diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
  index 045f6ff..7b8f649 100644
  --- a/xen/arch/x86/domain.c
  +++ b/xen/arch/x86/domain.c
  @@ -1529,6 +1529,8 @@ static void __context_switch(void)
   if ( xcr0 != get_xcr0()  !set_xcr0(xcr0) )
   BUG();
   }
  +if ( cpu_has_xsaves )
  +wrmsrl(MSR_IA32_XSS, n-arch.msr_ia32_xss);
 
 This is still not appropriate.  You need a per cpu variable and only
 perform lazy writes of the MSR.
 
Ok.I will introduce two helper
void set_xss_msr(uint64_t msr_xss);
uint64_t get_xss_msr();
to perform lazy writes of the MSR.
   vcpu_restore_fpu_eager(n);
   n-arch.ctxt_switch_to(n);
   }
  diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
  index bf62a88..da136876 100644
  --- a/xen/arch/x86/domctl.c
  +++ b/xen/arch/x86/domctl.c
  @@ -867,6 +867,7 @@ long arch_do_domctl(
   if ( domctl-cmd == XEN_DOMCTL_getvcpuextstate )
   {
   unsigned int size;
  +void * xsave_area;
 
 Unnecessary space.
 
Sorry.
   
   ret = 0;
   vcpu_pause(v);
  @@ -896,9 +897,28 @@ long arch_do_domctl(
   ret = -EFAULT;
   
   offset += sizeof(v-arch.xcr0_accum);
  -if ( !ret  copy_to_guest_offset(evc-buffer, offset,
  -  (void *)v-arch.xsave_area,
  -  size - 2 * sizeof(uint64_t)) 
  )
  +
  +if ( cpu_has_xsaves )
 
 I think this would be more correct to gate on v-arch.xsave_area
 actually being compressed.
 
I will introduce a helper called 
bool_t xsave_area_compressed(struct xsave_struct *xsave_area)
to check whether the area is compressed or not.
  +{
  +xsave_area = xmalloc_bytes(size);
  +if ( !xsave_area )
  +{
  +ret = -ENOMEM;
  +goto vcpuextstate_out;
  +}
  +
  +save_xsave_states(v, xsave_area,
  +  evc-size - 2*sizeof(uint64_t));
 
 And on a similar note, save_xsave_states() is really
 uncompress_xsave_area().  On further consideration, it should ASSERT()
 that the source is currently compressed.
 
Ok.
  +
  +if ( !ret  copy_to_guest_offset(evc-buffer, offset,
  + xsave_area, size -
  + 2 * sizeof(uint64_t)) )
  + ret = -EFAULT;
  + xfree(xsave_area);
  +   }
  +   else if ( !ret  copy_to_guest_offset(evc-buffer, offset,
  + (void 
  *)v-arch.xsave_area,
  + size - 2 * 
  sizeof(uint64_t)) )
   ret = -EFAULT;
   
   vcpu_unpause(v);
  @@ -954,8 +974,12 @@ long arch_do_domctl(
   v-arch.xcr0_accum = _xcr0_accum;
   if ( _xcr0_accum  XSTATE_NONLAZY )
   v-arch.nonlazy_xstate_used = 1;
  -memcpy(v-arch.xsave_area, _xsave_area,
  -   evc-size - 2 * sizeof(uint64_t));
  +if ( cpu_has_xsaves )
  +load_xsave_states(v, (void *)_xsave_area,
  +  evc-size - 2*sizeof(uint64_t));
 
 Similarly, load_xsave_states() is really compress_xsave_area().
 
 You should check to see whether the area is already compressed, and just
 memcpy() in that case.
 
Ok.
  +else
  +memcpy(v-arch.xsave_area, (void *)_xsave_area,
  +   evc-size - 2 * sizeof(uint64_t));
   vcpu_unpause(v);
   }
   else
  diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
  index c957610..dc444ac 100644
  --- a/xen/arch/x86/hvm/hvm.c
  +++ b/xen/arch/x86/hvm/hvm.c
  @@ -2148,8 +2148,12 @@ static int hvm_save_cpu_xsave_states(struct domain 
  *d, hvm_domain_context_t *h)
   ctxt-xfeature_mask = xfeature_mask;
   ctxt-xcr0 = v-arch.xcr0;
   ctxt-xcr0_accum = 

Re: [Xen-devel] [PATCH] x86/vmx: fix vmx_is_singlestep_supported return value

2015-08-27 Thread Tian, Kevin
 From: Tamas K Lengyel [mailto:ta...@tklengyel.com]
 Sent: Tuesday, August 25, 2015 3:56 AM
 
 The function supposed to return a boolean but instead it returned
 the value 0x800 which is the Intel internal flag for MTF. This has
 caused various checks using this function to falsely report no MTF
 capability.
 
 Signed-off-by: Tamas K Lengyel tleng...@novetta.com

Acked-by: Kevin Tian kevin.t...@intel.com

Thanks,
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4 1/4] x86/xsaves: add basic definitions/helpers to support xsaves

2015-08-27 Thread Shuai Ruan
On Wed, Aug 26, 2015 at 10:47:22AM +0100, Andrew Cooper wrote:
 On 25/08/2015 11:54, Shuai Ruan wrote:
  This patch add basic definitions/helpers which will be used in
  later patches.
 
  Signed-off-by: Shuai Ruan shuai.r...@linux.intel.com
 
 Thankyou - this series is looking far better now.
 
  ---
   xen/arch/x86/xstate.c| 137 
  +++
   xen/include/asm-x86/cpufeature.h |   4 ++
   xen/include/asm-x86/domain.h |   1 +
   xen/include/asm-x86/msr-index.h  |   2 +
   xen/include/asm-x86/xstate.h |  11 +++-
   5 files changed, 154 insertions(+), 1 deletion(-)
 
  diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
  index d5f5e3b..3986515 100644
  --- a/xen/arch/x86/xstate.c
  +++ b/xen/arch/x86/xstate.c
  @@ -29,6 +29,10 @@ static u32 __read_mostly xsave_cntxt_size;
   /* A 64-bit bitmask of the XSAVE/XRSTOR features supported by processor. */
   u64 __read_mostly xfeature_mask;
   
  +unsigned int *xstate_offsets, *xstate_sizes;
  +static unsigned int xstate_features;
  +static unsigned int xstate_comp_offsets[sizeof(xfeature_mask)*8];
 
 All of these should be tagged as __read_mostly.
 
Ok.
  +
   /* Cached xcr0 for fast read */
   static DEFINE_PER_CPU(uint64_t, xcr0);
   
  @@ -65,6 +69,139 @@ uint64_t get_xcr0(void)
   return this_cpu(xcr0);
   }
   
  +static void setup_xstate_features(void)
  +{
  +unsigned int eax, ebx, ecx, edx, leaf = 0x2;
  +unsigned int i;
  +
  +xstate_features = fls(xfeature_mask);
  +xstate_offsets = xzalloc_array(unsigned int, xstate_features);
  +xstate_sizes = xzalloc_array(unsigned int, xstate_features);
 
 These can both be plain xmalloc_array(), as we unconditionally set every
 member below.
 
 As a separate patch (or possibly this one if it isn't too big), you now
 need to modify the xstate initialisation to return int rather than
 void.  You must also check for allocation failures and bail with
 -ENOMEM.  It is fine for an error like this to be fatal to system or AP
 startup.
 
 Also, merging review from patch 2, this function gets called once per
 pcpu, meaning that you waste a moderate quantity of memory and
 repeatedly clobber the xstate_{offsets,sizes} pointers.  You should pass
 in bool_t bsp and, if bsp, allocate the arrays and read out, while if
 not bsp, check that the ap is identical to the bsp.
 
Ok.
  +
  +for (i = 0; i  xstate_features ; i++)
 
 Style: spaces inside braces, and there shouldn't be one between
 xstate_features;
 
 Also, you can remove i entirely, and use for ( leaf = 2; leaf 
 xstate_features; ++i )
 
Sorry.
  +{
  +cpuid_count(XSTATE_CPUID, leaf, eax, ebx, ecx, edx);
  +
  +xstate_offsets[leaf] = ebx;
  +xstate_sizes[leaf] = eax;
 
 You can drop the ebx and eax temporaries, and as ecx and edx are only
 around as discard variables, you can get away with having a single tmp
 variable.
 
 Therefore, something like:
 
 cpuid_count(XSTATE_CPUID, leaf, xstate_sizes[leaf],
 xstate_offsets[leaf], tmp, tmp);
 
 which also drops the body of the function to a single statement,
 allowing you to drop the braces.
 
Ok.
  +
  +leaf++;
  +}
  +}
  +
  +static void setup_xstate_comp(void)
 
 This function can get away with being __init, as it is based solely on
 data written by the BSP.
 
 Also merging review from patch 2, it only needs calling once.
 
ok.
  +{
  +unsigned int xstate_comp_sizes[sizeof(xfeature_mask)*8];
  +unsigned int i;
  +
  +/*
  + * The FP xstates and SSE xstates are legacy states. They are always
  + * in the fixed offsets in the xsave area in either compacted form
  + * or standard form.
  + */
  +xstate_comp_offsets[0] = 0;
  +xstate_comp_offsets[1] = XSAVE_SSE_OFFSET;
  +
  +xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE;
  +
  +for (i = 2; i  xstate_features; i++)
 
 Style - spaces inside braces.
 
Sorry.
  +{
  +if ( (1ull  i)  xfeature_mask )
  +xstate_comp_sizes[i] = xstate_sizes[i];
  +else
  +xstate_comp_sizes[i] = 0;
  +
  +if ( i  2 )
  +xstate_comp_offsets[i] = xstate_comp_offsets[i-1]
  ++ xstate_comp_sizes[i-1];
 
 The entire xstate_comp_sizes array (which is quite large) can be removed
 if you rearrange the loop body as:
 
 xstate_comp_offsets[i] = xstate_comp_offsets[i-1] + (((1ul  i) 
 xfeature_mask) ? xstate_comp_sizes[i-1] : 0);
 
 which also removes the if ( i  2 )
 
  +}
 
 As this point, it is probably worth matching xstate_comp_offsets[i-1]
 against cpuid.0xd[0].ecx.  If hardware and Xen disagree on the maximum
 size of an xsave area, we shouldn't continue.
 
Ok.
  +}
  +
  +static void *get_xsave_addr(struct xsave_struct *xsave, int xstate)
  +{
  +int feature = fls(xstate) - 1;
 
 In each callsite, you have already calculated fls(xstate) - 1.  It would
 be better to pass an unsigned int xfeature_idx and avoid the repeated
 

Re: [Xen-devel] [PATCH V4 3/4] x86/xsaves: enable xsaves/xrstors for hvm guest

2015-08-27 Thread Shuai Ruan
On Wed, Aug 26, 2015 at 11:36:53AM +0100, Andrew Cooper wrote:
 On 25/08/15 11:54, Shuai Ruan wrote:
  This patch enables xsaves for hvm guest, includes:
  1.handle xsaves vmcs init and vmexit.
  2.add logic to write/read the XSS msr.
 
  Signed-off-by: Shuai Ruan shuai.r...@linux.intel.com
 
 Reviewed-by: Andrew Cooper andrew.coop...@citrix.com, given two
 corrections...
 
Thanks.
  ---
   xen/arch/x86/hvm/hvm.c | 43 
  ++
   xen/arch/x86/hvm/vmx/vmcs.c|  6 --
   xen/arch/x86/hvm/vmx/vmx.c | 20 ++
   xen/arch/x86/xstate.c  |  4 ++--
   xen/include/asm-x86/hvm/vmx/vmcs.h |  7 +++
   xen/include/asm-x86/hvm/vmx/vmx.h  |  2 ++
   xen/include/asm-x86/xstate.h   |  1 +
   7 files changed, 79 insertions(+), 4 deletions(-)
 
  diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
  index dc444ac..57612de 100644
  --- a/xen/arch/x86/hvm/hvm.c
  +++ b/xen/arch/x86/hvm/hvm.c
  @@ -4540,6 +4540,33 @@ void hvm_cpuid(unsigned int input, unsigned int 
  *eax, unsigned int *ebx,
   *ebx = _eax + _ebx;
   }
   }
  +if ( count == 1 )
  +{
  +if ( cpu_has_xsaves )
  +{
  +*ebx = XSTATE_AREA_MIN_SIZE;
  +if ( v-arch.xcr0 | v-arch.msr_ia32_xss )
  +for ( sub_leaf = 2; sub_leaf  63; sub_leaf++ )
  +{
  +if ( !((v-arch.xcr0 | v-arch.msr_ia32_xss)
  +   (1ULL  sub_leaf)) )
 
 Stray hard tabs.
 
Ok.
  @@ -4771,6 +4804,16 @@ int hvm_msr_write_intercept(unsigned int msr, 
  uint64_t msr_content,
  return X86EMUL_EXCEPTION;
   break;
   
  +case MSR_IA32_XSS:
  +   /*
  +* skylake only support bit 8 , but it is
  +* not support in xen.
  +*/
 
 I would alter this comment to
 
 /* No XSS features currently supported for guests. */
 
 The skylake bits are covered by cpu_has_xsaves.
 
 ~Andrew
 
Ok.
 ___
 Xen-devel mailing list
Thanks for your reviwe,Andrew
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/vmx: fix vmx_is_singlestep_supported return value

2015-08-27 Thread Tian, Kevin
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Tuesday, August 25, 2015 3:59 PM
 
  Tamas K Lengyel tamas.leng...@zentific.com 08/25/15 1:51 AM 
  @@ -1768,7 +1768,7 @@ static void vmx_enable_msr_exit_interception(struct
 domain *d)
 
   static bool_t vmx_is_singlestep_supported(void)
   {
  -return cpu_has_monitor_trap_flag;
  +return cpu_has_monitor_trap_flag ? 1 : 0;
 
  Prevailing style would tend towards !!cpu_has_monitor_trap_flag
 
 Yeap, you are right. If the maintainers prefer I can resend with that style.
 
 This could easily be adjusted upon commit. What I'm wondering is whether this
 is the right place to fix it: Wouldn't it be better for the cpu_has_* macros
 themselves to be adjusted so other (future) users won't fall into the same 
 trap
 (vmx_virtual_intr_delivery_enabled() is a good second example bogusly using
 int as its return type, and once adjusted to bool_t it would break)?
 

I'm OK with original patch. Above example can be taken care when changing
return type.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v8 2/3] Differentiate IO/mem resources tracked by ioreq server

2015-08-27 Thread Yu, Zhang



On 8/25/2015 5:40 PM, Wei Liu wrote:

On Sun, Aug 23, 2015 at 05:33:17PM +0800, Yu Zhang wrote:

Currently in ioreq server, guest write-protected ram pages are
tracked in the same rangeset with device mmio resources. Yet
unlike device mmio, which can be in big chunks, the guest write-
protected pages may be discrete ranges with 4K bytes each. This
patch uses a seperate rangeset for the guest ram pages.

Note: Previously, a new hypercall or subop was suggested to map
write-protected pages into ioreq server. However, it turned out
handler of this new hypercall would be almost the same with the
existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's
already a type parameter in this hypercall. So no new hypercall
defined, only a new type is introduced.

Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com
---
  tools/libxc/include/xenctrl.h| 31 
  tools/libxc/xc_domain.c  | 61 


For tools bits:

Acked-by: Wei Liu wei.l...@citrix.com


Thank you, Wei.
To other maintainers, do you have any question or suggestion?
Any advices would be appreciated! Thanks! :)

Yu


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6] VT-d: Create IOMMU mappings for RMRR regions if shared EPT is not being used

2015-08-27 Thread Tian, Kevin
 From: Chen, Tiejun
 Sent: Thursday, August 27, 2015 5:05 PM
 
 On 8/27/2015 4:40 PM, Malcolm Crossley wrote:
  On 27/08/15 03:59, Chen, Tiejun wrote:
  This kind of issue is already gone.
 
  https://www.mail-archive.com/xen-devel@lists.xen.org/msg32464.html
 
  There is a bug in the code you refer to above which results in no IOMMU 
  page table
  mappings being created if the guest domain is not sharing it's EPT page 
  tables with
  the IOMMU.
 
  set_identity_p2m_entry only configures the EPT page tables and does not 
  configure
  the IOMMU page tables.
 
 Okay, I got what you mean.
 
 Instead, could you insert iommu_{map,unmap_page() into
 {set,clear}_identity_p2m_entry()? I think this can make
 {set,clear}_identity_p2m_entry approachable in all circumstances.
 
 Kevin and Jan,
 
 Is this fine?
 

Yes that is the right thing to do. It's a bit surprise to me that it's
not there yet. :-)

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] VT-d faults with Integrated Intel graphics on 4.6

2015-08-27 Thread Chen, Tiejun

On 8/27/2015 7:03 PM, Konrad Rzeszutek Wilk wrote:

On Thu, Aug 27, 2015 at 11:06:30AM +0800, Chen, Tiejun wrote:

On 8/25/2015 10:43 PM, Konrad Rzeszutek Wilk wrote:
On Tue, Aug 25, 2015 at 02:55:31PM +0800, Chen, Tiejun wrote:
On 8/25/2015 8:19 AM, Tamas K Lengyel wrote:
Hi everyone,
I saw some people passingly mention this on the list before but just in
case it has been missed, my serial is also being spammed with the following
printouts with both Xen 4.6 RC1 and the latest staging build:

...
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
2610742000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid
...


What's your platform? BDW? And how much memory is set to your guest OS?

Is see this as well. But oddly enough - only when I use the AMT feature
(normally I just use serial console on the machine).

The platform is /DQ67SW, BIOS
SWQ6710H.86A.0066.2012.1105.1504 11/05/2012

There is no guest OS - this is initial domain. And I boot with 2GB:
  Released 0 page(s)

Xen: [mem 0x-0x00099fff] usable
Xen: [mem 0x0009a800-0x000f] reserved
Xen: [mem 0x0010-0x1fff] usable
Xen: [mem 0x2000-0x201f] reserved
Xen: [mem 0x2020-0x3fff] usable
Xen: [mem 0x4000-0x401f] reserved
Xen: [mem 0x4020-0x80465fff] usable
Xen: [mem 0x80466000-0x9e855fff] unusable
Xen: [mem 0x9e856000-0x9e85efff] ACPI data
Xen: [mem 0x9e85f000-0x9e8a9fff] ACPI NVS
Xen: [mem 0x9e8aa000-0x9e8b1fff] unusable
Xen: [mem 0x9e8b2000-0x9e9a4fff] reserved
Xen: [mem 0x9e9a5000-0x9e9a6fff] unusable
Xen: [mem 0x9e9a7000-0x9ebc5fff] reserved
Xen: [mem 0x9ebc6000-0x9ebc6fff] unusable
Xen: [mem 0x9ebc7000-0x9ebd6fff] reserved
Xen: [mem 0x9ebd7000-0x9ebf4fff] ACPI NVS
Xen: [mem 0x9ebf5000-0x9ec18fff] reserved
Xen: [mem 0x9ec19000-0x9ec5bfff] ACPI NVS
Xen: [mem 0x9ec5c000-0x9ee7bfff] reserved
Xen: [mem 0x9ee7c000-0x9eff] unusable
Xen: [mem 0x9f80-0xbf9f] reserved
Xen: [mem 0xfec0-0xfec00fff] reserved
Xen: [mem 0xfed1c000-0xfed3] reserved
Xen: [mem 0xfed9-0xfed91fff] reserved
Xen: [mem 0xfee0-0xfeef] reserved
Xen: [mem 0xff00-0x] reserved
Xen: [mem 0x0001-0x00043e5f] unusable


Just at first glance to fault address, this seems be issued from some

As you see those fault addresses are out of the normal memory range  here.

known erratas on BDS and SKL.

I am runnig v4.2-rc8.

So I really doubt this is related to some erratas. Currently the pre-fetch
unit of IOMMU unit dedicated to IGD can't work well on some platforms, so
you can see these wired faults.


Do you have some ideas for a solution/patch?


I can't reproduce this on my BDW. And this is my assumption as well. So 
could anyone provide a log with iommu=debug?


Thanks
Tiejun



Thanks!



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4 1/4] x86/xsaves: add basic definitions/helpers to support xsaves

2015-08-27 Thread Shuai Ruan
On Wed, Aug 26, 2015 at 06:53:06AM -0600, Jan Beulich wrote:
  On 26.08.15 at 13:41, jbeul...@suse.com wrote:
  On 26.08.15 at 11:47, andrew.coop...@citrix.com wrote:
  On 25/08/2015 11:54, Shuai Ruan wrote:
  --- a/xen/include/asm-x86/cpufeature.h
  +++ b/xen/include/asm-x86/cpufeature.h
  @@ -153,6 +153,10 @@
   #define X86_FEATURE_RDSEED   (7*32+18) /* RDSEED instruction */
   #define X86_FEATURE_ADX  (7*32+19) /* ADCX, ADOX instructions */
   #define X86_FEATURE_SMAP (7*32+20) /* Supervisor Mode Access Prevention 
  */
  +#define XSAVEOPT (1  0)
  +#define XSAVEC   (1  1)
  +#define XGETBV1  (1  2)
  +#define XSAVES   (1  3)
  
  This absolutely must be X86_FEATURE_* for consistency, and you will need
  to introduce a new capability word.
  
  Or even better re-use word 2.
 
 And make sure to replace the then redundant XSTATE_FEATURE_*
 definitions in xstate.h.
 
Ok.Thanks I will fix these in next version.
 Jan
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Draft B] Boot ABI for HVM guests without a device-model

2015-08-27 Thread Andrew Cooper
On 27/08/15 09:04, Jan Beulich wrote:
 On 26.08.15 at 16:44, roger@citrix.com wrote:
 El 26/08/15 a les 14.12, Andrew Cooper ha escrit:
 On 26/08/15 13:00, Jan Beulich wrote:
 This structure is guaranteed to always be placed in memory after the
 DYM These structures are ...?

 loaded kernel and modules.
 There is no requirement for the command line/module information to be
 after the loaded kernel.  All it needs to do is not overlap.
 IMHO, this is helpful in order to get last used physical address, after
 which free memory starts. Current FreeBSD implementation relies on this,
 if we didn't do it that way I would have to calculate where the symtab +
 strtab ends, which is more complex.
 But the statement leaves open whether there is any free memory at
 all after those structures, or whether instead all free memory lives
 at lower addresses. Nor do I consider it appropriate to take a present
 (one might say overly simplistic) implementation as a basis for setting
 arbitrary restrictions.

I agree.  This sounds like a FreeBSD bug, and absolutely shouldn't be a
written restriction in the boot ABI.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Draft B] Boot ABI for HVM guests without a device-model

2015-08-27 Thread Roger Pau Monné
El 27/08/15 a les 11.43, Andrew Cooper ha escrit:
 On 27/08/15 09:04, Jan Beulich wrote:
 On 26.08.15 at 16:44, roger@citrix.com wrote:
 El 26/08/15 a les 14.12, Andrew Cooper ha escrit:
 On 26/08/15 13:00, Jan Beulich wrote:
 This structure is guaranteed to always be placed in memory after the
 DYM These structures are ...?

 loaded kernel and modules.
 There is no requirement for the command line/module information to be
 after the loaded kernel.  All it needs to do is not overlap.
 IMHO, this is helpful in order to get last used physical address, after
 which free memory starts. Current FreeBSD implementation relies on this,
 if we didn't do it that way I would have to calculate where the symtab +
 strtab ends, which is more complex.
 But the statement leaves open whether there is any free memory at
 all after those structures, or whether instead all free memory lives
 at lower addresses. Nor do I consider it appropriate to take a present
 (one might say overly simplistic) implementation as a basis for setting
 arbitrary restrictions.

Can we just state that the hvm_start_info structure and associated 
metadata is placed after the loaded kernel and modules?

Whether there's free memory or not after this is something that the 
kernel has to figure out by itself, and I wasn't planning to add such a 
statement to the specification.

 I agree.  This sounds like a FreeBSD bug, and absolutely shouldn't be a
 written restriction in the boot ABI.

Bug? The FreeBSD native loader passes to the FreeBSD kernel the last 
used address, after which free memory starts. IMHO, it is not a bug, 
it's just how FreeBSD boots. I understand that Linux might not pass 
such a parameter, and there are other ways I can use to find this, but 
they are more complex. 

We already did something very similar with PV guests, see the comment 
before the start_info structure:

 *  3. This the order of bootstrap elements in the initial virtual region:
 *  a. relocated kernel image
 *  b. initial ram disk  [mod_start, mod_len]
 * (may be omitted)
 *  c. list of allocated page frames [mfn_list, nr_pages]
 * (unless relocated due to XEN_ELFNOTE_INIT_P2M)
 *  d. start_info_t structure[register ESI (x86)]
 * in case of dom0 this page contains the console info, too
 *  e. unless dom0: xenstore ring page
 *  f. unless dom0: console ring page
 *  g. bootstrap page tables [pt_base and CR3 (x86)]
 *  h. bootstrap stack   [register ESP (x86)]

IMHO it is important to mention how things are loaded into memory, and 
placing the hvm_start_info struct after the loaded kernel and modules 
is also the most natural way to do it, I don't foresee this changing in 
the future.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/NUMA: make init_node_heap() respect Xen heap limit

2015-08-27 Thread Andrew Cooper
On 27/08/15 09:37, Jan Beulich wrote:
 On NUMA systems, where we try to use node local memory for the basic
 control structures of the buddy allocator, this special case needs to
 take into consideration a possible address width limit placed on the
 Xen heap. In turn this (but also other, more abstract considerations)
 requires that xenheap_max_mfn() not be called more than once (at most
 we might permit it to be called a second time with a larger value than
 was passed the first time), and be called only before calling
 end_boot_allocator().

 While inspecting all the involved code, a couple of off-by-one issues
 were found (and are being corrected here at once):
 - arch_init_memory() cleared one too many page table slots
 - the highmem_start based invocation of xenheap_max_mfn() passed too
   big a value
 - xenheap_max_mfn() calculated the wrong bit count in edge cases

 Signed-off-by: Jan Beulich jbeul...@suse.com

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] tmem vs Xen 4.6

2015-08-27 Thread Jan Beulich
Konrad,

I thought I'd remind you of the some progress per release criteria
to avoid it becoming subject to removal. The most recent changes
date back to spring 2014 afaics...

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 for-4.6 0/2] In-tree feature documentation

2015-08-27 Thread Andrew Cooper
On 27/08/15 03:44, Jim Fehlig wrote:
 On 08/25/2015 04:40 AM, Andrew Cooper wrote:
 An issue which Xen has is an uncertain support statement for features.
 Given the success seen with docs/misc/xen-command-line.markdown, and in
 particular keeping it up to date, introduce a similar system for
 features.

 Patch 1 introduces a proposed template (and a makefile tweak to include
 the new docs/features subdirectory), while patch 2 is a feature document
 covering the topic of migration.

 v2 Adds %Revision and #History table, following feedback from v1.

 This is tagged RFC as I expect people to have different views as to what
 is useful to include.  I would particilarly appreciate feedback on the
 template before it starts getting used widely.

 Lars: Does this look like a reasonable counterpart to your formal
 support statement document?

 Jim: Per your request at the summit for new information, is patch 2
 suitable?

 Yes. It provides excellent info, with pointers to dig deeper for those
 interested.

 Your proposal does raise the bar for feature contribution, but I'm not
 active enough in the Xen community to know if folks are supportive of
 the additional overhead. Would new features require a feature doc
 before committing?

Unless there are some screaming objections, I am hoping yes, and that
the document becomes the authoritative support statement for the
feature.  (This is currently a real problem working out the security
status of bugs in features lacking a concrete statement of support.)

Writing one of these documents is not hard (I hope for there to soon be
many examples to refer to), and I would like to see it become
common-practice to have as $N/$N on a series.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 6/8] tmem: Remove the old tmem control XSM checks as it is part of sysctl hypercall.

2015-08-27 Thread Konrad Rzeszutek Wilk
The sysctl is where the tmem control operations are done and the
XSM checks are done via there. The old mechanism (to check
for control tmem op XSM from do_tmem_op) is not needed anymore.

CC:  Daniel De Graaf dgde...@tycho.nsa.gov
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/include/xsm/dummy.h | 6 --
 xen/include/xsm/xsm.h   | 6 --
 xen/xsm/dummy.c | 1 -
 xen/xsm/flask/hooks.c   | 6 --
 xen/xsm/flask/policy/access_vectors | 2 +-
 5 files changed, 1 insertion(+), 20 deletions(-)

diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index bbbfce7..9fe372c 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -427,12 +427,6 @@ static XSM_INLINE int xsm_tmem_op(XSM_DEFAULT_VOID)
 return xsm_default_action(action, current-domain, NULL);
 }
 
-static XSM_INLINE int xsm_tmem_control(XSM_DEFAULT_VOID)
-{
-XSM_ASSERT_ACTION(XSM_PRIV);
-return xsm_default_action(action, current-domain, NULL);
-}
-
 static XSM_INLINE long xsm_do_xsm_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) op)
 {
 return -ENOSYS;
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 3678a93..ba3caed 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -137,7 +137,6 @@ struct xsm_operations {
 
 int (*page_offline)(uint32_t cmd);
 int (*tmem_op)(void);
-int (*tmem_control)(void);
 
 long (*do_xsm_op) (XEN_GUEST_HANDLE_PARAM(xsm_op_t) op);
 #ifdef CONFIG_COMPAT
@@ -557,11 +556,6 @@ static inline int xsm_tmem_op(xsm_default_t def)
 return xsm_ops-tmem_op();
 }
 
-static inline int xsm_tmem_control(xsm_default_t def)
-{
-return xsm_ops-tmem_control();
-}
-
 static inline long xsm_do_xsm_op (XEN_GUEST_HANDLE_PARAM(xsm_op_t) op)
 {
 return xsm_ops-do_xsm_op(op);
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 21b1bf8..72eba40 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -112,7 +112,6 @@ void xsm_fixup_ops (struct xsm_operations *ops)
 
 set_to_dummy_if_null(ops, page_offline);
 set_to_dummy_if_null(ops, tmem_op);
-set_to_dummy_if_null(ops, tmem_control);
 set_to_dummy_if_null(ops, hvm_param);
 set_to_dummy_if_null(ops, hvm_control);
 set_to_dummy_if_null(ops, hvm_param_nested);
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index cfad13c..5f5f181 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1132,11 +1132,6 @@ static inline int flask_tmem_op(void)
 return domain_has_xen(current-domain, XEN__TMEM_OP);
 }
 
-static inline int flask_tmem_control(void)
-{
-return domain_has_xen(current-domain, XEN__TMEM_CONTROL);
-}
-
 static int flask_add_to_physmap(struct domain *d1, struct domain *d2)
 {
 return domain_has_perm(d1, d2, SECCLASS_MMU, MMU__PHYSMAP);
@@ -1696,7 +1691,6 @@ static struct xsm_operations flask_ops = {
 
 .page_offline = flask_page_offline,
 .tmem_op = flask_tmem_op,
-.tmem_control = flask_tmem_control,
 .hvm_param = flask_hvm_param,
 .hvm_control = flask_hvm_param,
 .hvm_param_nested = flask_hvm_param_nested,
diff --git a/xen/xsm/flask/policy/access_vectors 
b/xen/xsm/flask/policy/access_vectors
index 71495fd..0aa68f8 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -69,7 +69,7 @@ class xen
 cpupool_op
 # tmem hypercall (any access)
 tmem_op
-# TMEM_CONTROL command of tmem hypercall
+# XEN_SYSCTL_tmem_op command of tmem (part of sysctl)
 tmem_control
 # XEN_SYSCTL_scheduler_op with XEN_DOMCTL_SCHEDOP_getinfo, XEN_SYSCTL_sched_id
 getscheduler
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 7/8] tmem: Remove extra spaces at end and some hard tabbing.

2015-08-27 Thread Konrad Rzeszutek Wilk
My editor marks these in red glowing red so removing them to
make it easier to focus on code.

Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/common/tmem.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/xen/common/tmem.c b/xen/common/tmem.c
index 1c331ac..66d2852 100644
--- a/xen/common/tmem.c
+++ b/xen/common/tmem.c
@@ -111,7 +111,7 @@ struct tmem_pool {
 atomic_t pgp_count;
 int pgp_count_max;
 long obj_count;  /* atomicity depends on pool_rwlock held for write */
-long obj_count_max;  
+long obj_count_max;
 unsigned long objnode_count, objnode_count_max;
 uint64_t sum_life_cycles;
 uint64_t sum_evicted_cycles;
@@ -1092,7 +1092,7 @@ static int shared_pool_quit(struct tmem_pool *pool, 
domid_t cli_id)
 
 ASSERT(is_shared(pool));
 ASSERT(pool-client != NULL);
-
+
 ASSERT_WRITELOCK(tmem_rwlock);
 pool_destroy_objs(pool, cli_id);
 list_for_each_entry(sl,pool-share_list, share_list)
@@ -1180,7 +1180,7 @@ static struct client *client_create(domid_t cli_id)
 }
 if ( !d-is_dying ) {
 d-tmem_client = client;
-   client-domain = d;
+client-domain = d;
 }
 rcu_unlock_domain(d);
 
@@ -1229,7 +1229,7 @@ static bool_t client_over_quota(struct client *client)
 int total = _atomic_read(client_weight_total);
 
 ASSERT(client != NULL);
-if ( (total == 0) || (client-weight == 0) || 
+if ( (total == 0) || (client-weight == 0) ||
   (client-eph_count == 0) )
 return 0;
 return ( ((global_eph_count*100L) / client-eph_count ) 
@@ -1414,7 +1414,7 @@ static int do_tmem_put_compress(struct 
tmem_page_descriptor *pgp, xen_pfn_t cmfn
 void *dst, *p;
 size_t size;
 int ret = 0;
-
+
 ASSERT(pgp != NULL);
 ASSERT(pgp-us.obj != NULL);
 ASSERT_SPINLOCK(pgp-us.obj-obj_spinlock);
@@ -1559,7 +1559,7 @@ refind:
 {
 /* no puts allowed into a frozen pool (except dup puts) */
 if ( client-frozen )
-   goto unlock_obj;
+   goto unlock_obj;
 }
 }
 else
@@ -1572,10 +1572,10 @@ refind:
 
 write_lock(pool-pool_rwlock);
 /*
-* Parallel callers may already allocated obj and inserted to 
obj_rb_root
-* before us.
-*/
-if (!obj_rb_insert(pool-obj_rb_root[oid_hash(oidp)], obj))
+ * Parallel callers may already allocated obj and inserted to 
obj_rb_root
+ * before us.
+ */
+if ( !obj_rb_insert(pool-obj_rb_root[oid_hash(oidp)], obj) )
 {
 tmem_free(obj, pool);
 write_unlock(pool-pool_rwlock);
@@ -1945,7 +1945,7 @@ static int do_tmem_new_pool(domid_t this_cli_id,
  (client-shared_auth_uuid[i][1] == uuid_hi) )
 break;
 if ( i == MAX_GLOBAL_SHARED_POOLS )
-   {
+   {
 tmem_client_info(Shared auth failed, create non shared pool 
instead!\n);
 pool-shared = 0;
 goto out;
@@ -2107,7 +2107,7 @@ static int tmemc_list_client(struct client *c, 
tmem_cli_va_param_t buf,
  p-obj_count, p-obj_count_max,
  p-objnode_count, p-objnode_count_max,
  p-good_puts, p-puts,p-dup_puts_flushed, p-dup_puts_replaced,
- p-no_mem_puts, 
+ p-no_mem_puts,
  p-found_gets, p-gets,
  p-flushs_found, p-flushs, p-flush_objs_found, p-flush_objs);
 if ( sum + n = len )
@@ -2146,7 +2146,7 @@ static int tmemc_list_shared(tmem_cli_va_param_t buf, int 
off, uint32_t len,
  p-obj_count, p-obj_count_max,
  p-objnode_count, p-objnode_count_max,
  p-good_puts, p-puts,p-dup_puts_flushed, p-dup_puts_replaced,
- p-no_mem_puts, 
+ p-no_mem_puts,
  p-found_gets, p-gets,
  p-flushs_found, p-flushs, p-flush_objs_found, p-flush_objs);
 if ( sum + n = len )
@@ -2456,7 +2456,7 @@ static int tmemc_save_get_next_page(int cli_id, uint32_t 
pool_id,
 /* process the first one */
 pool-cur_pgp = pgp = list_entry((pool-persistent_page_list)-next,
  struct tmem_page_descriptor,us.pool_pers_pages);
-} else if ( list_is_last(pool-cur_pgp-us.pool_pers_pages, 
+} else if ( list_is_last(pool-cur_pgp-us.pool_pers_pages,
  pool-persistent_page_list) )
 {
 /* already processed the last one in the list */
@@ -2504,7 +2504,7 @@ static int tmemc_save_get_next_inv(int cli_id, 
tmem_cli_va_param_t buf,
 pgp = list_entry((client-persistent_invalidated_list)-next,
  struct tmem_page_descriptor,client_inv_pages);
 client-cur_pgp = pgp;
-} else if ( list_is_last(client-cur_pgp-client_inv_pages, 
+} else if ( list_is_last(client-cur_pgp-client_inv_pages,
  

[Xen-devel] [PATCH v2 5/8] tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.

2015-08-27 Thread Konrad Rzeszutek Wilk
The operations are to be used by an control domain to set parameters,
list pools, clients, and to be used during migration.

There is no need to have them in the tmem hypercall path.

This patch moves code without adding fixes - and in fact in
some cases makes the parameters soo long that they hurt eyes - but
that is for another patch.

Note that in regards to existing users:

 - Only the control domain could call it - which meant that if
   a guest called it would get -EPERM, so we are OK there.
   In practice no guests called this TMEM_CONTROL command.

 - The spec: 
https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.pdf
   mentions: TBD [Not sure if this is really needed.]
   which is a carte blanche as any to do this!

Note: The XSM check is the same - we just move it from do_tmem_op
to do_sysctl.
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 tools/libxc/xc_tmem.c  | 104 -
 tools/libxl/libxl.c|  14 ++--
 tools/python/xen/lowlevel/xc/xc.c  |  20 ++---
 tools/xenstat/libxenstat/src/xenstat.c |   4 +-
 xen/common/sysctl.c|   7 +-
 xen/common/tmem.c  | 136 -
 xen/include/public/sysctl.h|  44 +++
 xen/include/public/tmem.h  |  39 +-
 xen/include/xen/tmem.h |   3 +
 xen/include/xen/tmem_xen.h |   1 -
 xen/xsm/flask/hooks.c  |   3 +
 11 files changed, 195 insertions(+), 180 deletions(-)

diff --git a/tools/libxc/xc_tmem.c b/tools/libxc/xc_tmem.c
index 467001b..2f6ea17 100644
--- a/tools/libxc/xc_tmem.c
+++ b/tools/libxc/xc_tmem.c
@@ -47,27 +47,27 @@ static int do_tmem_op(xc_interface *xch, tmem_op_t *op)
 
 int xc_tmem_control(xc_interface *xch,
 int32_t pool_id,
-uint32_t subop,
+uint32_t cmd,
 uint32_t cli_id,
 uint32_t arg1,
 uint32_t arg2,
 void *buf)
 {
-tmem_op_t op;
+DECLARE_SYSCTL;
 DECLARE_HYPERCALL_BOUNCE(buf, arg1, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
 int rc;
 
-op.cmd = TMEM_CONTROL;
-op.pool_id = pool_id;
-op.u.ctrl.subop = subop;
-op.u.ctrl.cli_id = cli_id;
-op.u.ctrl.arg1 = arg1;
-op.u.ctrl.arg2 = arg2;
-op.u.ctrl.oid[0] = 0;
-op.u.ctrl.oid[1] = 0;
-op.u.ctrl.oid[2] = 0;
-
-if ( subop == TMEMC_LIST  arg1 != 0 )
+sysctl.cmd = XEN_SYSCTL_tmem_op;
+sysctl.u.tmem_op.pool_id = pool_id;
+sysctl.u.tmem_op.cmd = cmd;
+sysctl.u.tmem_op.cli_id = cli_id;
+sysctl.u.tmem_op.arg1 = arg1;
+sysctl.u.tmem_op.arg2 = arg2;
+sysctl.u.tmem_op.oid[0] = 0;
+sysctl.u.tmem_op.oid[1] = 0;
+sysctl.u.tmem_op.oid[2] = 0;
+
+if ( cmd == XEN_SYSCTL_TMEM_OP_LIST  arg1 != 0 )
 {
 if ( buf == NULL )
 {
@@ -81,11 +81,11 @@ int xc_tmem_control(xc_interface *xch,
 }
 }
 
-set_xen_guest_handle(op.u.ctrl.buf, buf);
+set_xen_guest_handle(sysctl.u.tmem_op.buf, buf);
 
-rc = do_tmem_op(xch, op);
+rc = do_sysctl(xch, sysctl);
 
-if (subop == TMEMC_LIST  arg1 != 0)
+if ( cmd == XEN_SYSCTL_TMEM_OP_LIST  arg1 != 0 )
 xc_hypercall_bounce_post(xch, buf);
 
 return rc;
@@ -93,28 +93,28 @@ int xc_tmem_control(xc_interface *xch,
 
 int xc_tmem_control_oid(xc_interface *xch,
 int32_t pool_id,
-uint32_t subop,
+uint32_t cmd,
 uint32_t cli_id,
 uint32_t arg1,
 uint32_t arg2,
 struct tmem_oid oid,
 void *buf)
 {
-tmem_op_t op;
+DECLARE_SYSCTL;
 DECLARE_HYPERCALL_BOUNCE(buf, arg1, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
 int rc;
 
-op.cmd = TMEM_CONTROL;
-op.pool_id = pool_id;
-op.u.ctrl.subop = subop;
-op.u.ctrl.cli_id = cli_id;
-op.u.ctrl.arg1 = arg1;
-op.u.ctrl.arg2 = arg2;
-op.u.ctrl.oid[0] = oid.oid[0];
-op.u.ctrl.oid[1] = oid.oid[1];
-op.u.ctrl.oid[2] = oid.oid[2];
-
-if ( subop == TMEMC_LIST  arg1 != 0 )
+sysctl.cmd = XEN_SYSCTL_tmem_op;
+sysctl.u.tmem_op.pool_id = pool_id;
+sysctl.u.tmem_op.cmd = cmd;
+sysctl.u.tmem_op.cli_id = cli_id;
+sysctl.u.tmem_op.arg1 = arg1;
+sysctl.u.tmem_op.arg2 = arg2;
+sysctl.u.tmem_op.oid[0] = oid.oid[0];
+sysctl.u.tmem_op.oid[1] = oid.oid[1];
+sysctl.u.tmem_op.oid[2] = oid.oid[2];
+
+if ( cmd == XEN_SYSCTL_TMEM_OP_LIST  arg1 != 0 )
 {
 if ( buf == NULL )
 {
@@ -128,11 +128,11 @@ int xc_tmem_control_oid(xc_interface *xch,
 }
 }
 
-set_xen_guest_handle(op.u.ctrl.buf, buf);
+set_xen_guest_handle(sysctl.u.tmem_op.buf, buf);
 
-rc = do_tmem_op(xch, op);
+rc = do_sysctl(xch, sysctl);
 
-if (subop == TMEMC_LIST  arg1 != 0)
+if ( 

[Xen-devel] [PATCH v2 3/8] tmem: remove in xc_tmem_control_oid duplicate set_xen_guest_handle call

2015-08-27 Thread Konrad Rzeszutek Wilk
We are doing another call to set_xen_guest_handle right
after the xc_hypercall_bounce_pre (the correct place to do it).

Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 tools/libxc/xc_tmem.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/libxc/xc_tmem.c b/tools/libxc/xc_tmem.c
index 8352233..0d1bfb4 100644
--- a/tools/libxc/xc_tmem.c
+++ b/tools/libxc/xc_tmem.c
@@ -110,7 +110,6 @@ int xc_tmem_control_oid(xc_interface *xch,
 op.pool_id = pool_id;
 op.u.ctrl.subop = subop;
 op.u.ctrl.cli_id = cli_id;
-set_xen_guest_handle(op.u.ctrl.buf,buf);
 op.u.ctrl.arg1 = arg1;
 op.u.ctrl.arg2 = arg2;
 op.u.ctrl.oid[0] = oid.oid[0];
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 2/8] tmem: Add ASSERT in obj_rb_insert for pool-rwlock lock.

2015-08-27 Thread Konrad Rzeszutek Wilk
Manipulating the obj- structures requires us to hold the
pool-rwlock lock. Lets make that obvious in this function to
catch any errant users (none found, but we may in future).

Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/common/tmem.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/common/tmem.c b/xen/common/tmem.c
index 572944e..567ccc5 100644
--- a/xen/common/tmem.c
+++ b/xen/common/tmem.c
@@ -915,6 +915,11 @@ static int obj_rb_insert(struct rb_root *root, struct 
tmem_object_root *obj)
 {
 struct rb_node **new, *parent = NULL;
 struct tmem_object_root *this;
+struct tmem_pool *pool;
+
+pool = obj-pool;
+ASSERT(pool != NULL);
+ASSERT_WRITELOCK(pool-pool_rwlock);
 
 new = (root-rb_node);
 while ( *new )
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 8/8] tmem: Spelling mistakes.

2015-08-27 Thread Konrad Rzeszutek Wilk
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/common/tmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/common/tmem.c b/xen/common/tmem.c
index 66d2852..9bd75d8 100644
--- a/xen/common/tmem.c
+++ b/xen/common/tmem.c
@@ -2659,7 +2659,7 @@ long do_tmem_op(tmem_cli_op_t uops)
 return -EFAULT;
 }
 
-/* Acquire wirte lock for all command at first */
+/* Acquire write lock for all command at first */
 write_lock(tmem_rwlock);
 
 if ( op.cmd == TMEM_CONTROL_MOVED )
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 4/8] tmem: Remove xc_tmem_control mystical arg3

2015-08-27 Thread Konrad Rzeszutek Wilk
It mentions it but it is never used. The hypercall interface
knows nothing of this sort of thing either. Lets just remove it.

Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 tools/libxc/include/xenctrl.h  |  2 +-
 tools/libxc/xc_tmem.c  | 38 --
 tools/libxl/libxl.c| 10 -
 tools/python/xen/lowlevel/xc/xc.c  |  7 +++
 tools/xenstat/libxenstat/src/xenstat.c |  4 ++--
 5 files changed, 29 insertions(+), 32 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index de3c0ad..9e34769 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2304,7 +2304,7 @@ int xc_tmem_control_oid(xc_interface *xch, int32_t 
pool_id, uint32_t subop,
 struct tmem_oid oid, void *buf);
 int xc_tmem_control(xc_interface *xch,
 int32_t pool_id, uint32_t subop, uint32_t cli_id,
-uint32_t arg1, uint32_t arg2, uint64_t arg3, void *buf);
+uint32_t arg1, uint32_t arg2, void *buf);
 int xc_tmem_auth(xc_interface *xch, int cli_id, char *uuid_str, int arg1);
 int xc_tmem_save(xc_interface *xch, int dom, int live, int fd, int 
field_marker);
 int xc_tmem_save_extra(xc_interface *xch, int dom, int fd, int field_marker);
diff --git a/tools/libxc/xc_tmem.c b/tools/libxc/xc_tmem.c
index 0d1bfb4..467001b 100644
--- a/tools/libxc/xc_tmem.c
+++ b/tools/libxc/xc_tmem.c
@@ -51,7 +51,6 @@ int xc_tmem_control(xc_interface *xch,
 uint32_t cli_id,
 uint32_t arg1,
 uint32_t arg2,
-uint64_t arg3,
 void *buf)
 {
 tmem_op_t op;
@@ -64,7 +63,6 @@ int xc_tmem_control(xc_interface *xch,
 op.u.ctrl.cli_id = cli_id;
 op.u.ctrl.arg1 = arg1;
 op.u.ctrl.arg2 = arg2;
-/* use xc_tmem_control_oid if arg3 is required */
 op.u.ctrl.oid[0] = 0;
 op.u.ctrl.oid[1] = 0;
 op.u.ctrl.oid[2] = 0;
@@ -220,28 +218,28 @@ int xc_tmem_save(xc_interface *xch,
 uint32_t minusone = -1;
 struct tmem_handle *h;
 
-if ( xc_tmem_control(xch,0,TMEMC_SAVE_BEGIN,dom,live,0,0,NULL) = 0 )
+if ( xc_tmem_control(xch,0,TMEMC_SAVE_BEGIN,dom,live,0,NULL) = 0 )
 return 0;
 
 if ( write_exact(io_fd, marker, sizeof(marker)) )
 return -1;
-version = xc_tmem_control(xch,0,TMEMC_SAVE_GET_VERSION,0,0,0,0,NULL);
+version = xc_tmem_control(xch,0,TMEMC_SAVE_GET_VERSION,0,0,0,NULL);
 if ( write_exact(io_fd, version, sizeof(version)) )
 return -1;
-max_pools = xc_tmem_control(xch,0,TMEMC_SAVE_GET_MAXPOOLS,0,0,0,0,NULL);
+max_pools = xc_tmem_control(xch,0,TMEMC_SAVE_GET_MAXPOOLS,0,0,0,NULL);
 if ( write_exact(io_fd, max_pools, sizeof(max_pools)) )
 return -1;
 if ( version == -1 || max_pools == -1 )
 return -1;
 if ( write_exact(io_fd, minusone, sizeof(minusone)) )
 return -1;
-flags = xc_tmem_control(xch,0,TMEMC_SAVE_GET_CLIENT_FLAGS,dom,0,0,0,NULL);
+flags = xc_tmem_control(xch,0,TMEMC_SAVE_GET_CLIENT_FLAGS,dom,0,0,NULL);
 if ( write_exact(io_fd, flags, sizeof(flags)) )
 return -1;
-weight = 
xc_tmem_control(xch,0,TMEMC_SAVE_GET_CLIENT_WEIGHT,dom,0,0,0,NULL);
+weight = xc_tmem_control(xch,0,TMEMC_SAVE_GET_CLIENT_WEIGHT,dom,0,0,NULL);
 if ( write_exact(io_fd, weight, sizeof(weight)) )
 return -1;
-cap = xc_tmem_control(xch,0,TMEMC_SAVE_GET_CLIENT_CAP,dom,0,0,0,NULL);
+cap = xc_tmem_control(xch,0,TMEMC_SAVE_GET_CLIENT_CAP,dom,0,0,NULL);
 if ( write_exact(io_fd, cap, sizeof(cap)) )
 return -1;
 if ( flags == -1 || weight == -1 || cap == -1 )
@@ -258,14 +256,14 @@ int xc_tmem_save(xc_interface *xch,
 int checksum = 0;
 
 /* get pool id, flags, pagesize, n_pages, uuid */
-flags = 
xc_tmem_control(xch,i,TMEMC_SAVE_GET_POOL_FLAGS,dom,0,0,0,NULL);
+flags = xc_tmem_control(xch,i,TMEMC_SAVE_GET_POOL_FLAGS,dom,0,0,NULL);
 if ( flags != -1 )
 {
 pool_id = i;
-n_pages = 
xc_tmem_control(xch,i,TMEMC_SAVE_GET_POOL_NPAGES,dom,0,0,0,NULL);
+n_pages = 
xc_tmem_control(xch,i,TMEMC_SAVE_GET_POOL_NPAGES,dom,0,0,NULL);
 if ( !(flags  TMEM_POOL_PERSIST) )
 n_pages = 0;
-
(void)xc_tmem_control(xch,i,TMEMC_SAVE_GET_POOL_UUID,dom,sizeof(uuid),0,0,uuid);
+
(void)xc_tmem_control(xch,i,TMEMC_SAVE_GET_POOL_UUID,dom,sizeof(uuid),0,uuid);
 if ( write_exact(io_fd, pool_id, sizeof(pool_id)) )
 return -1;
 if ( write_exact(io_fd, flags, sizeof(flags)) )
@@ -290,7 +288,7 @@ int xc_tmem_save(xc_interface *xch,
 int ret;
 if ( (ret = xc_tmem_control(xch, pool_id,
 TMEMC_SAVE_GET_NEXT_PAGE, dom,
-bufsize, 0, 0, buf))  

[Xen-devel] [PATCH v2] Tmem bug-fixes and cleanups.

2015-08-27 Thread Konrad Rzeszutek Wilk
Hey!

At the Xenhackathon we spoke that the tmem code needs a bit of cleanups
and simplification. One of the things that Andrew mentioned was that the
TMEM_CONTROL should really be part of the sysctl hypercall. As I ventured
this path I realized there were some other issues that need to be taken
care of (like shared pools blowing up).

This patchset has been tested with 32/64 guests, with a 64 hypervisor
and 32 bit toolstack (and also with 64bit toolstack) with success.

For fun I've also created an Linux module:
http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/tmem_test/tmem_test.c
that I will expand to cover in the future more interesting hypercall
uses.

Going forward the next step will be to move the 'tmem_control' function to
its own file to simplify the code.

The patches are also in my git tree:

git://xenbits.xen.org/people/konradwilk/xen.git for-4.6/tmem.cleanups.v2

 tools/libxc/include/xenctrl.h  |   2 +-
 tools/libxc/xc_tmem.c  | 111 ++--
 tools/libxl/libxl.c|  22 ++--
 tools/python/xen/lowlevel/xc/xc.c  |  27 +++--
 tools/xenstat/libxenstat/src/xenstat.c |   6 +-
 xen/common/sysctl.c|   7 +-
 xen/common/tmem.c  | 183 +
 xen/include/public/sysctl.h|  44 
 xen/include/public/tmem.h  |  39 +--
 xen/include/xen/tmem.h |   3 +
 xen/include/xen/tmem_xen.h |   1 -
 xen/include/xsm/dummy.h|   6 --
 xen/include/xsm/xsm.h  |   6 --
 xen/xsm/dummy.c|   1 -
 xen/xsm/flask/hooks.c  |   9 +-
 xen/xsm/flask/policy/access_vectors|   2 +-
 16 files changed, 237 insertions(+), 232 deletions(-)
Konrad Rzeszutek Wilk (8):
  tmem: Don't crash/hang/leak hypervisor when using shared pools within an 
guest.
  tmem: Add ASSERT in obj_rb_insert for pool-rwlock lock.
  tmem: remove in xc_tmem_control_oid duplicate set_xen_guest_handle call
  tmem: Remove xc_tmem_control mystical arg3
  tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.
  tmem: Remove the old tmem control XSM checks as it is part of sysctl 
hypercall.
  tmem: Remove extra spaces at end and some hard tabbing.
  tmem: Spelling mistakes.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] VT-d faults with Integrated Intel graphics on 4.6

2015-08-27 Thread Konrad Rzeszutek Wilk
On Thu, Aug 27, 2015 at 11:06:30AM +0800, Chen, Tiejun wrote:
 On 8/25/2015 10:43 PM, Konrad Rzeszutek Wilk wrote:
 On Tue, Aug 25, 2015 at 02:55:31PM +0800, Chen, Tiejun wrote:
 On 8/25/2015 8:19 AM, Tamas K Lengyel wrote:
 Hi everyone,
 I saw some people passingly mention this on the list before but just in
 case it has been missed, my serial is also being spammed with the following
 printouts with both Xen 4.6 RC1 and the latest staging build:
 
 ...
 (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
 33487d7000, iommu reg = 82c000201000
 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
 (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
 33487d7000, iommu reg = 82c000201000
 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
 (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
 33487d7000, iommu reg = 82c000201000
 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
 (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
 33487d7000, iommu reg = 82c000201000
 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
 (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
 2610742000, iommu reg = 82c000201000
 (XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid
 ...
 
 
 What's your platform? BDW? And how much memory is set to your guest OS?
 
 Is see this as well. But oddly enough - only when I use the AMT feature
 (normally I just use serial console on the machine).
 
 The platform is /DQ67SW, BIOS
 SWQ6710H.86A.0066.2012.1105.1504 11/05/2012
 
 There is no guest OS - this is initial domain. And I boot with 2GB:
   Released 0 page(s)
 
 Xen: [mem 0x-0x00099fff] usable
 Xen: [mem 0x0009a800-0x000f] reserved
 Xen: [mem 0x0010-0x1fff] usable
 Xen: [mem 0x2000-0x201f] reserved
 Xen: [mem 0x2020-0x3fff] usable
 Xen: [mem 0x4000-0x401f] reserved
 Xen: [mem 0x4020-0x80465fff] usable
 Xen: [mem 0x80466000-0x9e855fff] unusable
 Xen: [mem 0x9e856000-0x9e85efff] ACPI data
 Xen: [mem 0x9e85f000-0x9e8a9fff] ACPI NVS
 Xen: [mem 0x9e8aa000-0x9e8b1fff] unusable
 Xen: [mem 0x9e8b2000-0x9e9a4fff] reserved
 Xen: [mem 0x9e9a5000-0x9e9a6fff] unusable
 Xen: [mem 0x9e9a7000-0x9ebc5fff] reserved
 Xen: [mem 0x9ebc6000-0x9ebc6fff] unusable
 Xen: [mem 0x9ebc7000-0x9ebd6fff] reserved
 Xen: [mem 0x9ebd7000-0x9ebf4fff] ACPI NVS
 Xen: [mem 0x9ebf5000-0x9ec18fff] reserved
 Xen: [mem 0x9ec19000-0x9ec5bfff] ACPI NVS
 Xen: [mem 0x9ec5c000-0x9ee7bfff] reserved
 Xen: [mem 0x9ee7c000-0x9eff] unusable
 Xen: [mem 0x9f80-0xbf9f] reserved
 Xen: [mem 0xfec0-0xfec00fff] reserved
 Xen: [mem 0xfed1c000-0xfed3] reserved
 Xen: [mem 0xfed9-0xfed91fff] reserved
 Xen: [mem 0xfee0-0xfeef] reserved
 Xen: [mem 0xff00-0x] reserved
 Xen: [mem 0x0001-0x00043e5f] unusable
 
 
 Just at first glance to fault address, this seems be issued from some
 
 As you see those fault addresses are out of the normal memory range  here.
 
 known erratas on BDS and SKL.
 
 I am runnig v4.2-rc8.
 
 So I really doubt this is related to some erratas. Currently the pre-fetch
 unit of IOMMU unit dedicated to IGD can't work well on some platforms, so
 you can see these wired faults.

Do you have some ideas for a solution/patch?

Thanks!

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] xen/blkfront: convert to blk-mq APIs

2015-08-27 Thread Rafal Mielniczuk
On 25/08/15 03:14, Bob Liu wrote:
 Hi Rafal,

 Please have a try adding --iodepth_batch=32 --iodepth_batch_complete=32 to 
 the fio command line.
 I didn't see this issue any more, neither for domU.

 Thanks,
 -Bob

Hello,

Using 4.2-rc8 kernel, I can confirm that merges are happening after adding this 
option,
both in dom0 and in the guest using null_blk device.

Using the latest stable kernel 4.1.6 with the same configuration, there are no 
merges.

Do you know which change might have caused this difference?

Below are my results for dom0:


modprobe null_blk irqmode=2 completion_nsec=100 queue_mode=1 submit_queues=1


READ: io=22111MB, aggrb=754705KB/s, minb=754705KB/s, maxb=754705KB/s, 
mint=30001msec, maxt=30001msec

Disk stats (read/write):
  nullb0: ios=352340/0, merge=5285340/0, ticks=397340/0, in_queue=396112, 
util=95.87%


modprobe null_blk irqmode=2 completion_nsec=100 queue_mode=2 submit_queues=1


READ: io=22409MB, aggrb=764838KB/s, minb=764838KB/s, maxb=764838KB/s, 
mint=30002msec, maxt=30002msec

Disk stats (read/write):
  nullb0: ios=357070/0, merge=5356290/0, ticks=420812/0, in_queue=450772, 
util=99.32%

Thanks,
Rafal

 On 08/21/2015 04:46 PM, Rafal Mielniczuk wrote:
 On 19/08/15 12:12, Bob Liu wrote:
 Hi Jens  Christoph,

 Rafal reported an issue about this patch, that's after this patch no more 
 merges happen and the performance dropped if modprobe null_blk irqmode=2 
 completion_nsec=100, 
 but works fine if modprobe null_blk.

 I'm not sure whether it's as expect or not.
 Do you have any suggestions? Thank you!

 Here is the test result:

 fio --name=test --ioengine=libaio --rw=read --numjobs=8 --iodepth=32 \
 --time_based=1 --runtime=30 --bs=4KB --filename=/dev/xvdb \
 --direct=1 --group_reporting=1 --iodepth_batch=16

 
 modprobe null_blk
 
 
 *no patch* (avgrq-sz = 8.00 avgqu-sz=5.00)
 
 READ: io=10655MB, aggrb=363694KB/s, minb=363694KB/s, maxb=363694KB/s, 
 mint=30001msec, maxt=30001msec

 Disk stats (read/write):
   xvdb: ios=2715852/0, merge=1089/0, ticks=126572/0, in_queue=127456, 
 util=100.00%

 
 *with patch* (avgrq-sz = 8.00 avgqu-sz=8.00)
 
 READ: io=20655MB, aggrb=705010KB/s, minb=705010KB/s, maxb=705010KB/s, 
 mint=30001msec, maxt=30001msec

 Disk stats (read/write):
   xvdb: ios=5274633/0, merge=22/0, ticks=243208/0, in_queue=242908, 
 util=99.98%

 
 modprobe null_blk irqmode=2 completion_nsec=100
 
 
 *no patch* (avgrq-sz = 34.00 avgqu-sz=38.00)
 
 READ: io=10372MB, aggrb=354008KB/s, minb=354008KB/s, maxb=354008KB/s, 
 mint=30003msec, maxt=30003msec

 Disk stats (read/write):
   xvdb: ios=621760/0, *merge=1988170/0*, ticks=1136700/0, in_queue=1146020, 
 util=99.76%

 
 *with patch* (avgrq-sz = 8.00 avgqu-sz=28.00)
 
 READ: io=2876.8MB, aggrb=98187KB/s, minb=98187KB/s, maxb=98187KB/s, 
 mint=30002msec, maxt=30002msec

 Disk stats (read/write):
   xvdb: ios=734048/0, merge=0/0, ticks=843584/0, in_queue=843080, 
 util=99.72%

 Regards,
 -Bob
 Hello,

 We got a problem with the lack of merges also when we tested on null_blk 
 device in dom0 directly.
 When we enabled multi queue block-layer we got no merges, even when we set 
 the number of submission queues to 1.

 If I don't miss anything, that could suggest the problem lays somewhere in 
 the blk-mq layer itself?

 Please take a look at the results below:

 fio --name=test --ioengine=libaio --rw=read --numjobs=8 --iodepth=32 \
 --time_based=1 --runtime=30 --bs=4KB --filename=/dev/nullb0 \
 --direct=1 --group_reporting=1

 
 modprobe null_blk irqmode=2 completion_nsec=100 queue_mode=1 
 submit_queues=1
 
READ: io=13692MB, aggrb=467320KB/s, minb=467320KB/s, 

Re: [Xen-devel] [PATCH RFC] xen: if on Xen, flatten the scheduling domain hierarchy

2015-08-27 Thread George Dunlap
On 08/18/2015 04:55 PM, Dario Faggioli wrote:
 Hey everyone,
 
 So, as a followup of what we were discussing in this thread:
 
  [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest
  http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg03241.html
 
 I started looking in more details at scheduling domains in the Linux
 kernel. Now, that thread was about CPUID and vNUMA, and their weird way
 of interacting, while this thing I'm proposing here is completely
 independent from them both.
 
 In fact, no matter whether vNUMA is supported and enabled, and no matter
 whether CPUID is reporting accurate, random, meaningful or completely
 misleading information, I think that we should do something about how
 scheduling domains are build.
 
 Fact is, unless we use 1:1, and immutable (across all the guest
 lifetime) pinning, scheduling domains should not be constructed, in
 Linux, by looking at *any* topology information, because that just does
 not make any sense, when vcpus move around.
 
 Let me state this again (hoping to make myself as clear as possible): no
 matter in  how much good shape we put CPUID support, no matter how
 beautifully and consistently that will interact with both vNUMA,
 licensing requirements and whatever else. It will be always possible for
 vCPU #0 and vCPU #3 to be scheduled on two SMT threads at time t1, and
 on two different NUMA nodes at time t2. Hence, the Linux scheduler
 should really not skew his load balancing logic toward any of those two
 situations, as neither of them could be considered correct (since
 nothing is!).
 
 For now, this only covers the PV case. HVM case shouldn't be any
 different, but I haven't looked at how to make the same thing happen in
 there as well.
 
 OVERALL DESCRIPTION
 ===
 What this RFC patch does is, in the Xen PV case, configure scheduling
 domains in such a way that there is only one of them, spanning all the
 pCPUs of the guest.
 
 Note that the patch deals directly with scheduling domains, and there is
 no need to alter the masks that will then be used for building and
 reporting the topology (via CPUID, /proc/cpuinfo, /sysfs, etc.). That is
 the main difference between it and the patch proposed by Juergen here:
 http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg05088.html
 
 This means that when, in future, we will fix CPUID handling and make it
 comply with whatever logic or requirements we want, that won't have  any
 unexpected side effects on scheduling domains.
 
 Information about how the scheduling domains are being constructed
 during boot are available in `dmesg', if the kernel is booted with the
 'sched_debug' parameter. It is also possible to look
 at /proc/sys/kernel/sched_domain/cpu*, and at /proc/schedstat.
 
 With the patch applied, only one scheduling domain is created, called
 the 'VCPU' domain, spanning all the guest's (or Dom0's) vCPUs. You can
 tell that from the fact that every cpu* folder
 in /proc/sys/kernel/sched_domain/ only have one subdirectory
 ('domain0'), with all the tweaks and the tunables for our scheduling
 domain.
 
 EVALUATION
 ==
 I've tested this with UnixBench, and by looking at Xen build time, on a
 16, 24 and 48 pCPUs hosts. I've run the benchmarks in Dom0 only, for
 now, but I plan to re-run them in DomUs soon (Juergen may be doing
 something similar to this in DomU already, AFAUI).
 
 I've run the benchmarks with and without the patch applied ('patched'
 and 'vanilla', respectively, in the tables below), and with different
 number of build jobs (in case of the Xen build) or of parallel copy of
 the benchmarks (in the case of UnixBench).
 
 What I get from the numbers is that the patch almost always brings
 benefits, in some cases even huge ones. There are a couple of cases
 where we regress, but always only slightly so, especially if comparing
 that to the magnitude of some of the improvement that we get.
 
 Bear also in mind that these results are gathered from Dom0, and without
 any overcommitment at the vCPU level (i.e., nr. vCPUs == nr pCPUs). If
 we move things in DomU and do overcommit at the Xen scheduler level, I
 am expecting even better results.
 
 RESULTS
 ===
 To have a quick idea of how a benchmark went, look at the '%
 improvement' row of each table.
 
 I'll put these results online, in a googledoc spreadsheet or something
 like that, to make them easier to read, as soon as possible.
 
 *** Intel(R) Xeon(R) E5620 @ 2.40GHz  
   
 *** pCPUs  16DOM0 vCPUS  16
 *** RAM12285 MB  DOM0 Memory 9955 MB
 *** NUMA nodes 2 
 ===
 MAKE XEN (lower == better)

Re: [Xen-devel] [PATCH v2 for-4.6 2/2] docs: Migration feature document

2015-08-27 Thread Andrew Cooper
On 27/08/15 03:15, Jim Fehlig wrote:
 On 08/25/2015 04:40 AM, Andrew Cooper wrote:
 Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
 ---
 v2:
   * %Revision and #History, following template review
   * Grammar fixes
 ---
   docs/features/migration.pandoc |  100
 
   1 file changed, 100 insertions(+)
   create mode 100644 docs/features/migration.pandoc

 diff --git a/docs/features/migration.pandoc
 b/docs/features/migration.pandoc
 new file mode 100644
 index 000..0fd227f
 --- /dev/null
 +++ b/docs/features/migration.pandoc
 @@ -0,0 +1,100 @@
 +% Migration
 +% Revision 1
 +
 +\clearpage
 +
 +# Basics
 +--- -
 +Status: **Supported**
 +
 +  Architecture: x86
 +
 + Component: Toolstack
 +--- -
 +
 +# Overview
 +
 +Migration is a mechanism to move a virtual machine while the VM is
 +running.  Live migration moves a running virtual machine between two
 +physical servers, but the same mechanism can be used for non-live
 +migrate (pause and copy) and suspend/resume from disk.

 s/migrate/migration/ ?

Either is valid here, although it would be better to leave the entire
sentence in the same tense.


 +
 +# User details
 +
 +No hardware requirements, although hypervisor logdirty support is
 +required for live migration.
 +
 +From the command line, `xl migrate/save/restore` are the top level
 +interactions.  e.g.
 +
 +xl create my-vm.cfg
 +xl migrate my-vm localhost
 +
 +or
 +
 +xl create my-vm.cfg
 +xl save my-vm /path/to/save/file
 +xl restore /path/to/save/file
 +
 +Xen 4.6 sees the instruction of Migration v2.  There is no change for

 Xen 4.6 sees the introduction of Migration v2.
 Or, Xen 4.6 introduces Migration v2.

Oops.  I did mean s/instruction/introduction/.


 +people using `xl`, although the `libxl` API has had an extension.
 +
 +# Technical details
 +
 +Migration is formed of several layers.  `libxc` is responsible for the
 +contents of the VM (ram, vcpus, etc) and the live migration loop, while
 +`libxl` is responsible for items such as emulator state.
 +
 +The format of the migration v2 stream is specified in two documents,
 and
 +is architecture neutral.  Compatibility with legacy streams is
 +maintained via the `convert-legacy-stream` script which transforms a
 +legacy stream into a migration v2 stream.
 +
 +* Documents
 +* `docs/specs/libxc-migration-stream.pandoc`
 +* `docs/specs/libxl-migration-stream.pandoc`
 +* `libxc`
 +* `tools/libxc/xc_sr_*.[hc]`
 +* `libxl`
 +* `tools/libxl/libxl_stream_{read,write}.c`
 +* Scripts
 +* `tools/python/xen/migration/*.py`
 +* `tools/python/scripts/convert-legacy-stream`
 +* `tools/python/scripts/verify-stream-v2`
 +
 +Users of the `libxl` API have a new parameter `stream_version` in
 +`domain_restore_params` which is used to distinguish between legacy and
 +v2 migration streams, and hence whether legacy conversion is required.

 A question better asked when you posted the patches, but is there a
 way to detect if the receiver is V2 capable? I suspect sending a V2
 stream to a receiver capable only of V1 is not advised :-).

libxl has no concept of asking the far side for information, and
therefore no concept of advertising its features.

For querying the local libxl at build time, LIBXL_HAVE_SRM_V2 and
LIBXL_HAVE_SRM_V1 from 3a9ace01


 Also, commit id introducing the change would be helpful info. Looks
 like 3a9ace01 in this case.

See the reply (I am about to write) on the 0/0 thread, but I hope that
putting git sha's in this document is going to be impossible.


 +
 +# Limitations
 +
 +Hypervisor logdirty support is incompatible with hardware passthrough,
 +as IOMMU faults cannot be used to track writes.
 +
 +While not a bug in migration specifically, VMs are very sensitive to
 +changes in cpuid information, and cpuid levelling support currently has
 +its issues.  Extreme care should be taken when migrating VMs between
 +non-identical CPUs until the cpuid levelling improvements are complete.
 +
 +# Areas for improvement
 +
 +* Arm support
 +* Linear P2M support for x86 PV
 +* Live looping parameters

 * cpuid levelling

The topic of cpuid levelling is covered in the paragraph above, and is
logically separate from migration.  XenServer for examples uses it work
around various guest kernel bugs.

It is just as broken in the plain boot case as the migrate case, and I
don't wish to confuse the issues.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 1/8] tmem: Don't crash/hang/leak hypervisor when using shared pools within an guest.

2015-08-27 Thread Konrad Rzeszutek Wilk
When we are using shared pools we have an global array
(on which we put the pool), and an array of pools per domain.
We also have an shared list of clients (guests) _except_
for the very first domain that created the shared pool.

To deal with multiple guests using an shared pool we have an
ref count and a linked list. Whenever an new user of
a the shared pool joins we increase the ref count and add to
the linked list. Whenever an user quits the shared pool
we decrement and remove from the linked list.

Unfortunately this ref counting and linked list never
worked properly. There are multiple issues:

 1) If we have one shared pool (and only one guest creating it)
- we do not add it to the shared list of clients. Which
means the logic in 'shared_pool_quit' never removed
the pool from global_shared_pools. That meant when the pool
was de-allocated - we still had an pointer to the pool
which would be accessed by tmemc_list_client (xl tmem-list -a)
and hit a NULL page!

 2). If we have two shared pools in a domain - it (shared_pool_quit)
 would remove the domain from the share_list linked list, decrements
 the refcount to zero - and remove the pool from the global shared pool.
 When done it would also clear the client-pools[] to NULL for itself.
 Which is good. However since there are two shared pools in the domain
 the next entry in the client-pools[] would have a stale pointer to
 the just de-allocated pool. Accessing it and trying to de-allocate it
 would lead to crashes or hypervisor hang - depending on the build.

Fun times!

To trigger this use
http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/tmem_test/tmem_test.c

This patch fixes it by making the very first domain that created
an shared pool to follow the same logic as every domain that is
joining a shared pool. That is increment the refcount and also
add itself to the shared list of domains using it.

We also remove an ASSERT that incorrectly assumed
that only one shared pool would exist for a domain.

And to mirror the reporting logic in shared_pool_join
we also add a printk to advertise inter-domain shared pool
joining.

Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/common/tmem.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/common/tmem.c b/xen/common/tmem.c
index f2dc26e..572944e 100644
--- a/xen/common/tmem.c
+++ b/xen/common/tmem.c
@@ -1037,6 +1037,9 @@ static int shared_pool_join(struct tmem_pool *pool, 
struct client *new_client)
 tmem_client_info(adding new %s %d to shared pool owned by %s %d\n,
 tmem_client_str, new_client-cli_id, tmem_client_str,
 pool-client-cli_id);
+else if (pool-shared_count)
+tmem_client_info(inter-guest sharing of shared pool %s by client 
%d\n,
+ tmem_client_str, pool-client-cli_id);
 ++pool-shared_count;
 return 0;
 }
@@ -1056,7 +1059,10 @@ static void shared_pool_reassign(struct tmem_pool *pool)
 }
 old_client-pools[pool-pool_id] = NULL;
 sl = list_entry(pool-share_list.next, struct share_list, share_list);
-ASSERT(sl-client != old_client);
+/*
+ * The sl-client can be old_client if there are multiple shared pools
+ * within an guest.
+ */
 pool-client = new_client = sl-client;
 for (poolid = 0; poolid  MAX_POOLS_PER_DOMAIN; poolid++)
 if (new_client-pools[poolid] == pool)
@@ -1982,6 +1988,8 @@ static int do_tmem_new_pool(domid_t this_cli_id,
 {
 INIT_LIST_HEAD(pool-share_list);
 pool-shared_count = 0;
+if (shared_pool_join(pool, client))
+goto fail;
 global_shared_pools[first_unused_s_poolid] = pool;
 }
 }
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] tmem vs Xen 4.6

2015-08-27 Thread Konrad Rzeszutek Wilk
On Thu, Aug 27, 2015 at 04:02:08PM +0800, Bob Liu wrote:
 
 On 08/27/2015 02:55 PM, Jan Beulich wrote:
  Konrad,
  
  I thought I'd remind you of the some progress per release criteria
  to avoid it becoming subject to removal. The most recent changes
  date back to spring 2014 afaics...
  
  Jan
  
 
 AFAIR Konrad holds a few patches, I believe he will send out them shortly 
 (Friday or Monday).

Did it just now and also pushed them to a git tree.

Thanks!
 
 -- 
 Regards,
 -Bob

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Draft B] Boot ABI for HVM guests without a device-model

2015-08-27 Thread Jan Beulich
 On 27.08.15 at 11:57, roger@citrix.com wrote:
 El 27/08/15 a les 11.43, Andrew Cooper ha escrit:
 On 27/08/15 09:04, Jan Beulich wrote:
 On 26.08.15 at 16:44, roger@citrix.com wrote:
 El 26/08/15 a les 14.12, Andrew Cooper ha escrit:
 On 26/08/15 13:00, Jan Beulich wrote:
 This structure is guaranteed to always be placed in memory after the
 DYM These structures are ...?

 loaded kernel and modules.
 There is no requirement for the command line/module information to be
 after the loaded kernel.  All it needs to do is not overlap.
 IMHO, this is helpful in order to get last used physical address, after
 which free memory starts. Current FreeBSD implementation relies on this,
 if we didn't do it that way I would have to calculate where the symtab +
 strtab ends, which is more complex.
 But the statement leaves open whether there is any free memory at
 all after those structures, or whether instead all free memory lives
 at lower addresses. Nor do I consider it appropriate to take a present
 (one might say overly simplistic) implementation as a basis for setting
 arbitrary restrictions.
 
 Can we just state that the hvm_start_info structure and associated 
 metadata is placed after the loaded kernel and modules?

I think we should try to avoid introducing any restrictions that aren't
technically warranted: The more restrictions we add now, the less
flexible we're going to be when we want/need to change some of the
implementation later on.

 I agree.  This sounds like a FreeBSD bug, and absolutely shouldn't be a
 written restriction in the boot ABI.
 
 Bug? The FreeBSD native loader passes to the FreeBSD kernel the last 
 used address, after which free memory starts. IMHO, it is not a bug, 
 it's just how FreeBSD boots. I understand that Linux might not pass 
 such a parameter, and there are other ways I can use to find this, but 
 they are more complex. 

Considering that you claimed that after that free memory starts, when
- as said - there might not be any free memory there, it indeed sounds
like a bug to me.

 We already did something very similar with PV guests, see the comment 
 before the start_info structure:

I think we'd better avoid leaving basically no room for alterations. It
has been problematic on the PV side in at least one case (the ordering
of page table pages for compat guests).

 IMHO it is important to mention how things are loaded into memory, and 
 placing the hvm_start_info struct after the loaded kernel and modules 
 is also the most natural way to do it, I don't foresee this changing in 
 the future.

The only thing we should make sure is that all pieces can be easily
found, i.e. the kernel doesn't need to do any guessing.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


  1   2   >