[Xen-devel] [xen-4.8-testing test] 106985: regressions - FAIL

2017-03-29 Thread osstest service owner
flight 106985 xen-4.8-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106985/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-xsm  15 guest-start/debian.repeat fail REGR. vs. 106844
 test-armhf-armhf-libvirt  6 xen-boot                  fail REGR. vs. 106844
 test-armhf-armhf-xl-vhd  14 guest-start/debian.repeat fail REGR. vs. 106844

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds             15 guest-start/debian.repeat fail REGR. vs. 106844
 test-amd64-i386-xl-qemuu-win7-amd64  16 guest-stop                fail like 106844
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop                fail like 106844
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop                fail like 106844
 test-amd64-i386-xl-qemut-win7-amd64  16 guest-stop                fail like 106844
 test-amd64-amd64-xl-rtds              9 debian-install            fail like 106844

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm       1 build-check(1)            blocked n/a
 test-arm64-arm64-xl                1 build-check(1)            blocked n/a
 build-arm64-libvirt                1 build-check(1)            blocked n/a
 test-arm64-arm64-libvirt-qcow2     1 build-check(1)            blocked n/a
 test-arm64-arm64-libvirt           1 build-check(1)            blocked n/a
 test-arm64-arm64-xl-credit2        1 build-check(1)            blocked n/a
 test-arm64-arm64-xl-rtds           1 build-check(1)            blocked n/a
 test-arm64-arm64-xl-multivcpu      1 build-check(1)            blocked n/a
 test-arm64-arm64-xl-xsm            1 build-check(1)            blocked n/a
 build-arm64                        5 xen-build                 fail    never pass
 build-arm64-pvops                  5 kernel-build              fail    never pass
 test-amd64-amd64-xl-pvh-intel     11 guest-start               fail    never pass
 test-amd64-amd64-libvirt          12 migrate-support-check     fail    never pass
 test-amd64-i386-libvirt-xsm       12 migrate-support-check     fail    never pass
 test-amd64-i386-libvirt           12 migrate-support-check     fail    never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-amd64-xl-pvh-amd       11 guest-start               fail    never pass
 test-amd64-amd64-libvirt-xsm      12 migrate-support-check     fail    never pass
 build-arm64-xsm                    5 xen-build                 fail    never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale       12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-arndale       13 saverestore-support-check fail    never pass
 test-armhf-armhf-xl-multivcpu     12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-multivcpu     13 saverestore-support-check fail    never pass
 test-armhf-armhf-xl-credit2       12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-credit2       13 saverestore-support-check fail    never pass
 test-armhf-armhf-xl-xsm           12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-xsm           13 saverestore-support-check fail    never pass
 test-armhf-armhf-xl-cubietruck    12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-cubietruck    13 saverestore-support-check fail    never pass
 test-armhf-armhf-xl               12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl               13 saverestore-support-check fail    never pass
 test-armhf-armhf-libvirt-xsm      12 migrate-support-check     fail    never pass
 test-armhf-armhf-libvirt-xsm      13 saverestore-support-check fail    never pass
 test-armhf-armhf-libvirt-raw      11 migrate-support-check     fail    never pass
 test-armhf-armhf-libvirt-raw      12 saverestore-support-check fail    never pass
 test-armhf-armhf-xl-rtds          12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-rtds          13 saverestore-support-check fail    never pass
 test-armhf-armhf-xl-vhd           11 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-vhd           12 saverestore-support-check fail    never pass
 test-amd64-amd64-libvirt-vhd      11 migrate-support-check     fail    never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail    never pass

version targeted for testing:
 xen  ca41491f0507150139fc35ff6c9f076fdbe9487b
baseline version:
 xen  eca97a466dc8d8f99fbff8f51a117d6e8255ecdc

Last test of basis   106844  2017-03-22 15:51:56 Z    7 days
Testing same since   106985  2017-03-29 19:12:11 Z    0 days    1 attempts


People who touched revisions under test:
  Stefano Stabellini 
  Wei Chen 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm 

Re: [Xen-devel] [PATCH RFC 07/20] migration: defer precopy policy to libxl

2017-03-29 Thread Joshua Otto
On Wed, Mar 29, 2017 at 07:54:15PM +0100, Jennifer Herbert wrote:
> I would like to encourage this patch - as I have use for it outside
> of your postcopy work.

Glad to hear that!

> Some things people will comment on:
> You've used 'unsigned' without the int keyword, which people don't like.
> Also, on line 324, you're missing a space between 'if (' and
> 'ctx->save.policy_decision'.

Ack.  All of the existing code in xc_sr_save/xc_sr_restore uses plain "unsigned"
so I tried to be consistent.

> 
> Also, I'm not a fan of your CONSULT_POLICY macro, which you've defined at
> an odd point in your function, and which I think could be done more
> elegantly.  Worst of all ... it's a macro - which I think should generally
> be avoided unless there is little choice.  I'm sure you could write a
> helper function to replace this.

Yes, you're right, will fix.

Thank you for the review!

Josh

> 
> Cheers,
> 
> -jenny
> 
> On 27/03/17 10:06, Joshua Otto wrote:
> >The precopy phase of the xc_domain_save() live migration algorithm has
> >historically been implemented to run until either a) (almost) no pages
> >are dirty or b) some fixed, hard-coded maximum number of precopy
> >iterations has been exceeded.  This policy and its implementation are
> >less than ideal for a few reasons:
> >- the logic of the policy is intertwined with the control flow of the
> >   mechanism of the precopy stage
> >- it can't take into account facts external to the immediate
> >   migration context, such as interactive user input or the passage of
> >   wall-clock time
> >- it does not permit the user to change their mind, over time, about
> >   what to do at the end of the precopy (they get an unconditional
> >   transition into the stop-and-copy phase of the migration)
> >
> >To permit users to implement arbitrary higher-level policies governing
> >when the live migration precopy phase should end, and what should be
> >done next:
> >- add a precopy_policy() callback to the xc_domain_save() user-supplied
> >   callbacks
> >- during the precopy phase of live migrations, consult this policy after
> >   each batch of pages transmitted and take the dictated action, which
> >   may be to a) abort the migration entirely, b) continue with the
> >   precopy, or c) proceed to the stop-and-copy phase.
> >- provide an implementation of the old policy as such a callback in
> >   libxl and plumb it through the IPC machinery to libxc, effectively
> > >   maintaining the old policy for now
> >
> >Signed-off-by: Joshua Otto 
> >---
> >  tools/libxc/include/xenguest.h |  23 -
> >  tools/libxc/xc_nomigrate.c |   3 +-
> >  tools/libxc/xc_sr_common.h |   7 +-
> >  tools/libxc/xc_sr_save.c   | 194 
> > ++---
> >  tools/libxl/libxl_dom_save.c   |  20 
> >  tools/libxl/libxl_save_callout.c   |   3 +-
> >  tools/libxl/libxl_save_helper.c|   7 +-
> >  tools/libxl/libxl_save_msgs_gen.pl |   4 +-
> >  8 files changed, 189 insertions(+), 72 deletions(-)
> >
> >diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> >index aa8cc8b..30ffb6f 100644
> >--- a/tools/libxc/include/xenguest.h
> >+++ b/tools/libxc/include/xenguest.h
> >@@ -39,6 +39,14 @@
> >   */
> >  struct xenevtchn_handle;
> >+/* For save's precopy_policy(). */
> >+struct precopy_stats
> >+{
> >+unsigned iteration;
> >+unsigned total_written;
> >+long dirty_count; /* -1 if unknown */
> >+};
> >+
> >  /* callbacks provided by xc_domain_save */
> >  struct save_callbacks {
> >  /* Called after expiration of checkpoint interval,
> >@@ -46,6 +54,17 @@ struct save_callbacks {
> >   */
> >  int (*suspend)(void* data);
> > >+    /* Called after every batch of page data sent during the precopy phase of a
> > >+     * live migration to ask the caller what to do next based on the current
> > >+     * state of the precopy migration.
> > >+     */
> > >+#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely and
> > >+                                        * tidy up. */
> > >+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
> > >+#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
> > >+                                        * remaining dirty pages. */
> > >+    int (*precopy_policy)(struct precopy_stats stats, void *data);
> >+
> >  /* Called after the guest's dirty pages have been
> >   *  copied into an output buffer.
> >   * Callback function resumes the guest & the device model,
> >@@ -100,8 +119,8 @@ typedef enum {
> >   *doesn't use checkpointing
> >   * @return 0 on success, -1 on failure
> >   */
> > >-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
> >-   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
> >+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> >+   uint32_t flags /* XCFLAGS_xxx */,
> >

Re: [Xen-devel] [PATCH RFC 07/20] migration: defer precopy policy to libxl

2017-03-29 Thread Joshua Otto
On Wed, Mar 29, 2017 at 09:18:10PM +0100, Andrew Cooper wrote:
> On 27/03/17 10:06, Joshua Otto wrote:
> > The precopy phase of the xc_domain_save() live migration algorithm has
> > historically been implemented to run until either a) (almost) no pages
> > are dirty or b) some fixed, hard-coded maximum number of precopy
> > iterations has been exceeded.  This policy and its implementation are
> > less than ideal for a few reasons:
> > - the logic of the policy is intertwined with the control flow of the
> >   mechanism of the precopy stage
> > - it can't take into account facts external to the immediate
> >   migration context, such as interactive user input or the passage of
> >   wall-clock time
> > - it does not permit the user to change their mind, over time, about
> >   what to do at the end of the precopy (they get an unconditional
> >   transition into the stop-and-copy phase of the migration)
> >
> > To permit users to implement arbitrary higher-level policies governing
> > when the live migration precopy phase should end, and what should be
> > done next:
> > - add a precopy_policy() callback to the xc_domain_save() user-supplied
> >   callbacks
> > - during the precopy phase of live migrations, consult this policy after
> >   each batch of pages transmitted and take the dictated action, which
> >   may be to a) abort the migration entirely, b) continue with the
> >   precopy, or c) proceed to the stop-and-copy phase.
> > - provide an implementation of the old policy as such a callback in
> >   libxl and plumb it through the IPC machinery to libxc, effectively
> >   maintaining the old policy for now
> >
> > Signed-off-by: Joshua Otto 
> 
> This patch should be split into two.  One modifying libxc to use struct
> precopy_stats, and a second to wire up the RPC call.

Will do.

> > ---
> >  tools/libxc/include/xenguest.h |  23 -
> >  tools/libxc/xc_nomigrate.c |   3 +-
> >  tools/libxc/xc_sr_common.h |   7 +-
> >  tools/libxc/xc_sr_save.c   | 194 
> > ++---
> >  tools/libxl/libxl_dom_save.c   |  20 
> >  tools/libxl/libxl_save_callout.c   |   3 +-
> >  tools/libxl/libxl_save_helper.c|   7 +-
> >  tools/libxl/libxl_save_msgs_gen.pl |   4 +-
> >  8 files changed, 189 insertions(+), 72 deletions(-)
> >
> > diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> > index aa8cc8b..30ffb6f 100644
> > --- a/tools/libxc/include/xenguest.h
> > +++ b/tools/libxc/include/xenguest.h
> > @@ -39,6 +39,14 @@
> >   */
> >  struct xenevtchn_handle;
> >  
> > +/* For save's precopy_policy(). */
> > +struct precopy_stats
> > +{
> > +unsigned iteration;
> > +unsigned total_written;
> > +long dirty_count; /* -1 if unknown */
> 
> total_written and dirty_count are liable to be equal, so having them as
> different widths of integer clearly can't be correct.

Hmmm, I could have sworn that I chose the width to match the type of dirty_count
in the shadow op stats, but I've checked again and it's uint32_t there so I'm
not sure what I was thinking.

> 
> > +};
> > +
> >  /* callbacks provided by xc_domain_save */
> >  struct save_callbacks {
> >  /* Called after expiration of checkpoint interval,
> > @@ -46,6 +54,17 @@ struct save_callbacks {
> >   */
> >  int (*suspend)(void* data);
> >  
> > +    /* Called after every batch of page data sent during the precopy phase of a
> > +     * live migration to ask the caller what to do next based on the current
> > +     * state of the precopy migration.
> > +     */
> > +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely and
> > +                                        * tidy up. */
> > +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
> > +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
> > +                                        * remaining dirty pages. */
> > +    int (*precopy_policy)(struct precopy_stats stats, void *data);
> 
> Structures shouldn't be passed by value like this, as the compiler has
> to do a lot of memcpy() work to make it happen.  You should pass by
> const pointer, as (as far as I can tell), they are strictly read-only to
> the implementation of this hook?

I chose to pass by value to make the IPC plumbing easier -
libxl_save_msgs_gen.pl doesn't know what to do about pointers, and (not being
the strongest Perl programmer...) I didn't want to volunteer to be the one to
teach it.

Is the memcpy() really significant here?  If this were a tight loop, sure, but
every invocation of the policy callback implies both a 4MB network transfer
_and_ a synchronous RPC.
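
For reference, here is a rough sketch of what a policy hook could look like with the by-const-pointer signature suggested above.  It is illustrative only: the struct fields and XGS_POLICY_* values are copied from the patch, but example_precopy_policy(), struct policy_limits and the threshold numbers are made up for this example and are not part of the series.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes as defined in the patch under review. */
#define XGS_POLICY_ABORT            (-1)
#define XGS_POLICY_CONTINUE_PRECOPY 0
#define XGS_POLICY_STOP_AND_COPY    1

/* Same fields as the patch's struct precopy_stats. */
struct precopy_stats
{
    unsigned int iteration;
    unsigned int total_written;
    long dirty_count; /* -1 if unknown */
};

/* Hypothetical limits, passed through the opaque 'data' argument. */
struct policy_limits
{
    unsigned int max_iterations; /* stop precopy after this many iterations */
    long dirty_threshold;        /* stop precopy once fewer pages are dirty */
};

/*
 * Hypothetical policy in the spirit of the old hard-coded behaviour:
 * leave precopy when (almost) no pages are dirty, or when a fixed
 * iteration limit is reached.  Takes the stats by const pointer.
 */
static int example_precopy_policy(const struct precopy_stats *stats,
                                  void *data)
{
    const struct policy_limits *limits = data;

    assert(stats != NULL);
    assert(limits != NULL);

    /* A dirty_count of -1 means "unknown", so only compare when valid. */
    if ( stats->dirty_count >= 0 &&
         stats->dirty_count < limits->dirty_threshold )
        return XGS_POLICY_STOP_AND_COPY;

    if ( stats->iteration >= limits->max_iterations )
        return XGS_POLICY_STOP_AND_COPY;

    return XGS_POLICY_CONTINUE_PRECOPY;
}
```

The hook only reads the stats, so the const pointer costs nothing on the libxc side; the open question is how the stats would then be marshalled across the libxl IPC layer.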

> > +
> >  /* Called after the guest's dirty pages have been
> >   *  copied into an output buffer.
> >   * Callback function resumes the guest & the device model,
> > @@ -100,8 +119,8 @@ typedef enum {
> >   *doesn't use checkpointing
> >   * @return 

Re: [Xen-devel] [GSoc] Adding Floating Point support to Mini-OS

2017-03-29 Thread Juergen Gross
On 29/03/17 20:53, Felix Schmoll wrote:
> Hi,
> 
> while looking at this some more I came to the following
> questions/assumptions, so I'd be grateful if you could briefly address them:
> 
> -While implementing our own kernel last semester, my team-mate and I
> came to believe that pusha/popa were faster than pushing/popping the
> individual registers, since it is just a single command. The Mini-OS
> kernel however does the latter. Is that a conscious performance trade-off
> for something, or were we just under a misconception, in that it
> compiles to the same thing in the end?

pusha/popa are available in 32-bit (and real) mode only; they are not
valid in 64-bit mode. They also waste space on the stack for the stack
pointer. As the Linux kernel isn't using them either, I'd suggest not
using pusha/popa.

> -Lazy floating point register saving is similar to Copy-on-write, is
> that correct?

Samuel has already answered this question.

> -There is nothing preventing me from using some floating-point library
> for the user-space test program, right?

I wouldn't bother too much with libraries. Just use the FP types (float,
double) of gcc and do some basic math.
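
As a rough idea of what I mean by basic math: something like the sketch below, run in more than one thread, would already show whether FP state survives a context switch.  The names geometric_sum() and fp_selftest() are invented for this example and nothing here is existing Mini-OS API.

```c
#include <assert.h>

/*
 * Hypothetical test helper: sums 1/2 + 1/4 + ... + 1/2^n in double
 * precision.  Every term and partial sum is exactly representable for
 * n <= 52, so the result can be compared for equality.
 */
static double geometric_sum(unsigned int n)
{
    /* volatile stops the compiler from folding the loop at build time */
    volatile double term = 1.0;
    double sum = 0.0;
    unsigned int i;

    assert(n <= 52);

    for ( i = 0; i < n; ++i )
    {
        term = term / 2.0;
        sum += term;
    }

    return sum;
}

/* Returns 1 if the sum matches the closed form 1 - 2^-n, 0 otherwise. */
static int fp_selftest(unsigned int n)
{
    double pow2 = 1.0;
    unsigned int i;

    for ( i = 0; i < n; ++i )
        pow2 /= 2.0;

    return geometric_sum(n) == 1.0 - pow2;
}
```

If a context switch clobbered the FP registers in the middle of the loop, fp_selftest() would be expected to return 0 for at least some runs.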

> I'd also appreciate if you could have a quick glance at my updated proposal
> (on the GSoC portal) and give me some more feedback on it.

I'd replace the FP library part with basic FP operations based on gcc
support.

While I don't expect major problems I suggest adding a note about
testing in pv- and pvh-mode (both should work in 32- and 64-bit mode).


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable test] 106979: regressions - FAIL

2017-03-29 Thread osstest service owner
flight 106979 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106979/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-xsm 15 guest-start/debian.repeat fail REGR. vs. 106959

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt-xsm         13 saverestore-support-check fail like 106959
 test-armhf-armhf-libvirt             13 saverestore-support-check fail like 106959
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop                fail like 106959
 test-amd64-i386-xl-qemuu-win7-amd64  16 guest-stop                fail like 106959
 test-amd64-i386-xl-qemut-win7-amd64  16 guest-stop                fail like 106959
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop                fail like 106959
 test-armhf-armhf-libvirt-raw         12 saverestore-support-check fail like 106959
 test-amd64-amd64-xl-rtds              9 debian-install            fail like 106959

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm       1 build-check(1)            blocked n/a
 test-arm64-arm64-xl                1 build-check(1)            blocked n/a
 build-arm64-libvirt                1 build-check(1)            blocked n/a
 test-arm64-arm64-libvirt-qcow2     1 build-check(1)            blocked n/a
 test-arm64-arm64-libvirt           1 build-check(1)            blocked n/a
 test-arm64-arm64-xl-credit2        1 build-check(1)            blocked n/a
 test-arm64-arm64-xl-rtds           1 build-check(1)            blocked n/a
 test-arm64-arm64-xl-multivcpu      1 build-check(1)            blocked n/a
 test-arm64-arm64-xl-xsm            1 build-check(1)            blocked n/a
 test-amd64-i386-libvirt-xsm       12 migrate-support-check     fail    never pass
 test-amd64-i386-libvirt           12 migrate-support-check     fail    never pass
 test-amd64-amd64-libvirt          12 migrate-support-check     fail    never pass
 test-amd64-amd64-libvirt-xsm      12 migrate-support-check     fail    never pass
 build-arm64                        5 xen-build                 fail    never pass
 build-arm64-xsm                    5 xen-build                 fail    never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 build-arm64-pvops                  5 kernel-build              fail    never pass
 test-armhf-armhf-xl-arndale       12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-arndale       13 saverestore-support-check fail    never pass
 test-amd64-amd64-libvirt-vhd      11 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-xsm           12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-xsm           13 saverestore-support-check fail    never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail    never pass
 test-armhf-armhf-xl-multivcpu     12 migrate-support-check     fail    never pass
 test-armhf-armhf-libvirt-xsm      12 migrate-support-check     fail    never pass
 test-armhf-armhf-libvirt          12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-multivcpu     13 saverestore-support-check fail    never pass
 test-armhf-armhf-xl               12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl               13 saverestore-support-check fail    never pass
 test-armhf-armhf-xl-credit2       12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-credit2       13 saverestore-support-check fail    never pass
 test-armhf-armhf-xl-cubietruck    12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-cubietruck    13 saverestore-support-check fail    never pass
 test-armhf-armhf-xl-rtds          12 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-rtds          13 saverestore-support-check fail    never pass
 test-armhf-armhf-libvirt-raw      11 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-vhd           11 migrate-support-check     fail    never pass
 test-armhf-armhf-xl-vhd           12 saverestore-support-check fail    never pass

version targeted for testing:
 xen  68a08e12c44435eb86600072b9e725e2387ce163
baseline version:
 xen  ac9ff74f39a734756af90ccbb7184551f7b1e22a

Last test of basis   106959  2017-03-28 09:14:17 Z    1 days
Testing same since   106979  2017-03-29 16:19:31 Z    0 days    1 attempts


People who touched revisions under test:
  Ian Jackson 
  Jonathan Davies 
  Thomas Sanders 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  fail
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf 

Re: [Xen-devel] [PATCH RFC 06/20] libxc/xc_sr: factor helpers out of handle_page_data()

2017-03-29 Thread Joshua Otto
On Tue, Mar 28, 2017 at 08:52:26PM +0100, Andrew Cooper wrote:
> On 27/03/17 10:06, Joshua Otto wrote:
> > diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
> > index 3291b25..32400b2 100644
> > --- a/tools/libxc/xc_sr_stream_format.h
> > +++ b/tools/libxc/xc_sr_stream_format.h
> > @@ -80,15 +80,15 @@ struct xc_sr_rhdr
> >  #define REC_TYPE_OPTIONAL 0x80000000U
> >  
> >  /* PAGE_DATA */
> > -struct xc_sr_rec_page_data_header
> > +struct xc_sr_rec_pages_header
> >  {
> >  uint32_t count;
> >  uint32_t _res1;
> >  uint64_t pfn[0];
> >  };
> >  
> > -#define PAGE_DATA_PFN_MASK  0x000fffffffffffffULL
> > -#define PAGE_DATA_TYPE_MASK 0xf000000000000000ULL
> > +#define REC_PFINFO_PFN_MASK  0x000fffffffffffffULL
> > +#define REC_PFINFO_TYPE_MASK 0xf000000000000000ULL
> >  
> >  /* X86_PV_INFO */
> >  struct xc_sr_rec_x86_pv_info
> 
> What are the purposes of these name changes?

I should definitely have explained this more explicitly, sorry about that.  I
use the same exact structure (a count followed by a list of encoded pfns+types)
for three additional record types (POSTCOPY_PFNS, POSTCOPY_PAGE_DATA, and
POSTCOPY_FAULT) later in the series when postcopy is introduced.  To enable the
generation and validation logic to be shared between all of the code that
processes this sort of record, I renamed the structure and its associated masks
to be more generic.
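
To make the sharing concrete, here is a small sketch of the kind of encode/decode helpers every such record type could reuse.  It assumes the pfn occupies the low 52 bits of each 64-bit entry and the type the top 4 bits (with the 32-bit type value shifted up by 32); the EXAMPLE_* names and the three helper functions are invented for illustration and are not taken from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: low 52 bits hold the pfn, top 4 bits hold the type. */
#define EXAMPLE_PFINFO_PFN_MASK  0x000fffffffffffffULL
#define EXAMPLE_PFINFO_TYPE_MASK 0xf000000000000000ULL

/* Pack a pfn and a 32-bit type value into one 64-bit record entry. */
static uint64_t example_encode_pfinfo(uint64_t pfn, uint32_t type)
{
    /* The pfn must fit within the pfn mask. */
    assert((pfn & ~EXAMPLE_PFINFO_PFN_MASK) == 0);

    return (((uint64_t)type << 32) & EXAMPLE_PFINFO_TYPE_MASK) | pfn;
}

/* Extract the pfn from a record entry. */
static uint64_t example_pfinfo_pfn(uint64_t entry)
{
    return entry & EXAMPLE_PFINFO_PFN_MASK;
}

/* Extract the 32-bit type value from a record entry. */
static uint32_t example_pfinfo_type(uint64_t entry)
{
    return (uint32_t)((entry & EXAMPLE_PFINFO_TYPE_MASK) >> 32);
}
```

With generic names, the same three helpers serve PAGE_DATA and the later postcopy records alike, which is the motivation for the rename.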

Josh



Re: [Xen-devel] [PATCH RFC 05/20] libxc/xc_sr: factor out filter_pages()

2017-03-29 Thread Joshua Otto
On Tue, Mar 28, 2017 at 08:27:48PM +0100, Andrew Cooper wrote:
> On 27/03/17 10:06, Joshua Otto wrote:
> > diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
> > index 481a904..8574ee8 100644
> > --- a/tools/libxc/xc_sr_restore.c
> > +++ b/tools/libxc/xc_sr_restore.c
> > @@ -194,6 +194,68 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned count,
> >  return rc;
> >  }
> >  
> > +static void set_page_types(struct xc_sr_context *ctx, unsigned count,
> > +   xen_pfn_t *pfns, uint32_t *types)
> > +{
> > +unsigned i;
> 
> Please use unsigned int rather than just "unsigned" throughout.

Okay.  (For what it's worth, I chose plain "unsigned" here for consistency with
the rest of xc_sr_save/xc_sr_restore)

> > +
> > +for ( i = 0; i < count; ++i )
> > +ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]);
> > +}
> > +
> > +/*
> > + * Given count pfns and their types, allocate and fill in buffer bpfns with only
> > + * those pfns that are 'backed' by real page data that needs to be migrated.
> > + * The caller must later free() *bpfns.
> > + *
> > + * Returns 0 on success and non-0 on failure.  *bpfns can be free()ed even after
> > + * failure.
> > + */
> > +static int filter_pages(struct xc_sr_context *ctx,
> > +unsigned count,
> > +xen_pfn_t *pfns,
> > +uint32_t *types,
> > +/* OUT */ unsigned *nr_pages,
> > +/* OUT */ xen_pfn_t **bpfns)
> > +{
> > +xc_interface *xch = ctx->xch;
> 
> > Pointers to arrays are very easy to get wrong in C.  This code will be
> > less error-prone if you use
> 
> xen_pfn_t *_pfns;  (variable name subject to improvement)
> 
> > +unsigned i;
> > +
> > +*nr_pages = 0;
> > +*bpfns = malloc(count * sizeof(*bpfns));
> 
> > _pfns = *bpfns = malloc(...).
> 
> Then use _pfns in place of (*bpfns) everywhere else.
> 
> However,  your sizeof has the wrong indirection.  It works on x86
> because xen_pfn_t is the same size as a pointer, but it will blow up on
> 32bit ARM, where a pointer is 4 bytes but xen_pfn_t is 8 bytes.

Agh!  Oh dear.
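
For the record, a minimal sketch of the local-pointer pattern suggested above, so that the sizeof is taken on an element rather than on the pointer.  example_pfn_t stands in for xen_pfn_t (assumed 64 bits wide here, as on 32-bit ARM) and example_alloc_pfn_buffer() is a made-up name, not code from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for xen_pfn_t; assumed to be 64 bits wide for this example. */
typedef uint64_t example_pfn_t;

/*
 * Allocate room for 'count' pfns and hand the buffer back via *bpfns.
 * Returns 0 on success and -1 on failure; *bpfns is safe to free()
 * in either case.
 */
static int example_alloc_pfn_buffer(size_t count, example_pfn_t **bpfns)
{
    example_pfn_t *buf;

    assert(bpfns != NULL);

    /*
     * sizeof(*buf) is the size of one element.  sizeof(*bpfns) would be
     * the size of a pointer, which only happens to match where pointers
     * and pfns are the same width.
     */
    buf = malloc(count * sizeof(*buf));
    *bpfns = buf;

    return buf ? 0 : -1;
}
```

Working through the local buf also avoids repeating (*bpfns)[...] in the rest of the function.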

> > +if ( !(*bpfns) )
> > +{
> > +ERROR("Failed to allocate %zu bytes to process page data",
> > +  count * (sizeof(*bpfns)));
> > +return -1;
> > +}
> > +
> > +for ( i = 0; i < count; ++i )
> > +{
> > +switch ( types[i] )
> > +{
> > +case XEN_DOMCTL_PFINFO_NOTAB:
> > +
> > +case XEN_DOMCTL_PFINFO_L1TAB:
> > +case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB:
> > +
> > +case XEN_DOMCTL_PFINFO_L2TAB:
> > +case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB:
> > +
> > +case XEN_DOMCTL_PFINFO_L3TAB:
> > +case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB:
> > +
> > +case XEN_DOMCTL_PFINFO_L4TAB:
> > +case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB:
> > +
> > +(*bpfns)[(*nr_pages)++] = pfns[i];
> > +break;
> > +}
> > +}
> > +
> > +return 0;
> > +}
> > +
> >  /*
> >   * Given a list of pfns, their types, and a block of page data from the
> >   * stream, populate and record their types, map the relevant subset and copy
> > @@ -203,7 +265,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned count,
> >   xen_pfn_t *pfns, uint32_t *types, void *page_data)
> >  {
> >  xc_interface *xch = ctx->xch;
> > -xen_pfn_t *mfns = malloc(count * sizeof(*mfns));
> > +xen_pfn_t *mfns = NULL;
> 
> This shows a naming bug, which is my fault.  This should be named gfns,
> not mfns.  (It inherits its name from the legacy migration code, but
> that was also wrong.)
> 
> Please correct it, either in this patch or another; the memory
> management terms are hard enough, even when all the code is correct.

Ahhh - I actually found this desperately confusing when trying to grok the
code originally.  Thanks for clearing that up!

Josh



Re: [Xen-devel] [PATCH RFC 04/20] libxc/xc_sr_save.c: add WRITE_TRIVIAL_RECORD_FN()

2017-03-29 Thread Joshua Otto
On Tue, Mar 28, 2017 at 08:03:26PM +0100, Andrew Cooper wrote:
> On 27/03/17 10:06, Joshua Otto wrote:
> > Writing the libxc save stream requires writing a few 'trivial' records,
> > consisting only of a header with a particular type.  As a readability
> > aid, it's nice to have obviously-named functions that write these sorts
> > of records into the stream - for example, the first such function was
> > write_end_record(), which reads much more pleasantly at its call-site
> > than write_generic_record(REC_TYPE_END) would.  However, it's tedious
> > and error-prone to copy-paste the generic body of such a function for
> > each new trivial record type.
> >
> > Add a helper macro that takes a name base and a record type and declares
> > the corresponding trivial record write function.  Use this to re-define
> > the two existing trivial record functions, write_end_record() and
> > write_checkpoint_record().
> >
> > No functional change.
> >
> > Signed-off-by: Joshua Otto 
> 
> -1.
> 
> This hides the functions from tools like cscope, and makes the code
> harder to read.  I also don't really buy the error prone argument.

Okay, fair enough.

> 
> If you do want to avoid opencoding different functions, how about
> 
> static int write_zerolength_record(uint32_t record_type)
> 
> and updating the existing callsites to be
> 
> write_zerolength_record(REC_TYPE_END); etc.

I really do prefer write_end_record() to write_some_record(REC_TYPE_END),
visually.  I'll fix up the later patches to add the corresponding functions
without the macro.
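
Roughly what I have in mind, as a self-contained sketch: one generic zero-length writer plus thin, explicitly written wrappers that cscope can still find.  The stream type, the header layout and the EXAMPLE_REC_TYPE_* values below are placeholders for illustration, not the real xc_sr definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder record type values, for illustration only. */
#define EXAMPLE_REC_TYPE_END        0x0U
#define EXAMPLE_REC_TYPE_CHECKPOINT 0xeU

/* Placeholder record header: a type followed by a body length. */
struct example_rhdr
{
    uint32_t type;
    uint32_t length;
};

/* Placeholder in-memory stream standing in for the migration fd. */
struct example_stream
{
    unsigned char buf[64];
    size_t used;
};

/* Write a record consisting only of a header with the given type. */
static int write_zerolength_record(struct example_stream *s, uint32_t type)
{
    struct example_rhdr hdr = { .type = type, .length = 0 };

    assert(s != NULL);

    if ( sizeof(s->buf) - s->used < sizeof(hdr) )
        return -1;

    memcpy(s->buf + s->used, &hdr, sizeof(hdr));
    s->used += sizeof(hdr);

    return 0;
}

/* Named wrappers keep the call sites readable without a macro. */
static int write_end_record(struct example_stream *s)
{
    return write_zerolength_record(s, EXAMPLE_REC_TYPE_END);
}

static int write_checkpoint_record(struct example_stream *s)
{
    return write_zerolength_record(s, EXAMPLE_REC_TYPE_CHECKPOINT);
}
```

Each new trivial record type then costs one three-line wrapper, which seems an acceptable price for keeping the names greppable.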

Josh



Re: [Xen-devel] [RFC XEN PATCH v2 00/15] Add vNVDIMM support to HVM domains

2017-03-29 Thread Dan Williams
On Sun, Mar 19, 2017 at 5:09 PM, Haozhong Zhang wrote:
> This is v2 RFC patch series to add vNVDIMM support to HVM domains.
> v1 can be found at 
> https://lists.xenproject.org/archives/html/xen-devel/2016-10/msg00424.html.
>
> No label and no _DSM except function 0 "query implemented functions"
> is supported by this version, but they will be added by future patches.
>
> The corresponding Qemu patch series is sent in another thread
> "[RFC QEMU PATCH v2 00/10] Implement vNVDIMM for Xen HVM guest".
>
> All patch series can be found at
>   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v2
>   Qemu: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v2
>
> Changes in v2
> ==============
>
> - One of the primary changes in v2 is dropping the linux kernel
>   patches, which were used to reserve on host pmem for placing its
>   frametable and M2P table. In v2, we add a management tool xen-ndctl
>   which is used in Dom0 to notify Xen hypervisor of which storage can
>   be used to manage the host pmem.
>
>   For example,
>   1.   xen-ndctl setup 0x24 0x38 0x38 0x3c
> tells Xen hypervisor to use host pmem pages at MFN 0x38 ~
> 0x3c to manage host pmem pages at MFN 0x24 ~ 0x38.
> I.e. the former is used to place the frame table and M2P table of
> both ranges of pmem pages.
>
>   2.   xen-ndctl setup 0x24 0x38
> tells Xen hypervisor to use the regular RAM to manage the host
> pmem pages at MFN 0x24 ~ 0x38. I.e. the regular RAM is used
> to place the frame table and M2P table.
>
> - Another primary change in v2 is dropping the support to map files on
>   the host pmem to HVM domains as virtual NVDIMMs, as I cannot find a
>   stable way to fix the fiemap of host files. Instead, we can rely on the
>   ability added in Linux kernel v4.9 that enables creating multiple
>   pmem namespaces on a single nvdimm interleave set.

This restriction is unfortunate, and it seems to limit the future
architecture of the pmem driver. We may not always be able to
guarantee a contiguous physical address range to Xen for a given
namespace and may want to concatenate disjoint physical address ranges
into a logically contiguous namespace.

Is there a resource I can read more about why the hypervisor needs to
have this M2P mapping for nvdimm support?



Re: [Xen-devel] [PATCH RFC 00/20] Add postcopy live migration support

2017-03-29 Thread Joshua Otto
On Tue, Mar 28, 2017 at 03:41:02PM +0100, Wei Liu wrote:
> Hi Harley, Chester and Joshua
> 
> This is really nice work. I took a brief look at all the patches, they
> look really high quality.

Thank you!

> 
> We're currently approaching freeze for a Xen release. We've got a lot on
> our plate. I think maintainers will get to this series at some point.

Understood.  We're currently approaching our final exams so that's probably for
the best :)

> 
> From the look of things some patches can go in because they're generally
> useful.
> 
> On Mon, Mar 27, 2017 at 05:06:12AM -0400, Joshua Otto wrote:
> > Hi,
> > 
> > We're a team of three fourth-year undergraduate software engineering students
> > at the University of Waterloo in Canada.  In late 2015 we posted on the list
> > [1] to ask for a project to undertake for our program's capstone design
> > project, and Andrew Cooper pointed us in the direction of the live migration
> > implementation as an area that could use some attention.  We were particularly
> > interested in post-copy live migration (as evaluated by [2] and discussed on
> > the list at [3]), and have been working on an implementation of this
> > on-and-off since then.
> > 
> > We now have a working implementation of this scheme, and are submitting it for
> > comment.  The changes are also available as the 'postcopy' branch of the
> > GitHub repository at [4]
> > 
> > As a brief overview of our approach:
> > - We introduce a mechanism by which libxl can indicate to the libxc stream
> >   helper process that the iterative migration precopy loop should be
> >   terminated and postcopy should begin.
> > - At this point, we suspend the domain, collect the final set of dirty pfns
> >   and write these pfns (and _not_ their contents) into the stream.
> > - At the destination, the xc restore logic registers itself as a pager for
> >   the migrating domain, 'evicts' all of the pfns indicated by the sender as
> >   outstanding, and then resumes the domain at the destination.
> > - As the domain executes, the migration sender continues to push the
> >   remaining outstanding pages to the receiver in the background.  The
> >   receiver monitors both the stream for incoming page data and the paging
> >   ring event channel for page faults triggered by the guest.  Page faults are
> >   forwarded on the back-channel migration stream to the migration sender,
> >   which prioritizes these pages for transmission.
> > 
> > By leveraging the existing paging API, we are able to implement the postcopy
> > scheme without any hypervisor modifications - all of our changes are
> > confined to the userspace toolstack.  However, we inherit from the paging
> > API the requirement that the domains be HVM and that the host have HAP/EPT
> > support.
> > 
> 
> Please consider writing a design document for this feature and stick it
> at the beginning of your series in the future. You can find examples
> under docs/designs.

Absolutely, I'll submit one with v2.

> 
> The restriction is a bit unfortunate, but we shouldn't block useful work
> because it's incomplete. We just need to make sure that, should someone
> decide to implement similar functionality for PV guests, they are able to
> do so.
> 
> You might want to check whether shadow paging can be used with the paging
> API, so that you can relax the requirement to HVM guest support alone.
> 
> > We haven't yet had the opportunity to perform a quantitative evaluation of
> > the performance trade-offs between the traditional pre-copy and our
> > post-copy strategies, but intend to.  Informally, we've been testing our
> > implementation by migrating a domain running the x86 memtest program (which
> > is obviously a tremendously write-heavy workload), and have observed a
> > substantial reduction in total time required for migration completion (at
> > the expense of a visually obvious 'slowdown' in the execution of the
> > program).  We've also noticed that, when performing a postcopy without any
> > leading precopy iterations, the time required at the destination to 'evict'
> > all of the outstanding pages is substantial - possibly because there is no
> > batching mechanism by which pages can be evicted - so this area in
> > particular might require further attention.
> > 
> 
> Please do post numbers when you have them. For now, please be patient
> and wait for people to comment.

Will do.  As a general question for those following the thread, are there any
application workloads/benchmarks that people would find particularly
interesting?

The experiment that we've planned, but haven't had the time to follow through
on fully, is to mount a ramdisk inside the guest and use Axboe's fio to test
all of the entries in the (read/write mix) x (working set size) x (access
pattern) matrix.

Thank you again for your feedback!

Josh
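
The prioritization described in the overview above (outstanding pages pushed
in the background, with faulted pages jumping the queue) can be sketched
roughly as follows. This is a minimal illustration in C and not the code from
the series: struct sender, sender_note_fault(), sender_next_pfn() and the
fixed NR_PFNS are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PFNS 8                 /* hypothetical, tiny guest for illustration */
#define NO_PFN  ((size_t)-1)      /* returned when nothing is left to send */

struct sender {
    bool   outstanding[NR_PFNS];  /* pfns named in the stream, contents not yet sent */
    size_t faulted[NR_PFNS];      /* FIFO of pfns the receiver reported faults on */
    size_t fault_head, fault_tail;
    size_t cursor;                /* position of the background scan */
};

/* Record a fault forwarded by the receiver on the back-channel. */
void sender_note_fault(struct sender *s, size_t pfn)
{
    /* Faults on pages already sent, or beyond the queue capacity, are
     * dropped; a dropped page is still picked up by the background scan. */
    if (pfn < NR_PFNS && s->outstanding[pfn] && s->fault_tail < NR_PFNS)
        s->faulted[s->fault_tail++] = pfn;
}

/* Pick the next pfn to transmit: faulted pages first, then scan order. */
size_t sender_next_pfn(struct sender *s)
{
    while (s->fault_head < s->fault_tail) {
        size_t pfn = s->faulted[s->fault_head++];
        assert(pfn < NR_PFNS);
        if (s->outstanding[pfn]) {        /* skip duplicates already sent */
            s->outstanding[pfn] = false;
            return pfn;
        }
    }
    for (; s->cursor < NR_PFNS; s->cursor++) {
        if (s->outstanding[s->cursor]) {
            s->outstanding[s->cursor] = false;
            return s->cursor++;
        }
    }
    return NO_PFN;
}
```

With pfns 1, 3 and 5 outstanding and a fault reported on pfn 5, successive
calls to sender_next_pfn() yield 5, 1, 3 and then NO_PFN.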

___
Xen-devel mailing list

Re: [Xen-devel] [GSoC]about the task "Share a page in memory from the VM config file"

2017-03-29 Thread Zhongze Liu
Hi Stefano,

What would you say to extending this project to "sharing multiple
ranges of memory among VMs from the config file"?

Cheers.

Zhongze Liu

2017-03-30 9:07 GMT+08:00 Zhongze Liu :
> Hi Stefano,
>
> Thanks for reminding me of the deadline and providing me with more information
> on this project.
>
> I did set up the ARM model and it is working well. But after that, I
> encountered some private issues that I had to handle first, so I failed to
> get back to you immediately. Sorry for that.
>
> I will finish the proposal ASAP and start discussing the implementation
> details with you and Julien.
>
> Thanks again.
>
> Cheers.
>
> Zhongze Liu
>
> 2017-03-29 8:15 GMT+08:00 Stefano Stabellini :
>> On Tue, 28 Mar 2017, Stefano Stabellini wrote:
>>> Hello Zhongze,
>>>
>>> did you manage to make any progress with the ARM model?
>>>
>>> Finally, I would like to remind you of the upcoming deadline for
>>> applications submissions, which is the 3rd of April for GSoC and the
>>> 30th of March for Outreachy, see:
>>>
>>> http://marc.info/?l=xen-devel&m=149071502330534
>>>
>>> Please give a look at the Xen Project application template here:
>>>
>>> https://wiki.xenproject.org/wiki/GSoC_Student_Application_Template
>>>
>>> It also includes an "Implementation Plan", where you have the chance to
>>> explain the implementation plan for the projects you would like to apply
>>> for.
>>
>> FYI I added more info on this project here:
>>
>> http://marc.info/?l=xen-devel&m=149074641908123
>>
>>
>>> Thanks,
>>>
>>> Stefano
>>>
>>>
>>> On Wed, 22 Mar 2017, Stefano Stabellini wrote:
>>> > On Thu, 23 Mar 2017, Zhongze Liu wrote:
>>> > > Back to the GSoC task.
>>> > > Do I need to meet any special hardware requirements to complete the 
>>> > > task?
>>> >
>>> > No special hardware requirements, what you have below is more than
>>> > enough. However, if you have some spare time, it would be helpful for
>>> > you to setup an ARM build and test environment too, because it would be
>>> > nice if you could make this project work on ARM as well. In fact, this
>>> > work is mostly meant to help users in embedded scenarios, which are
>>> > mostly ARM based. You could go very far in this project with only the
>>> > x86 hardware you have, but you'll need ARM for the last bit.
>>> >
>>> > First you need a cross-compiler, you can download the latest from
>>> > linaro:
>>> >
>>> > https://releases.linaro.org/components/toolchain/binaries/latest/armv8l-linux-gnueabihf/gcc-linaro-6.3.1-2017.02-x86_64_armv8l-linux-gnueabihf.tar.xz
>>> >
>>> > You can download a free ARMv8 emulator from here:
>>> >
>>> > https://developer.arm.com/products/system-design/fixed-virtual-platforms
>>> >
>>> > choose "ARMv8-A Foundation Platform for Linux". Then follow these
>>> > instructions:
>>> >
>>> > http://marc.info/?l=xen-devel&m=149021352631609
>>> >
>>> > Please read the following emails in the thread too which tells you how
>>> > to build Xen and Linux for ARM. We are still solving some issues but the
>>> > steps so far are also on the wiki, see "Firmware & boot-wrapper" ->
>>> > arm64 and "Foundation Model":
>>> >
>>> > https://wiki.xen.org/wiki/Xen_ARM_with_Virtualization_Extensions/FastModels
>>> >
>>> >
>>> > > Currently I have 12G RAM and 128G SSD + 1T HDD.
>>> > > The output of "lscpu" on my test machine is as follows:
>>> > >
>>> > > Architecture:          x86_64
>>> > > CPU op-mode(s):        32-bit, 64-bit
>>> > > Byte Order:            Little Endian
>>> > > CPU(s):                4
>>> > > On-line CPU(s) list:   0-3
>>> > > Thread(s) per core:    1
>>> > > Core(s) per socket:    4
>>> > > Socket(s):             1
>>> > > NUMA node(s):          1
>>> > > Vendor ID:             GenuineIntel
>>> > > CPU family:            6
>>> > > Model:                 94
>>> > > Model name:            Intel(R) Core(TM) i5-6300HQ CPU @ 2.30GHz
>>> > > Stepping:              3
>>> > > CPU MHz:               800.030
>>> > > CPU max MHz:           3200.0000
>>> > > CPU min MHz:           800.0000
>>> > > BogoMIPS:              4609.00
>>> > > Virtualization:        VT-x
>>> > > L1d cache:             32K
>>> > > L1i cache:             32K
>>> > > L2 cache:              256K
>>> > > L3 cache:              6144K
>>> > > NUMA node0 CPU(s):     0-3
>>> > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
>>> > > mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
>>> > > tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs
>>> > > bts rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni
>>> > > pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr
>>> > > pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
>>> > > xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt
>>> > > tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle

[Xen-devel] [qemu-mainline test] 106977: tolerable FAIL - PUSHED

2017-03-29 Thread osstest service owner
flight 106977 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106977/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds             15 guest-start/debian.repeat fail REGR. vs. 106965
 test-armhf-armhf-libvirt-xsm         13 saverestore-support-check fail like 106965
 test-amd64-i386-xl-qemuu-win7-amd64  16 guest-stop                fail like 106965
 test-armhf-armhf-libvirt-raw         12 saverestore-support-check fail like 106965
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop                fail like 106965
 test-amd64-amd64-xl-rtds              9 debian-install            fail like 106965
 test-armhf-armhf-libvirt             13 saverestore-support-check fail like 106965

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-rtds  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-xsm                    5 xen-build                 fail never pass
 test-amd64-i386-libvirt-xsm       12 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-xsm      12 migrate-support-check     fail never pass
 build-arm64                        5 xen-build                 fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 build-arm64-pvops                  5 kernel-build              fail never pass
 test-amd64-amd64-libvirt-vhd      11 migrate-support-check     fail never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-multivcpu     12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu     13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2       12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2       13 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-xsm      12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-xsm           12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-xsm           13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck    12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck    13 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt          12 migrate-support-check     fail never pass
 test-amd64-i386-libvirt           12 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-raw      11 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd           11 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd           12 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale       12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale       13 saverestore-support-check fail never pass
 test-armhf-armhf-xl               12 migrate-support-check     fail never pass
 test-armhf-armhf-xl               13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds          12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds          13 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt          12 migrate-support-check     fail never pass

version targeted for testing:
 qemuu                df9046363220e57d45818312759b954c033c58ab
baseline version:
 qemuu                0491c221547a38b58e41fade9953cd1cf015288b

Last test of basis   106965  2017-03-28 12:15:00 Z    1 days
Testing same since   106977  2017-03-29 16:16:55 Z    0 days    1 attempts


People who touched revisions under test:
  Emilio G. Cota 
  Jeff Cody 
  Markus Armbruster 
  Peter Maydell 
  Stefan Hajnoczi 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  fail
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  fail
 build-armhf  

Re: [Xen-devel] [PATCH v2] xen/arm32: Introduce alternative runtime patching

2017-03-29 Thread Wei Chen
Hi Julien,

On 2017/3/29 22:07, Julien Grall wrote:
>
>
> On 29/03/17 10:28, Wei Chen wrote:
>> Hi Julien,
>>
>> On 2017/3/29 16:40, Julien Grall wrote:
>>> Hi Wei,
>>>
>>> On 28/03/2017 08:23, Wei Chen wrote:
>>>> diff --git a/xen/include/asm-arm/arm32/insn.h b/xen/include/asm-arm/arm32/insn.h
>>>> new file mode 100644
>>>> index 0000000..4cda69e
>>>> --- /dev/null
>>>> +++ b/xen/include/asm-arm/arm32/insn.h
>>>> @@ -0,0 +1,65 @@
>>>> +/*
>>>> +  * Copyright (C) 2017 ARM Ltd.
>>>> +  *
>>>> +  * This program is free software; you can redistribute it and/or modify
>>>> +  * it under the terms of the GNU General Public License version 2 as
>>>> +  * published by the Free Software Foundation.
>>>> +  *
>>>> +  * This program is distributed in the hope that it will be useful,
>>>> +  * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> +  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> +  * GNU General Public License for more details.
>>>> +  *
>>>> +  * You should have received a copy of the GNU General Public License
>>>> +  * along with this program.  If not, see .
>>>> +  */
>>>> +#ifndef __ARCH_ARM_ARM32_INSN
>>>> +#define __ARCH_ARM_ARM32_INSN
>>>> +
>>>> +#include 
>>>> +
>>>> +#define __AARCH32_INSN_FUNCS(abbr, mask, val)   \
>>>> +static always_inline bool_t aarch32_insn_is_##abbr(uint32_t code) \
>>>> +{ \
>>>> +    return (code & (mask)) == (val);  \
>>>> +}
>>>> +
>>>> +/*
>>>> + * From ARM DDI 0406C.c Section A8.8.18 and A8.8.25. We can see that
>>>> + * unconditional blx and conditional b have the same value field and imm
>>>> + * length. And from ARM DDI 0406C.c Section A5.7 Table A5-23, we can see
>>>> + * that the blx is the only one unconditional instruction has the same
>>>> + * value with conditional branch instructions. So we define the b and blx
>>>> + * in the same macro to check them at the same time.
>>>> + */
>>>
>>> I don't think this is true. The encodings are:
>>>   - b   <cond>1010
>>>   - bl  <cond>1011
>>>   - blx 1111101H
>>>
>>> where <cond> != 0b1111. So both helpers (aarch32_insn_is_{b_or_blx,bl})
>>> will recognize the blx instruction depending on the value of bit H.
>>>
>>
>> I think I misunderstood the H bit. I always thought that the H bit
>> is 0 in the ARM instruction set.
>
> Because Xen is only using ARM instructions, blx will always have H = 0.
> But this is not what you described in your comment.

Yes, I missed that. I will fix it.

>
>>
>>> That's why I suggested to introduce a new helper checking for blx.
>>>
>>
>> I think that's not enough. Current macro will mask the conditional bits.
>> So no matter what the value of H bit, the blx will be recognized in
>> aarch32_insn_is_{b, bl}.
>>
>> I think we should update the __AARCH32_INSN_FUNCS to cover the cond
>> bits.
>>
>> #define __UNCONDITIONAL_INSN(code)   (((code) >> 28) == 0xF)
>>
>> #define __AARCH32_INSN_FUNCS(abbr, mask, val)   \
>> static always_inline bool_t aarch32_insn_is_##abbr(uint32_t code) \
>> { \
>>  return !__UNCONDITIONAL_INSN(code) && (code & (mask)) == (val);   \
>> }
>>
>> #define __AARCH32_UNCOND_INSN_FUNCS(abbr, mask, val)   \
>> static always_inline bool_t aarch32_insn_is_##abbr(uint32_t code) \
>> { \
>>  return __UNCONDITIONAL_INSN(code) && (code & (mask)) == (val);   \
>> }
>>
>> __AARCH32_UNCOND_INSN_FUNCS(blx,  0x0E000000, 0x0A000000)
>
> Looking at the code, the aarch32_insn_is_* helpers are only used in
> aarch32_insn_is_branch_imm. So why don't you open-code the checks in the
> latter helper?
>

That's a good suggestion!
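
For reference, open-coding the checks could look roughly like the sketch
below. It relies only on the A32 encodings discussed above (b is <cond>1010,
bl is <cond>1011, blx with an immediate is 1111101H). The a32_* names and
macros are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits [31:28] hold the condition; 0b1111 selects the unconditional space. */
#define A32_COND(insn)   ((uint32_t)(insn) >> 28)
#define A32_COND_UNCOND  0xFu

/* b: cond != 0b1111, bits [27:24] == 0b1010. */
bool a32_is_b(uint32_t insn)
{
    return A32_COND(insn) != A32_COND_UNCOND &&
           (insn & 0x0F000000u) == 0x0A000000u;
}

/* bl: cond != 0b1111, bits [27:24] == 0b1011. */
bool a32_is_bl(uint32_t insn)
{
    return A32_COND(insn) != A32_COND_UNCOND &&
           (insn & 0x0F000000u) == 0x0B000000u;
}

/* blx (immediate): cond == 0b1111, bits [27:25] == 0b101, bit 24 is H. */
bool a32_is_blx_imm(uint32_t insn)
{
    return A32_COND(insn) == A32_COND_UNCOND &&
           (insn & 0x0E000000u) == 0x0A000000u;
}

/* Open-coded equivalent of a "branch with immediate" predicate. */
bool a32_is_branch_imm(uint32_t insn)
{
    /* The cond field makes the three classes mutually exclusive. */
    assert(!(a32_is_b(insn) && a32_is_blx_imm(insn)));
    assert(!(a32_is_bl(insn) && a32_is_blx_imm(insn)));
    return a32_is_b(insn) || a32_is_bl(insn) || a32_is_blx_imm(insn);
}
```

As a quick check, 0xEAFFFFFE ("b ." with cond AL) is classified as b,
0xFA000000 and 0xFB000000 (H = 0 and H = 1) as blx, and 0xE1A00000
("mov r0, r0") as no branch at all.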

> Cheers,
>


-- 
Regards,
Wei Chen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [linux-linus test] 106976: regressions - FAIL

2017-03-29 Thread osstest service owner
flight 106976 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106976/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-arndale  11 guest-start   fail REGR. vs. 59254
 test-armhf-armhf-xl-credit2  11 guest-start   fail REGR. vs. 59254
 test-armhf-armhf-xl-multivcpu 11 guest-start  fail REGR. vs. 59254
 test-armhf-armhf-libvirt-xsm 11 guest-start   fail REGR. vs. 59254
 test-armhf-armhf-xl-cubietruck 11 guest-start fail REGR. vs. 59254
 test-armhf-armhf-libvirt 11 guest-start   fail REGR. vs. 59254
 test-armhf-armhf-xl-xsm  11 guest-start   fail REGR. vs. 59254
 test-armhf-armhf-xl  11 guest-start   fail REGR. vs. 59254

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds             11 guest-start       fail REGR. vs. 59254
 test-amd64-amd64-xl-rtds              9 debian-install    fail REGR. vs. 59254
 test-armhf-armhf-xl-vhd               9 debian-di-install fail baseline untested
 test-armhf-armhf-libvirt-raw          9 debian-di-install fail baseline untested
 test-amd64-i386-xl-qemuu-win7-amd64  16 guest-stop        fail like 59254
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop        fail like 59254
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop        fail like 59254
 test-amd64-i386-xl-qemut-win7-amd64  16 guest-stop        fail like 59254

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-rtds  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt           12 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-xsm       12 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-xsm      12 migrate-support-check     fail never pass
 build-arm64-xsm                    5 xen-build                 fail never pass
 test-amd64-amd64-libvirt          12 migrate-support-check     fail never pass
 build-arm64                        5 xen-build                 fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd      11 migrate-support-check     fail never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass

version targeted for testing:
 linux                fe82203b63e598c34d96e846dea49679a726fc7a
baseline version:
 linux                45820c294fe1b1a9df495d57f40585ef2d069a39

Last test of basis    59254  2015-07-09 04:20:48 Z  629 days
Failing since         59348  2015-07-10 04:24:05 Z  628 days  363 attempts
Testing same since   106976  2017-03-29 16:16:41 Z    0 days    1 attempts


8113 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  fail
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  fail
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  blocked 
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-arm64-pvops  pass
 build-armhf-pvops  pass
 build-i386-pvops   pass
 build-amd64-rumprun  pass
 build-i386-rumprun   pass
 test-amd64-amd64-xl  pass
 test-arm64-arm64-xl  

Re: [Xen-devel] [PATCH v9 12/25] x86: refactor psr: L3 CAT: set value: implement cos id picking flow.

2017-03-29 Thread Yi Sun
On 17-03-30 09:37:33, Yi Sun wrote:
> On 17-03-29 03:57:52, Jan Beulich wrote:
> > >>> On 29.03.17 at 03:36,  wrote:
> > > On 17-03-29 09:20:21, Yi Sun wrote:
> > >> On 17-03-28 06:20:48, Jan Beulich wrote:
> > >> > >>> On 28.03.17 at 13:59,  wrote:
> > >> > > I think we at least need a 'get_val()' hook.
> > >> > 
> > >> > Of course.
> > >> > 
> > >> > > I try to implement CAT/CDP hook.
> > >> > > Please help to check if this is what you thought.
> > >> > 
> > >> > One remark below, but other than that - yes.
> > >> > 
> > >> > > static void cat_get_val(const struct feat_node *feat, unsigned int 
> > >> > > cos,
> > >> > > enum cbm_type type, int flag, uint32_t *val)
> > >> > > {
> > >> > > *val = feat->cos_reg_val[cos];
> > >> > > }
> > >> > > 
> > >> > > static void l3_cdp_get_val(const struct feat_node *feat, unsigned 
> > >> > > int cos,
> > >> > >enum cbm_type type, int flag, uint32_t 
> > >> > > *val)
> > >> > > {
> > >> > > if ( type == PSR_CBM_TYPE_L3_DATA || flag == 0 )
> > >> > > *val = get_cdp_data(feat, cos);
> > >> > > if ( type == PSR_CBM_TYPE_L3_CODE || flag == 1 )
> > >> > > *val = get_cdp_code(feat, cos);
> > >> > > }
> > >> > 
> > >> > Why the redundancy between type and flag?
> > >> > 
> > >> For psr_get_val, upper layer input the cbm_type to get either DATA or 
> > >> CODE
> > >> value. For other cases, we use flag as cos_num index to get either DATA 
> > >> or
> > >> CODE.
> > >> 
> > > Let me explain more to avoid confusion. For other cases, we use cos_num as
> > > index to get values from a feature. In these cases, we do not know the
> > > cbm_type of the feature. So, I use the cos_num as flag to make 'get_val'
> > > know which value should be returned.
> > 
> > I'm pretty sure this redundancy can be avoided.
> > 
> Then, I think I have to reuse the 'type'. As only CDP needs the type to
> decide which value to return so far, I think I can implement code like the
> below so that CDP can handle all scenarios.
> 
> static void l3_cdp_get_val(const struct feat_node *feat, unsigned int cos,
>                            enum cbm_type type, uint32_t *val)
> {
>     if ( type == PSR_CBM_TYPE_L3_DATA || flag == 0xF000 )
>         *val = get_cdp_data(feat, cos);
>     if ( type == PSR_CBM_TYPE_L3_CODE || flag == 0xF001 )
>         *val = get_cdp_code(feat, cos);
> }
> 
> static bool fits_cos_max(...)
> {
>     ..
>     for (i = 0; i < feat->props->cos_num; i++)
>     {
>         feat->props->get_val(feat, cos, i + 0xF000, _val);
>         if ( val[i] == default_val )
>         ..
>     }
>     ..
> }
> 
> Is that good for you?

Sorry, my mistake: I forgot to change 'flag' to 'type' in 'l3_cdp_get_val'.
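
With 'flag' replaced by 'type', the proposed hook might look like the sketch
below. This is only an illustration under several assumptions: the cut-down
struct feat_node, the even/odd layout of cos_reg_val used by get_cdp_data()
and get_cdp_code(), the enum values, and the name COS_IDX_BASE are all
hypothetical. The base value 0xF000 is simply copied as it appears in the
quoted code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cbm_type {
    PSR_CBM_TYPE_L3,
    PSR_CBM_TYPE_L3_CODE,
    PSR_CBM_TYPE_L3_DATA,
};

/* Hypothetical: callers that only know an index pass COS_IDX_BASE + index. */
#define COS_IDX_BASE 0xF000u
#define MAX_COS_REGS 8u

/* Hypothetical, cut-down feature node: DATA at even slots, CODE at odd. */
struct feat_node {
    uint32_t cos_reg_val[MAX_COS_REGS];
};

static uint32_t get_cdp_data(const struct feat_node *feat, unsigned int cos)
{
    return feat->cos_reg_val[cos * 2];
}

static uint32_t get_cdp_code(const struct feat_node *feat, unsigned int cos)
{
    return feat->cos_reg_val[cos * 2 + 1];
}

/* One 'type' argument carries either a real cbm_type or an encoded index. */
void l3_cdp_get_val(const struct feat_node *feat, unsigned int cos,
                    enum cbm_type type, uint32_t *val)
{
    unsigned int t = (unsigned int)type;

    assert(feat != NULL && val != NULL);
    assert(cos * 2 + 1 < MAX_COS_REGS);

    if ( t == PSR_CBM_TYPE_L3_DATA || t == COS_IDX_BASE )
        *val = get_cdp_data(feat, cos);
    if ( t == PSR_CBM_TYPE_L3_CODE || t == COS_IDX_BASE + 1 )
        *val = get_cdp_code(feat, cos);
}
```

A caller iterating over cos_num would then pass
(enum cbm_type)(COS_IDX_BASE + i), while psr_get_val-style callers keep
passing PSR_CBM_TYPE_L3_DATA or PSR_CBM_TYPE_L3_CODE.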



Re: [Xen-devel] [PATCH v9 12/25] x86: refactor psr: L3 CAT: set value: implement cos id picking flow.

2017-03-29 Thread Yi Sun
On 17-03-29 03:57:52, Jan Beulich wrote:
> >>> On 29.03.17 at 03:36,  wrote:
> > On 17-03-29 09:20:21, Yi Sun wrote:
> >> On 17-03-28 06:20:48, Jan Beulich wrote:
> >> > >>> On 28.03.17 at 13:59,  wrote:
> >> > > I think we at least need a 'get_val()' hook.
> >> > 
> >> > Of course.
> >> > 
> >> > > I try to implement CAT/CDP hook.
> >> > > Please help to check if this is what you thought.
> >> > 
> >> > One remark below, but other than that - yes.
> >> > 
> >> > > static void cat_get_val(const struct feat_node *feat, unsigned int cos,
> >> > > enum cbm_type type, int flag, uint32_t *val)
> >> > > {
> >> > > *val = feat->cos_reg_val[cos];
> >> > > }
> >> > > 
> >> > > static void l3_cdp_get_val(const struct feat_node *feat, unsigned int 
> >> > > cos,
> >> > >enum cbm_type type, int flag, uint32_t *val)
> >> > > {
> >> > > if ( type == PSR_CBM_TYPE_L3_DATA || flag == 0 )
> >> > > *val = get_cdp_data(feat, cos);
> >> > > if ( type == PSR_CBM_TYPE_L3_CODE || flag == 1 )
> >> > > *val = get_cdp_code(feat, cos);
> >> > > }
> >> > 
> >> > Why the redundancy between type and flag?
> >> > 
> >> For psr_get_val, upper layer input the cbm_type to get either DATA or CODE
> >> value. For other cases, we use flag as cos_num index to get either DATA or
> >> CODE.
> >> 
> > Let me explain more to avoid confusion. For other cases, we use cos_num as
> > index to get values from a feature. In these cases, we do not know the
> > cbm_type of the feature. So, I use the cos_num as flag to make 'get_val'
> > know which value should be returned.
> 
> I'm pretty sure this redundancy can be avoided.
> 
Then, I think I have to reuse the 'type'. As only CDP needs the type to decide
which value to return so far, I think I can implement code like the below so
that CDP can handle all scenarios.

static void l3_cdp_get_val(const struct feat_node *feat, unsigned int cos,
                           enum cbm_type type, uint32_t *val)
{
    if ( type == PSR_CBM_TYPE_L3_DATA || flag == 0xF000 )
        *val = get_cdp_data(feat, cos);
    if ( type == PSR_CBM_TYPE_L3_CODE || flag == 0xF001 )
        *val = get_cdp_code(feat, cos);
}

static bool fits_cos_max(...)
{
    ..
    for (i = 0; i < feat->props->cos_num; i++)
    {
        feat->props->get_val(feat, cos, i + 0xF000, _val);
        if ( val[i] == default_val )
        ..
    }
    ..
}

Is that good for you?



Re: [Xen-devel] [GSoC]about the task "Share a page in memory from the VM config file"

2017-03-29 Thread Zhongze Liu
Hi Stefano,

Thanks for reminding me of the deadline and providing me with more information
on this project.

I did set up the ARM model and it is working well. But after that, I
encountered some private issues that I had to handle first, so I failed to
get back to you immediately. Sorry for that.

I will finish the proposal ASAP and start discussing the implementation
details with you and Julien.

Thanks again.

Cheers.

Zhongze Liu

2017-03-29 8:15 GMT+08:00 Stefano Stabellini :
> On Tue, 28 Mar 2017, Stefano Stabellini wrote:
>> Hello Zhongze,
>>
>> did you manage to make any progress with the ARM model?
>>
>> Finally, I would like to remind you of the upcoming deadline for
>> applications submissions, which is the 3rd of April for GSoC and the
>> 30th of March for Outreachy, see:
>>
>> http://marc.info/?l=xen-devel&m=149071502330534
>>
>> Please give a look at the Xen Project application template here:
>>
>> https://wiki.xenproject.org/wiki/GSoC_Student_Application_Template
>>
>> It also includes an "Implementation Plan", where you have the chance to
>> explain the implementation plan for the projects you would like to apply
>> for.
>
> FYI I added more info on this project here:
>
> http://marc.info/?l=xen-devel&m=149074641908123
>
>
>> Thanks,
>>
>> Stefano
>>
>>
>> On Wed, 22 Mar 2017, Stefano Stabellini wrote:
>> > On Thu, 23 Mar 2017, Zhongze Liu wrote:
>> > > Back to the GSoC task.
>> > > Do I need to meet any special hardware requirements to complete the task?
>> >
>> > No special hardware requirements, what you have below is more than
>> > enough. However, if you have some spare time, it would be helpful for
>> > you to setup an ARM build and test environment too, because it would be
>> > nice if you could make this project work on ARM as well. In fact, this
>> > work is mostly meant to help users in embedded scenarios, which are
>> > mostly ARM based. You could go very far in this project with only the
>> > x86 hardware you have, but you'll need ARM for the last bit.
>> >
>> > First you need a cross-compiler, you can download the latest from
>> > linaro:
>> >
>> > https://releases.linaro.org/components/toolchain/binaries/latest/armv8l-linux-gnueabihf/gcc-linaro-6.3.1-2017.02-x86_64_armv8l-linux-gnueabihf.tar.xz
>> >
>> > You can download a free ARMv8 emulator from here:
>> >
>> > https://developer.arm.com/products/system-design/fixed-virtual-platforms
>> >
>> > choose "ARMv8-A Foundation Platform for Linux". Then follow these
>> > instructions:
>> >
>> > http://marc.info/?l=xen-devel&m=149021352631609
>> >
>> > Please read the following emails in the thread too which tells you how
>> > to build Xen and Linux for ARM. We are still solving some issues but the
>> > steps so far are also on the wiki, see "Firmware & boot-wrapper" ->
>> > arm64 and "Foundation Model":
>> >
>> > https://wiki.xen.org/wiki/Xen_ARM_with_Virtualization_Extensions/FastModels
>> >
>> >
>> > > Currently I have 12G RAM and 128G SSD + 1T HDD.
>> > > The output of "lscpu" on my test machine is as follows:
>> > >
>> > > Architecture:          x86_64
>> > > CPU op-mode(s):        32-bit, 64-bit
>> > > Byte Order:            Little Endian
>> > > CPU(s):                4
>> > > On-line CPU(s) list:   0-3
>> > > Thread(s) per core:    1
>> > > Core(s) per socket:    4
>> > > Socket(s):             1
>> > > NUMA node(s):          1
>> > > Vendor ID:             GenuineIntel
>> > > CPU family:            6
>> > > Model:                 94
>> > > Model name:            Intel(R) Core(TM) i5-6300HQ CPU @ 2.30GHz
>> > > Stepping:              3
>> > > CPU MHz:               800.030
>> > > CPU max MHz:           3200.0000
>> > > CPU min MHz:           800.0000
>> > > BogoMIPS:              4609.00
>> > > Virtualization:        VT-x
>> > > L1d cache:             32K
>> > > L1i cache:             32K
>> > > L2 cache:              256K
>> > > L3 cache:              6144K
>> > > NUMA node0 CPU(s):     0-3
>> > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
>> > > mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
>> > > tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs
>> > > bts rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni
>> > > pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr
>> > > pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
>> > > xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt
>> > > tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle
>> > > avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt
>> > > xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify
>> > > hwp_act_window hwp_epp
>> > >
>> > > Cheers.
>> > >
>> > > Zhongze Liu.
>> > >
>> >
>>


[Xen-devel] [ovmf baseline-only test] 71120: all pass

2017-03-29 Thread Platform Team regression test user
This run is configured for baseline tests only.

flight 71120 ovmf real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/71120/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 6e7ec25aaaf0dfc2b4c84ffd4c7ee7cd442aecb6
baseline version:
 ovmf 89ad870fbff03a511102c73773000f2bea2017d2

Last test of basis    71114  2017-03-28 17:46:51 Z    1 days
Testing same since    71120  2017-03-29 21:52:07 Z    0 days    1 attempts


People who touched revisions under test:
  Ard Biesheuvel 
  Bell Song 
  David Woodhouse 
  Gary Lin 
  Laszlo Ersek 
  Liming Gao 
  Qin Long 
  Ruiyu Ni 
  Song, BinX 
  Yonghong Zhu 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-i386-pvops   pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.

(No revision log; it would be 427 lines long.)



Re: [Xen-devel] [PATCH v5 2/3] xen/arm: move setting of new target vcpu to vgic_migrate_irq

2017-03-29 Thread Stefano Stabellini
On Fri, 3 Mar 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 01/03/17 22:15, Stefano Stabellini wrote:
> > Move the atomic write of rank->vcpu, which sets the new vcpu target, to
> > vgic_migrate_irq, at the beginning of the lock protected area (protected
> > by the vgic lock).
> > 
> > This code movement reduces race conditions between vgic_migrate_irq and
> > setting rank->vcpu on one pcpu and gic_update_one_lr on another pcpu.
> > 
> > When gic_update_one_lr and vgic_migrate_irq take the same vgic lock,
> > there are no more race conditions with this patch. When vgic_migrate_irq
> > is called multiple times while GIC_IRQ_GUEST_MIGRATING is already set, a
> > race condition still exists because in that case gic_update_one_lr and
> > vgic_migrate_irq take different vgic locks.
> > 
> > Signed-off-by: Stefano Stabellini 
> > ---
> >  xen/arch/arm/vgic-v2.c |  5 ++---
> >  xen/arch/arm/vgic-v3.c |  4 +---
> >  xen/arch/arm/vgic.c| 15 ++-
> >  xen/include/asm-arm/vgic.h |  3 ++-
> >  4 files changed, 15 insertions(+), 12 deletions(-)
> > 
> > diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
> > index 0674f7b..43b4ac3 100644
> > --- a/xen/arch/arm/vgic-v2.c
> > +++ b/xen/arch/arm/vgic-v2.c
> > @@ -158,10 +158,9 @@ static void vgic_store_itargetsr(struct domain *d,
> > struct vgic_irq_rank *rank,
> >  {
> >  vgic_migrate_irq(d->vcpu[old_target],
> >   d->vcpu[new_target],
> > - virq);
> > + virq,
> > + &rank->vcpu[offset]);
> >  }
> > -
> > -write_atomic(&rank->vcpu[offset], new_target);
> 
> With this change rank->vcpu[offset] will not be updated for virtual SPIs (e.g
> p->desc != NULL). And therefore affinity for them will not work.

Do you mean p->desc == NULL? I don't think we have any purely virtual
SPIs yet, only virtual PPIs (where the target cannot be changed).
However, I think you are right: moving it before the call would also
work and it's simpler. I'll do that.


> However, from my understanding the problem you are trying to solve with this
> patch is having rank->vcpu[offset] set as soon as possible. It does not
> really matter whether it is protected by the lock; what you care about is
> rank->vcpu[offset] being seen before the lock has been released.
> 
> So if GIC_IRQ_GUEST_MIGRATING is set and gic_update_one_lr is running straight
> after the lock is released, rank->vcpu[offset] will contain the correct vCPU.
> 
> A better approach would be to move write_atomic(...) before
> vgic_migrate_irq(...). What do you think?



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 3/3] xen/arm: vgic_migrate_irq: do not race against GIC_IRQ_GUEST_MIGRATING

2017-03-29 Thread Stefano Stabellini
On Fri, 3 Mar 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 01/03/17 22:15, Stefano Stabellini wrote:
> > A potential race condition occurs when vgic_migrate_irq is called a
> > second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case,
> > vgic_migrate_irq takes a different vgic lock from gic_update_one_lr.
> 
> Hmmm, vgic_migrate_irq will bail out before accessing inflight list if
> GIC_IRQ_GUEST_MIGRATING is already set:
> 
> /* migrating already in progress, no need to do anything */
> if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>   return;
> 
> And test_bit is atomic. So I don't understand what is the corruption problem
> you mention.

The scenario is a bit convoluted: GIC_IRQ_GUEST_MIGRATING is already set
and vgic_migrate_irq is called to move the irq again, even though the
first migration is not complete yet. This could happen:


  CPU 0                                  CPU 1
  gic_update_one_lr
  test_and_clear_bit MIGRATING
  read target (old)
                                         write target (new)
                                         vgic_migrate_irq
                                           test_bit MIGRATING
                                           irq_set_affinity (new)
                                           return
  irq_set_affinity (old)


After this patch this would happen:

  CPU 0                                  CPU 1
  gic_update_one_lr
  test_bit MIGRATING
  read target (old)
                                         write target (new)
                                         vgic_migrate_irq
                                           test MIGRATING &&
                                             GIC_IRQ_GUEST_VISIBLE (false)
                                           wait until !MIGRATING
  irq_set_affinity (old)
  clear_bit MIGRATING
                                           irq_set_affinity (new)
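
For readers who want to step through the two interleavings, here is a minimal single-threaded replay in C. It is only a sketch: struct demo_irq, DEMO_OLD, DEMO_NEW and both functions are made-up names, the fields merely stand in for the real status bits and rank->vcpu[offset], and no real locking or barriers are modelled. Each statement is executed in the order the corresponding diagram lists it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one interrupt's migration state. */
struct demo_irq {
    bool migrating;   /* stands in for GIC_IRQ_GUEST_MIGRATING */
    bool visible;     /* stands in for GIC_IRQ_GUEST_VISIBLE */
    int  target;      /* stands in for rank->vcpu[offset] */
    int  affinity;    /* last value passed to irq_set_affinity */
};

enum { DEMO_OLD = 1, DEMO_NEW = 2 };

/* First diagram: MIGRATING is cleared before the target is read. */
static int demo_replay_before_patch(void)
{
    struct demo_irq s = { true, false, DEMO_OLD, 0 };
    int seen_by_cpu0;

    /* CPU 0: gic_update_one_lr */
    s.migrating = false;              /* test_and_clear_bit MIGRATING */
    seen_by_cpu0 = s.target;          /* read target (old) */

    /* CPU 1: write target (new), then vgic_migrate_irq */
    s.target = DEMO_NEW;
    if (!s.migrating)                 /* test_bit MIGRATING: already clear */
        s.affinity = s.target;        /* irq_set_affinity (new) */

    /* CPU 0 resumes with its stale value */
    s.affinity = seen_by_cpu0;        /* irq_set_affinity (old) */
    return s.affinity;
}

/* Second diagram: MIGRATING is cleared only after the affinity is set. */
static int demo_replay_after_patch(void)
{
    struct demo_irq s = { true, false, DEMO_OLD, 0 };
    int seen_by_cpu0;
    bool early_return;

    /* CPU 0: gic_update_one_lr */
    seen_by_cpu0 = s.target;          /* test_bit MIGRATING, read target (old) */

    /* CPU 1: write target (new), then vgic_migrate_irq */
    s.target = DEMO_NEW;
    early_return = s.migrating && s.visible;   /* false here */

    /* CPU 1 would spin on MIGRATING now, so CPU 0 finishes first */
    s.affinity = seen_by_cpu0;        /* irq_set_affinity (old) */
    s.migrating = false;              /* clear_bit MIGRATING */

    /* CPU 1 leaves the wait loop */
    if (!early_return)
        s.affinity = s.target;        /* irq_set_affinity (new) */
    return s.affinity;
}
```

In this model the first replay ends with the affinity pointing at the old target even though the stored target is the new one, while the second replay ends with both in agreement.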


> > vgic_migrate_irq running concurrently with gic_update_one_lr could cause
> > data corruptions, as they both access the inflight list.
> > 
> > This patch fixes this problem. In vgic_migrate_irq after setting the new
> > vcpu target, it checks both GIC_IRQ_GUEST_MIGRATING and
> > GIC_IRQ_GUEST_VISIBLE. If they are both set we can just return because
> > we have already set the new target: when gic_update_one_lr reaches
> > the GIC_IRQ_GUEST_MIGRATING test, it will do the right thing.
> > 
> > Otherwise, if GIC_IRQ_GUEST_MIGRATING is set but GIC_IRQ_GUEST_VISIBLE
> > is not, gic_update_one_lr is running at the very same time on another
> > pcpu, so it just waits until it completes (GIC_IRQ_GUEST_MIGRATING is
> > cleared).
> > 
> > Signed-off-by: Stefano Stabellini 
> > ---
> >  xen/arch/arm/gic.c  |  5 -
> >  xen/arch/arm/vgic.c | 16 ++--
> >  2 files changed, 18 insertions(+), 3 deletions(-)
> > 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 16bb150..a805300 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -508,10 +508,13 @@ static void gic_update_one_lr(struct vcpu *v, int i)
> >   * next pcpu, inflight is already cleared. No concurrent
> >   * accesses to inflight. */
> >  smp_mb();
> > > -if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > > +if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > >  {
> > >  struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
> > >  irq_set_affinity(p->desc, cpumask_of(v_target->processor));
> > > +/* Set the new affinity, then clear MIGRATING. */
> > > +smp_mb();
> > > +clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
> >  }
> >  }
> >  }
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index a323e7e..9141b34 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -252,13 +252,25 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu
> > *new,
> > >  spin_lock_irqsave(&old->arch.vgic.lock, flags);
> >  write_atomic(t_vcpu, new->vcpu_id);
> > 
> > -/* migration already in progress, no need to do anything */
> > > -if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > > +/* Set the new target, then check MIGRATING and VISIBLE, it pairs
> > > + * with the barrier in gic_update_one_lr. */
> > > +smp_mb();
> > > +
> > > +/* no need to do anything, gic_update_one_lr will take care of it */
> > > +if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) &&
> > > + test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
> >  {
> > >  spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
> >  return;
> >  }
> > 
> > +/* gic_update_one_lr is currently running, wait until its completion */
> > > +while ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > +{
> > +cpu_relax();
> > +smp_rmb();
> > +}
> > +
> > >  if ( list_empty(&p->inflight) )
> >  {
> >  irq_set_affinity(p->desc, 

Re: [Xen-devel] [PATCH RFC 00/20] Add postcopy live migration support

2017-03-29 Thread Andrew Cooper
On 27/03/2017 10:06, Joshua Otto wrote:
> Hi,
>
> We're a team of three fourth-year undergraduate software engineering students
> at the University of Waterloo in Canada.  In late 2015 we posted on the list
> [1] to ask for a project to undertake for our program's capstone design
> project, and Andrew Cooper pointed us in the direction of the live migration
> implementation as an area that could use some attention.  We were particularly
> interested in post-copy live migration (as evaluated by [2] and discussed on
> the list at [3]), and have been working on an implementation of this
> on-and-off since then.
>
> We now have a working implementation of this scheme, and are submitting it for
> comment.  The changes are also available as the 'postcopy' branch of the
> GitHub repository at [4]
>
> As a brief overview of our approach:
> - We introduce a mechanism by which libxl can indicate to the libxc stream
>   helper process that the iterative migration precopy loop should be
>   terminated and postcopy should begin.
> - At this point, we suspend the domain, collect the final set of dirty pfns
>   and write these pfns (and _not_ their contents) into the stream.
> - At the destination, the xc restore logic registers itself as a pager for the
>   migrating domain, 'evicts' all of the pfns indicated by the sender as
>   outstanding, and then resumes the domain at the destination.
> - As the domain executes, the migration sender continues to push the remaining
>   outstanding pages to the receiver in the background.  The receiver
>   monitors both the stream for incoming page data and the paging ring event
>   channel for page faults triggered by the guest.  Page faults are forwarded
>   on the back-channel migration stream to the migration sender, which
>   prioritizes these pages for transmission.
>
> By leveraging the existing paging API, we are able to implement the postcopy
> scheme without any hypervisor modifications - all of our changes are confined
> to the userspace toolstack.  However, we inherit from the paging API the
> requirement that the domains be HVM and that the host have HAP/EPT support.
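
As a rough illustration of the sender-side behaviour described in the overview (background push of outstanding pages, with faulted pages jumping the queue), here is a small C sketch. Everything in it is hypothetical: struct demo_sender, demo_note_fault, demo_next_pfn and the 8-pfn guest are invented for the example and do not reflect the posted series or any libxc interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_PFNS 8   /* hypothetical, tiny guest for illustration */

struct demo_sender {
    bool outstanding[DEMO_NR_PFNS]; /* pfns announced in the stream, not yet sent */
    bool faulted[DEMO_NR_PFNS];     /* pfns the receiver reported a fault on */
    size_t cursor;                  /* position of the background scan */
};

/* Record a page fault forwarded over the back-channel. */
static void demo_note_fault(struct demo_sender *s, size_t pfn)
{
    if (pfn < DEMO_NR_PFNS && s->outstanding[pfn])
        s->faulted[pfn] = true;
}

/* Pick the next pfn to transmit; returns DEMO_NR_PFNS when nothing is left. */
static size_t demo_next_pfn(struct demo_sender *s)
{
    size_t pfn;

    /* Faulted pages first: the guest is blocked waiting for them. */
    for (pfn = 0; pfn < DEMO_NR_PFNS; pfn++) {
        if (s->faulted[pfn]) {
            s->faulted[pfn] = false;
            s->outstanding[pfn] = false;
            return pfn;
        }
    }

    /* Otherwise continue the background scan where it stopped. */
    for (; s->cursor < DEMO_NR_PFNS; s->cursor++) {
        if (s->outstanding[s->cursor]) {
            s->outstanding[s->cursor] = false;
            return s->cursor++;
        }
    }
    return DEMO_NR_PFNS;
}
```

A real implementation would presumably batch pages and use a bitmap rather than bool arrays; the point here is only the ordering between faulted and background pages.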

Wow.  Considering that the paging API has had no in-tree consumers (and
its out-of-tree consumer folded), I am astounded that it hasn't bitrotten.

>
> We haven't yet had the opportunity to perform a quantitative evaluation of the
> performance trade-offs between the traditional pre-copy and our post-copy
> strategies, but intend to.  Informally, we've been testing our implementation
> by migrating a domain running the x86 memtest program (which is obviously a
> tremendously write-heavy workload), and have observed a substantial reduction
> in total time required for migration completion (at the expense of a visually
> obvious 'slowdown' in the execution of the program).

Do you have any numbers, even for this informal testing?

>   We've also noticed that,
> when performing a postcopy without any leading precopy iterations, the time
> required at the destination to 'evict' all of the outstanding pages is
> substantial - possibly because there is no batching mechanism by which pages
> can be evicted - so this area in particular might require further attention.
>
> We're really interested in any feedback you might have!

Do you have a design document for this?  The spec modifications and code
comments are great, but there is no substitute (as far as understanding
goes) for a description in terms of the algorithm and design choices.

~Andrew



Re: [Xen-devel] arm64: dma_to_phys/phys_to_dma need to be properly implemented

2017-03-29 Thread Julien Grall

Hi,

On 29/03/2017 23:36, Stefano Stabellini wrote:

On Wed, 29 Mar 2017, Oleksandr Andrushchenko wrote:
If you can come up with a patch that only affects
xen_swiotlb_get_sgtable, and translates successfully void *cpu_addr into
a physical address using "at", I think I would take that patch. I would
recommend to test the patch on ARM32 too, where virt_to_phys reliably
fails.


I haven't fully read the thread. I would recommend asking the
ARM64/ARM32 kernel maintainers for their advice here.


Cheers,

--
Julien Grall



Re: [Xen-devel] arm64: dma_to_phys/phys_to_dma need to be properly implemented

2017-03-29 Thread Stefano Stabellini
On Wed, 29 Mar 2017, Oleksandr Andrushchenko wrote:
> Hi, Stefano!
> 
> Ok, probably I need to put more details into the use-case
> so it is clear. What I am doing is a DRM driver which
> utilizes PRIME buffer sharing [1] to implement zero-copy
> of display buffers between DomU and Dom0. PRIME is based on
> DMA Buffer Sharing API [2], so this is the reason I am
> dealing with sg_table here.
> 
> On 03/28/2017 10:20 PM, Stefano Stabellini wrote:
> > On Tue, 28 Mar 2017, Oleksandr Andrushchenko wrote:
> > > Hi, Stefano!
> > > 
> > > On 03/27/2017 11:23 PM, Stefano Stabellini wrote:
> > > > Hello Oleksandr,
> > > > 
> > > > Just to clarify, you are talking about dma_to_phys/phys_to_dma in Linux
> > > > (not in Xen), right?
> > > I am talking about Linux, sorry I was not clear
> > > > Drivers shouldn't use those functions directly (see the comment in
> > > > arch/arm64/include/asm/dma-mapping.h), they should call the appropriate
> > > > dma_map_ops functions instead.
> > > Yes, you are correct and I know about this and do not call
> > > dma_to_phys/phys_to_dma directly
> > > >The dma_map_ops functions should do the
> > > > right thing on Xen, even for pages where pfn != mfn, thanks to the
> > > > swiotlb-xen (drivers/xen/swiotlb-xen.c). Specifically, you can see the
> > > > special case for pfn != mfn here (see the "local" variable):
> > > > 
> > > > arch/arm/include/asm/xen/page-coherent.h:xen_dma_map_page
> > > Yes, the scenarios with pfn != mfn we had so
> > > far are all working correctly
> > > > So, why are you calling dma_to_phys and phys_to_dma instead of the
> > > > dma_map_ops functions?
> > > Ok, let me give you an example of failing scenario which
> > > was not used before this by any backend (this is from
> > > my work on implementing zero copy for DRM drivers):
> > > 
> > > 1. Create sg table from pages:
> > > sg_alloc_table_from_pages(sgt, PFN_0, ...);
> > > map it and get dev_bus_addr - at this stage sg table is
> > > perfectly correct and properly mapped,
> > > dev_bus_addr == (MFN_u << PAGE_SHIFT)
> > Let me get this straight: one of the pages passed to
> > sg_alloc_table_from_pages is actually a foreign page (pfn != mfn), is
> > that right?
> > 
> > And by "map", you mean dma_get_sgtable_attrs is called on it, right?
> What is happening here is:
> - my driver 
> 1. I create an sg table from pages with pfn != mfn (PFN_0/MFN_u)
> using drm_prime_pages_to_sg [3] which effectively is
> sg_alloc_table_from_pages
> - DRM framework 
> 2. I pass the sgt via PRIME to the real display driver
> and it does drm_gem_map_dma_buf [4]
> 3. So, at this point everything is just fine, because sgt is
> correct (sgl->page_link points to my PFN_0 and p2m translation
> succeeds)
> - real HW DRM driver 
> 4. When real HW display driver accesses sgt it calls dma_get_sgtable
> [5] and then dma_map_sg [6]. And all this is happening on the sgt
> which my driver has provided, but PFN_0 is not honored anymore
> because dma_get_sgtable is expected to be able to figure out
> pfn from the corresponding DMA address.
> 
> So, strictly speaking real HW DRM driver has no issues,
> the API it uses is perfectly valid.
> > 
> > > 2. Duplicate it:
> > > dma_get_sgtable(..., sgt, ... dev_bus_addr,...);
> > Yeah, if one of the pages passed to sg_alloc_table_from_pages is
> > foreign, as Andrii pointed out, dma_get_sgtable
> > (xen_swiotlb_get_sgtable) doesn't actually work.
> This is the case
> > Is it legitimate that one of those pages is foreign or is it a mistake?
> This is the goal - I want pages from DomU to be directly
> accessed by the HW in Dom0 (I have DomU 1:1 mapped,
> so even contiguous buffers can come from DomU, if not
> 1:1 then IPMMU will be in place)
> > If it is a mistake, you could fix it.
> From the above - this is the intention
> >   Otherwise, if the use of
> > sg_alloc_table_from_pages or the following call to dma_get_sgtable are
> > part of your code, I suggest you work-around the problem by avoiding
> > the dma_get_sgtable call altogether.
> As seen from the above the problematic part is not in my
> driver, it is either DRM framework or HW display driver
> >   Don't use the sg_ dma api, use the
> > regular dma api instead.
> I use what DRM provides and dma_xxx if something is missed
> > 
> > 
> > Unfortunately, if the dma_get_sgtable is part of existing code, then we
> > have a problem. In that case, could you point me to the code that call
> > dma_get_sgtable?
> This is the case, see [5]
> > 
> > There is no easy way to make it work on Xen: virt_to_phys doesn't work
> > on ARM and dma_to_phys doesn't work on Xen. We could implement
> > xen_swiotlb_get_sgtable correctly if we had the physical address of the
> > page, because we could easily find out if the page is local or foreign
> > with a pfn != mfn check (similar to the one in
> > include/xen/arm/page-coherent.h:xen_dma_map_page).
> Yes, I saw this code and it helped me to figure out
> 

[Xen-devel] [PATCH v6 3/4] Introduce the Xen 9pfs transport header

2017-03-29 Thread Stefano Stabellini
Define the ring according to the protocol specification, using the new
DEFINE_XEN_FLEX_RING_AND_INTF macro.

Add the header to the C99 check.

Signed-off-by: Stefano Stabellini 
CC: jbeul...@suse.com
CC: konrad.w...@oracle.com
---
 xen/include/Makefile |  5 -
 xen/include/public/io/9pfs.h | 49 
 2 files changed, 53 insertions(+), 1 deletion(-)
 create mode 100644 xen/include/public/io/9pfs.h

diff --git a/xen/include/Makefile b/xen/include/Makefile
index be56738..a3b3583 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -94,9 +94,12 @@ all: headers.chk headers99.chk headers++.chk
 
 PUBLIC_HEADERS := $(filter-out public/arch-% public/dom0_ops.h, $(wildcard 
public/*.h public/*/*.h) $(public-y))
 
-PUBLIC_C99_HEADERS :=
+PUBLIC_C99_HEADERS := public/io/9pfs.h
 PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% 
public/%hvm/save.h $(PUBLIC_C99_HEADERS), $(PUBLIC_HEADERS))
 
+public/io/9pfs.h-c99-prereq := -include string.h
+public/io/9pfs.h-cxx-prereq := -include cstring
+
 headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
for i in $(filter %.h,$^); do \
$(CC) -x c -ansi -Wall -Werror -include stdint.h \
diff --git a/xen/include/public/io/9pfs.h b/xen/include/public/io/9pfs.h
new file mode 100644
index 000..4bfd5d4
--- /dev/null
+++ b/xen/include/public/io/9pfs.h
@@ -0,0 +1,49 @@
+/*
+ * 9pfs.h -- Xen 9PFS transport
+ *
+ * Refer to docs/misc/9pfs.markdown for the specification
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (C) 2017 Stefano Stabellini 
+ */
+
+#ifndef __XEN_PUBLIC_IO_9PFS_H__
+#define __XEN_PUBLIC_IO_9PFS_H__
+
+#include "../grant_table.h"
+#include "ring.h"
+
+/*
+ * See docs/misc/9pfs.markdown in xen.git for the full specification:
+ * https://xenbits.xen.org/docs/unstable/misc/9pfs.html
+ */
+DEFINE_XEN_FLEX_RING_AND_INTF(xen_9pfs);
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.9.1




[Xen-devel] [PATCH v6 4/4] Introduce the pvcalls header

2017-03-29 Thread Stefano Stabellini
Define the ring and request and response structs according to the
specification. Use the new DEFINE_XEN_FLEX_RING macro.

Add the header to the C99 check.
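
As an aside on the dummy member in the request union, the following C sketch reproduces just enough of the layout to check its size. It is an illustration, not the header itself: the demo_* names are invented, only the connect and dummy members are copied, and grant_ref_t is assumed to be a 32-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_grant_ref_t;   /* assumption: 32-bit grant reference */

/* Trimmed copy of the request layout: the largest member plus the dummy. */
struct demo_pvcalls_request {
    uint32_t req_id;
    uint32_t cmd;
    union {
        struct demo_connect {
            uint64_t id;
            uint8_t addr[28];
            uint32_t len;
            uint32_t flags;
            demo_grant_ref_t ref;
            uint32_t evtchn;
        } connect;               /* 52 bytes of fields; tail padding is ABI-dependent */
        struct demo_dummy {
            uint8_t dummy[56];
        } dummy;                 /* pins the union to at least 56 bytes */
    } u;
};

/* Total size of the request as laid out by the compiler in use. */
static size_t demo_request_size(void)
{
    return sizeof(struct demo_pvcalls_request);
}

/* Offset of the union, i.e. the size of the req_id/cmd header. */
static size_t demo_union_offset(void)
{
    return offsetof(struct demo_pvcalls_request, u);
}
```

On the usual 32-bit and 64-bit ABIs one would expect the union to start at offset 8 and the whole request to be 64 bytes, whether or not uint64_t is 8-byte aligned; that is what the dummy member is there to guarantee.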

Signed-off-by: Stefano Stabellini 
CC: jbeul...@suse.com
CC: konrad.w...@oracle.com
---
 xen/include/Makefile|   4 +-
 xen/include/public/io/pvcalls.h | 153 
 2 files changed, 156 insertions(+), 1 deletion(-)
 create mode 100644 xen/include/public/io/pvcalls.h

diff --git a/xen/include/Makefile b/xen/include/Makefile
index a3b3583..d31b21d 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -94,11 +94,13 @@ all: headers.chk headers99.chk headers++.chk
 
 PUBLIC_HEADERS := $(filter-out public/arch-% public/dom0_ops.h, $(wildcard 
public/*.h public/*/*.h) $(public-y))
 
-PUBLIC_C99_HEADERS := public/io/9pfs.h
+PUBLIC_C99_HEADERS := public/io/9pfs.h public/io/pvcalls.h
 PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% 
public/%hvm/save.h $(PUBLIC_C99_HEADERS), $(PUBLIC_HEADERS))
 
 public/io/9pfs.h-c99-prereq := -include string.h
 public/io/9pfs.h-cxx-prereq := -include cstring
+public/io/pvcalls.h-c99-prereq := -include string.h
+public/io/pvcalls.h-cxx-prereq := -include cstring
 
 headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
for i in $(filter %.h,$^); do \
diff --git a/xen/include/public/io/pvcalls.h b/xen/include/public/io/pvcalls.h
new file mode 100644
index 000..cb81712
--- /dev/null
+++ b/xen/include/public/io/pvcalls.h
@@ -0,0 +1,153 @@
+/*
+ * pvcalls.h -- Xen PV Calls Protocol
+ *
+ * Refer to docs/misc/pvcalls.markdown for the specification
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (C) 2017 Stefano Stabellini 
+ */
+
+#ifndef __XEN_PUBLIC_IO_PVCALLS_H__
+#define __XEN_PUBLIC_IO_PVCALLS_H__
+
+#include "../grant_table.h"
+#include "ring.h"
+
+/*
+ * See docs/misc/pvcalls.markdown in xen.git for the full specification:
+ * https://xenbits.xen.org/docs/unstable/misc/pvcalls.html
+ */
+struct pvcalls_data_intf {
+RING_IDX in_cons, in_prod, in_error;
+
+uint8_t pad1[52];
+
+RING_IDX out_cons, out_prod, out_error;
+
+uint8_t pad2[52];
+
+RING_IDX ring_order;
+grant_ref_t ref[];
+};
+DEFINE_XEN_FLEX_RING(pvcalls);
+
+#define PVCALLS_SOCKET         0
+#define PVCALLS_CONNECT        1
+#define PVCALLS_RELEASE        2
+#define PVCALLS_BIND           3
+#define PVCALLS_LISTEN         4
+#define PVCALLS_ACCEPT         5
+#define PVCALLS_POLL           6
+
+struct xen_pvcalls_request {
+uint32_t req_id; /* private to guest, echoed in response */
+uint32_t cmd; /* command to execute */
+union {
+struct xen_pvcalls_socket {
+uint64_t id;
+uint32_t domain;
+uint32_t type;
+uint32_t protocol;
+} socket;
+struct xen_pvcalls_connect {
+uint64_t id;
+uint8_t addr[28];
+uint32_t len;
+uint32_t flags;
+grant_ref_t ref;
+uint32_t evtchn;
+} connect;
+struct xen_pvcalls_release {
+uint64_t id;
+uint8_t reuse;
+} release;
+struct xen_pvcalls_bind {
+uint64_t id;
+uint8_t addr[28];
+uint32_t len;
+} bind;
+struct xen_pvcalls_listen {
+uint64_t id;
+uint32_t backlog;
+} listen;
+struct xen_pvcalls_accept {
+uint64_t id;
+uint64_t id_new;
+grant_ref_t ref;
+uint32_t evtchn;
+} accept;
+struct xen_pvcalls_poll {
+uint64_t id;
+} poll;
+/* dummy member to force sizeof(struct xen_pvcalls_request)
+ * to match across archs */
+struct xen_pvcalls_dummy {
+uint8_t dummy[56];
+} dummy;
+} u;
+};
+
+struct xen_pvcalls_response {
+uint32_t 

[Xen-devel] [PATCH v6 0/4] new ring macros, 9pfs and pvcalls headers

2017-03-29 Thread Stefano Stabellini
Hi all,

this patch series introduces a set of new ring macros to support rings
in the formats specified by the Xen 9pfs transport and PV Calls
protocol. It also introduces the Xen 9pfs and PV Calls protocols
headers.

Changes in v6:
- remove stray semicolons
- code style fix for return statements
- make the last element of DEFINE_XEN_FLEX_RING not a static inline
  function
- improve ring.h prereq comment
- remove mask_order
- use ring_size as parameter instead of ring_order
- fix indentation of parameters in ring.h
- improve order of parameters in ring.h
- introduce per header prereqs in xen/include/Makefile

Changes in v5:
- parenthesize uses of macro parameters in XEN_FLEX_RING_SIZE
- add grant_table.h to the list of prereqs in ring.h and remove the
  #include from ring.h
- #include grant_table.h in 9pfs.h and pvcalls.h
- remove PAGE_SHIFT definition, define XEN_PAGE_SHIFT instead
- code style fixes
- remove struct xen_9pfs_header definition
- don't add extra -include cstring to all c++ tests, only when needed
- add headers99.chk to .gitignore 

Changes in v4:
- include ../grant_table.h in ring.h
- add a comment about required declarations on top of ring.h
- add a patch to introduce a C99 headers check
- add -include string.h to the existing C++ headers check
- add 9pfs and pvcalls to the C99 headers check

Changes in v3:
- fix commit message
- add newlines after read/write_packet functions
- reorder DEFINE_XEN_FLEX_RING_AND_INTF and DEFINE_XEN_FLEX_RING

Changes in v2:
- replace __attribute__((packed)) with #pragma pack
- remove XEN_9PFS_RING_ORDER, the 9pfs ring order should be dynamic
- add editor configuration blocks
- add links to the specs


Stefano Stabellini (4):
  ring.h: introduce macros to handle monodirectional rings with multiple req sizes
  xen: introduce a C99 headers check
  Introduce the Xen 9pfs transport header
  Introduce the pvcalls header

 .gitignore  |   1 +
 xen/include/Makefile|  36 ++
 xen/include/public/io/9pfs.h|  49 +
 xen/include/public/io/pvcalls.h | 153 
 xen/include/public/io/ring.h| 151 +++
 5 files changed, 378 insertions(+), 12 deletions(-)
 create mode 100644 xen/include/public/io/9pfs.h
 create mode 100644 xen/include/public/io/pvcalls.h

Cheers,

Stefano



[Xen-devel] [PATCH v6 2/4] xen: introduce a C99 headers check

2017-03-29 Thread Stefano Stabellini
Introduce a C99 headers check, for non-ANSI compliant headers: 9pfs.h
and pvcalls.h.

In addition to the usual -include stdint.h, also add -include string.h
to the C99 check to get the declaration of memcpy and size_t.

For the same reason, also add -include cstring to the C++ check when
necessary.

Signed-off-by: Stefano Stabellini 
CC: jbeul...@suse.com
CC: konrad.w...@oracle.com
---
 .gitignore   |  1 +
 xen/include/Makefile | 31 +++
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/.gitignore b/.gitignore
index 443b12a..a8905b1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -274,6 +274,7 @@ xen/arch/*/efi/compat.c
 xen/arch/*/efi/efi.h
 xen/arch/*/efi/runtime.c
 xen/include/headers.chk
+xen/include/headers99.chk
 xen/include/headers++.chk
 xen/include/asm
 xen/include/asm-*/asm-offsets.h
diff --git a/xen/include/Makefile b/xen/include/Makefile
index aca7f20..be56738 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -90,11 +90,12 @@ compat/xlat.h: $(addprefix compat/.xlat/,$(xlat-y)) Makefile
 
 ifeq ($(XEN_TARGET_ARCH),$(XEN_COMPILE_ARCH))
 
-all: headers.chk headers++.chk
+all: headers.chk headers99.chk headers++.chk
 
 PUBLIC_HEADERS := $(filter-out public/arch-% public/dom0_ops.h, $(wildcard 
public/*.h public/*/*.h) $(public-y))
 
-PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% 
public/%hvm/save.h, $(PUBLIC_HEADERS))
+PUBLIC_C99_HEADERS :=
+PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% 
public/%hvm/save.h $(PUBLIC_C99_HEADERS), $(PUBLIC_HEADERS))
 
 headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
for i in $(filter %.h,$^); do \
@@ -104,16 +105,22 @@ headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
done >$@.new
mv $@.new $@
 
+headers99.chk: $(PUBLIC_C99_HEADERS) Makefile
+   rm -f $@.new $@
+   $(foreach i, $(filter %.h,$^), $(CC) -x c -std=c99 -Wall -Werror \
+   -include stdint.h $($(i)-c99-prereq) -S -o /dev/null $(i) || \
+   exit 1; echo $(i) >> $@.new;)
+   mv $@.new $@
+
 headers++.chk: $(PUBLIC_HEADERS) Makefile
-   if $(CXX) -v >/dev/null 2>&1; then \
-   for i in $(filter %.h,$^); do \
-   echo '#include "'$$i'"' \
-   | $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__ \
- -include stdint.h -include public/xen.h -S -o /dev/null - \
-   || exit 1; \
-   echo $$i; \
-   done ; \
-   fi >$@.new
+   if ! $(CXX) -v >/dev/null 2>&1; then \
+   exit 0;  \
+   fi
+   rm -f $@.new $@
+   $(foreach i, $(filter %.h,$^), echo "#include "\"$(i)\"|   \
+   $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__   \
+   -include stdint.h $($(i)-cxx-prereq) -include public/xen.h \
+   -S -o /dev/null - || exit 1; echo $(i) >> $@.new;)
mv $@.new $@
 
 endif
@@ -128,5 +135,5 @@ all: $(BASEDIR)/include/asm-x86/cpuid-autogen.h
 endif
 
 clean::
-   rm -rf compat headers.chk headers++.chk
+   rm -rf compat headers*.chk
rm -f $(BASEDIR)/include/asm-x86/cpuid-autogen.h
-- 
1.9.1




[Xen-devel] [PATCH v6 1/4] ring.h: introduce macros to handle monodirectional rings with multiple req sizes

2017-03-29 Thread Stefano Stabellini
This patch introduces macros, structs and functions to handle rings in
the format described by docs/misc/pvcalls.markdown and
docs/misc/9pfs.markdown. The index page (struct __name##_data_intf)
contains the indexes and the grant refs to setup two rings.

   Indexes page
   +----------------------+
   |@0 $NAME_data_intf:   |
   |@76: ring_order = 1   |
   |@80: ref[0]+          |
   |@84: ref[1]+          |
   |           |          |
   |           |          |
   +----------------------+
               |
               v (data ring)
       +-------+-----------+
       |  @0->4095: in     |
       |  ref[0]           |
       |-------------------|
       |  @4096->8191: out |
       |  ref[1]           |
       +-------------------+

$NAME_read_packet and $NAME_write_packet are provided to read or write
any data struct from/to the ring. In pvcalls, they are unused. In xen
9pfs, they are used to read or write the 9pfs header. In other protocols
they could be used to read/write the whole request structure. See
docs/misc/9pfs.markdown:Ring Usage to learn how to check how much data
is on the ring, and how to handle notifications.

There is a ring_size parameter to most functions so that protocols using
these macros don't have to have a statically defined ring order at build
time. In pvcalls for example, each new ring could have a different
order.

These macros don't help you share the indexes page or the event channels
needed for notifications. You can do that with other out of band
mechanisms, such as xenstore or another ring.

It is not possible to use a macro to define another macro with a
variable name. For this reason, this patch introduces static inline
functions instead, that are not C89 compliant. Additionally, the macro
defines a struct with a variable sized array, which is also not C89
compliant.
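
To illustrate the ring_size parameter, here is a small stand-alone C sketch of the index arithmetic. Only DEMO_FLEX_RING_SIZE mirrors something shown in this patch (XEN_FLEX_RING_SIZE); demo_mask and demo_queued are hypothetical stand-ins written for this note, assuming free-running producer/consumer indexes and a power-of-two ring size, and should not be read as the bodies of $NAME_mask or $NAME_queued.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t demo_ring_idx;   /* stand-in for RING_IDX */

#define DEMO_PAGE_SHIFT 12
/* Same formula as XEN_FLEX_RING_SIZE: each ring gets half of 2^order pages. */
#define DEMO_FLEX_RING_SIZE(order) (1UL << ((order) + DEMO_PAGE_SHIFT - 1))

/* Reduce a free-running index to an offset inside the ring. */
static demo_ring_idx demo_mask(demo_ring_idx idx, demo_ring_idx ring_size)
{
    return idx & (ring_size - 1);
}

/* Bytes currently queued between the consumer and the producer. */
static demo_ring_idx demo_queued(demo_ring_idx prod, demo_ring_idx cons,
                                 demo_ring_idx ring_size)
{
    if (prod == cons)
        return 0;                      /* empty */

    prod = demo_mask(prod, ring_size);
    cons = demo_mask(cons, ring_size);

    if (prod == cons)
        return ring_size;              /* indexes differ by a full lap: full */
    if (prod > cons)
        return prod - cons;
    return ring_size - (cons - prod);  /* producer has wrapped around */
}
```

With ring_order = 1, as in the diagram, the formula gives 4096 bytes per direction, and the free space on a ring is ring_size minus the queued amount.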

Signed-off-by: Stefano Stabellini 
CC: konrad.w...@oracle.com
CC: jbeul...@suse.com
---
 xen/include/public/io/ring.h | 151 +++
 1 file changed, 151 insertions(+)

diff --git a/xen/include/public/io/ring.h b/xen/include/public/io/ring.h
index 801c0da..216c940 100644
--- a/xen/include/public/io/ring.h
+++ b/xen/include/public/io/ring.h
@@ -27,6 +27,21 @@
 #ifndef __XEN_PUBLIC_IO_RING_H__
 #define __XEN_PUBLIC_IO_RING_H__
 
+/*
+ * When #include'ing this header, you need to provide the following
+ * declaration upfront:
+ * - standard integers types (uint8_t, uint16_t, etc)
+ * They are provided by stdint.h of the standard headers.
+ *
+ * In addition, if you intend to use the FLEX macros, you also need to
+ * provide the following, before invoking the FLEX macros:
+ * - size_t
+ * - memcpy
+ * - grant_ref_t
+ * These declarations are provided by string.h of the standard headers,
+ * and grant_table.h from the Xen public headers.
+ */
+
 #include "../xen-compat.h"
 
 #if __XEN_INTERFACE_VERSION__ < 0x00030208
@@ -313,6 +328,142 @@ typedef struct __name##_back_ring __name##_back_ring_t
 (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r);  \
 } while (0)
 
+
+/*
+ * DEFINE_XEN_FLEX_RING_AND_INTF defines two monodirectional rings and
+ * functions to check if there is data on the ring, and to read and
+ * write to them.
+ *
+ * DEFINE_XEN_FLEX_RING is similar to DEFINE_XEN_FLEX_RING_AND_INTF, but
+ * does not define the indexes page. As different protocols can have
+ * extensions to the basic format, this macro allows them to define their
+ * own struct.
+ *
+ * XEN_FLEX_RING_SIZE
+ *   Convenience macro to calculate the size of one of the two rings
+ *   from the overall order.
+ *
+ * $NAME_mask
+ *   Function to apply the size mask to an index, to reduce the index
+ *   within the range [0-size].
+ *
+ * $NAME_read_packet
+ *   Function to read data from the ring. The amount of data to read is
+ *   specified by the "size" argument.
+ *
+ * $NAME_write_packet
+ *   Function to write data to the ring. The amount of data to write is
+ *   specified by the "size" argument.
+ *
+ * $NAME_get_ring_ptr
+ *   Convenience function that returns a pointer to read/write to the
+ *   ring at the right location.
+ *
+ * $NAME_data_intf
+ *   Indexes page, shared between frontend and backend. It also
+ *   contains the array of grant refs.
+ *
+ * $NAME_queued
+ *   Function to calculate how many bytes are currently on the ring,
+ *   ready to be read. It can also be used to calculate how much free
+ *   space is currently on the ring (ring_size - $NAME_queued()).
+ */
+
+#define XEN_PAGE_SHIFT 12
+#define XEN_FLEX_RING_SIZE(order) \
+(1UL << ((order) + XEN_PAGE_SHIFT - 1))
+
+#define DEFINE_XEN_FLEX_RING(name)\
+static inline RING_IDX 

Re: [Xen-devel] Xen Security Advisory 206 - xenstore denial of service via repeated update

2017-03-29 Thread Michael Young

On Wed, 29 Mar 2017, Xen.org security team wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

   Xen Security Advisory XSA-206
 version 9

   xenstore denial of service via repeated update


I am seeing a build failure from these patches when using gcc 7. The 
problem is with
xsa206-4.80002-xenstored-Log-when-the-write-transaction-rate-limit-.patch 
because in tools/xenstore/xenstored_domain.c the patch adds the boolean 
wrl_delay_logged to the structure "domain" but later it tries to increment 
it, resulting in the error 
xenstored_domain.c: In function 'wrl_apply_debit_actual':
xenstored_domain.c:949:32: error: increment of a boolean expression 
[-Werror=bool-operation]

   if (!domain->wrl_delay_logged++) {
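
For illustration, a minimal sketch of one way to express "log only once" without incrementing a bool. The names struct domain_sketch and wrl_should_log are hypothetical stand-ins, not the actual xenstored code or the eventual fix; the sketch only assumes that the intent of the rejected line is to act the first time and never again.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of xenstored's struct domain. */
struct domain_sketch {
    bool wrl_delay_logged;
};

/*
 * Returns true only the first time it is called for a given domain.
 * Same intent as "if (!domain->wrl_delay_logged++)", but the flag is
 * assigned rather than incremented, so -Werror=bool-operation has
 * nothing to reject.
 */
static bool wrl_should_log(struct domain_sketch *domain)
{
    if (domain->wrl_delay_logged)
        return false;

    domain->wrl_delay_logged = true;
    return true;
}
```

A caller would then write "if (wrl_should_log(domain)) { ... }" in place of the increment.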

Michael Young

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 2/4] xen: introduce a C99 headers check

2017-03-29 Thread Stefano Stabellini
On Wed, 29 Mar 2017, Jan Beulich wrote:
> >>> On 29.03.17 at 00:08,  wrote:
> > Introduce a C99 headers check, for non-ANSI compliant headers: 9pfs.h
> > and pvcalls.h.
> > 
> > In addition to the usual -include stdint.h, also add -include string.h
> > to the C99 check to get the declaration of memcpy and size_t.
> 
> No, as explained before. You shouldn't think of just your new headers,
> but others which may later join the party.
> 
> > --- a/xen/include/Makefile
> > +++ b/xen/include/Makefile
> > @@ -90,11 +90,16 @@ compat/xlat.h: $(addprefix compat/.xlat/,$(xlat-y)) Makefile
> >  
> >  ifeq ($(XEN_TARGET_ARCH),$(XEN_COMPILE_ARCH))
> >  
> > -all: headers.chk headers++.chk
> > +all: headers.chk headers99.chk headers++.chk
> >  
> >  PUBLIC_HEADERS := $(filter-out public/arch-% public/dom0_ops.h, $(wildcard public/*.h public/*/*.h) $(public-y))
> >  
> > -PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% public/%hvm/save.h, $(PUBLIC_HEADERS))
> > +PUBLIC_C99_HEADERS :=
> > +PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% public/%hvm/save.h $(PUBLIC_C99_HEADERS), $(PUBLIC_HEADERS))
> > +
> > +EXTRA_PREREQ_C99 := -include string.h
> > +EXTRA_PREREQ_CPP := -include cstring
> 
> These should be per header, e.g.
> 
> pvcalls.h-c99-prereq := string.h
> pvcalls.h-cxx-prereq := cstring
> 
> which will also (I think at least) greatly simplify the adjustments
> needed further down.
> 
> Implied from this - please don't use CPP for C++, as CPP is commonly
> used for the preprocessor.

I can try, but it requires changing the loops below from bash to
Makefile ($(foreach)) to handle variable expansions properly. It's not a
trivial change and it decreases readability, see the new version of the
series that I am about to send. If it was up to me I would probably keep
the code as is in this version.



Re: [Xen-devel] [PATCH v4 3/4] Introduce the Xen 9pfs transport header

2017-03-29 Thread Stefano Stabellini
On Wed, 29 Mar 2017, Jan Beulich wrote:
> >>> On 28.03.17 at 23:04,  wrote:
> > On Tue, 28 Mar 2017, Jan Beulich wrote:
> >> >>> Stefano Stabellini  03/27/17 10:54 PM >>>
> >> >On Mon, 27 Mar 2017, Jan Beulich wrote:
> >> >> >>> On 24.03.17 at 19:31,  wrote:
> >> >> > +/*
> >> >> > + * See docs/misc/9pfs.markdown in xen.git for the full specification:
> >> >> > + * https://xenbits.xen.org/docs/unstable/misc/9pfs.html 
> >> >> > + */
> >> >> > +#pragma pack(push)
> >> >> > +#pragma pack(1)
> >> >> > +struct xen_9pfs_header {
> >> >> > + uint32_t size;
> >> >> > + uint8_t id;
> >> >> > + uint16_t tag;
> >> >> > +};
> >> >> > +#pragma pack(pop)
> >> >> 
> >> >> There's no precedent to using pragmas in the public headers, and
> >> >> these aren't C99-compliant.
> >> >
> >> >I'll remove pragma, together with the definition of struct
> >> >xen_9pfs_header: this structure is already defined as part of the 9p
> >> >protocol, and it is already mentioned in the Xen 9pfs transport spec as
> >> >well. In fact, both QEMU and Linux already have it defined. I don't
> >> >think we need it here.
> >> 
> >> That'll deal with the immediate issue here, but not with the more general
> >> implied one: Why would you want to have misaligned fields in a protocol
> >> definition?
> > 
> > Because this header is not actually part of the Xen transport protocol,
> > it is defined by the 9pfs specification. That's why QEMU already had it.
> > I cannot do anything about that. I was only redefining it here for
> > convenience, because reading the header is required to figure out how
> > big a request (or a response) is.
> 
> If the size is all you care about, perhaps
> 
> struct xen_9pfs_header {
>   uint8_t size[4];
>   uint8_t id[1];
>   uint8_t tag[2];
> };
> 
> would do?

I think it risks causing confusion with the regular header definition,
it's best to drop it. After all, it is specified elsewhere and clearly
mentioned in docs/misc/9pfs.markdown. There won't be any surprises.
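
As a side note for readers, here is a minimal sketch of how the byte-array layout suggested above could be consumed without any pragma. The struct name xen_9pfs_header_bytes and the helper hdr_size are hypothetical; the only outside fact relied on is that 9P encodes integers in little-endian byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-array layout as suggested in the thread (name is hypothetical). */
struct xen_9pfs_header_bytes {
    uint8_t size[4];
    uint8_t id[1];
    uint8_t tag[2];
};

/*
 * Assemble the 32-bit size field byte by byte. This avoids misaligned
 * loads and does not depend on the host's endianness.
 */
static uint32_t hdr_size(const struct xen_9pfs_header_bytes *h)
{
    return (uint32_t)h->size[0] |
           ((uint32_t)h->size[1] << 8) |
           ((uint32_t)h->size[2] << 16) |
           ((uint32_t)h->size[3] << 24);
}
```

Because every member has alignment 1, common ABIs lay this struct out in 7 bytes with no padding, which matches the on-the-wire header.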


> (This made me notice you use hard tabs, which you
> shouldn't in the Xen tree.)

Sorry about the tabs, I checked and none of the other patches have them.



Re: [Xen-devel] Xen ARM community call - meeting minutes and date for the next one

2017-03-29 Thread Stefano Stabellini
On Wed, 29 Mar 2017, Julien Grall wrote:
> Hi,
> 
> On 28/03/2017 16:23, Julien Grall wrote:
> > Hi all,
> > 
> > Apologies for the late sending, you will find at the end of the e-mail a
> > summary of the discussion from the previous call. Feel free to reply if I
> > missed some parts.
> > 
> > I suggest to have the next call on the 5th April at 5PM UTC. Any opinions?
> 
> Apologies, I forgot that we switched timezone last Sunday. I meant
> to say 5pm BST (i.e. UTC + 1). Is it fine for everyone to do the meeting
> one hour later?

I didn't even realize that the timezone was different. Yes, please :-)



Re: [Xen-devel] Xen ARM community call - meeting minutes and date for the next one

2017-03-29 Thread Julien Grall

Hi,

On 28/03/2017 16:23, Julien Grall wrote:

Hi all,

Apologies for the late sending, you will find at the end of the e-mail a
summary of the discussion from the previous call. Feel free to reply if I
missed some parts.

I suggest to have the next call on the 5th April at 5PM UTC. Any opinions?


Apologies, I forgot that we switched timezone last Sunday. I meant
to say 5pm BST (i.e. UTC + 1). Is it fine for everyone to do the meeting
one hour later?

Cheers,

--
Julien Grall


[Xen-devel] [ovmf test] 106978: all pass - PUSHED

2017-03-29 Thread osstest service owner
flight 106978 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106978/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 6e7ec25aaaf0dfc2b4c84ffd4c7ee7cd442aecb6
baseline version:
 ovmf 89ad870fbff03a511102c73773000f2bea2017d2

Last test of basis   106971  2017-03-28 14:15:19 Z    1 days
Testing same since   106978  2017-03-29 16:18:50 Z    0 days    1 attempts


People who touched revisions under test:
  Ard Biesheuvel 
  Bell Song 
  David Woodhouse 
  Gary Lin 
  Laszlo Ersek 
  Liming Gao 
  Qin Long 
  Ruiyu Ni 
  Song, BinX 
  Yonghong Zhu 

jobs:
 build-amd64-xsm                         pass
 build-i386-xsm                          pass
 build-amd64                             pass
 build-i386                              pass
 build-amd64-libvirt                     pass
 build-i386-libvirt                      pass
 build-amd64-pvops                       pass
 build-i386-pvops                        pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64    pass
 test-amd64-i386-xl-qemuu-ovmf-amd64     pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=ovmf
+ revision=6e7ec25aaaf0dfc2b4c84ffd4c7ee7cd442aecb6
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push ovmf 
6e7ec25aaaf0dfc2b4c84ffd4c7ee7cd442aecb6
+ branch=ovmf
+ revision=6e7ec25aaaf0dfc2b4c84ffd4c7ee7cd442aecb6
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=ovmf
+ xenbranch=xen-unstable
+ '[' xovmf = xlinux ']'
+ linuxbranch=
+ '[' x = x ']'
+ qemuubranch=qemu-upstream-unstable
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable
+ prevxenbranch=xen-4.8-testing
+ '[' x6e7ec25aaaf0dfc2b4c84ffd4c7ee7cd442aecb6 = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git
++ : git://xenbits.xen.org/osstest/seabios.git
++ : https://github.com/tianocore/edk2.git
++ : 

Re: [Xen-devel] [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters

2017-03-29 Thread Andrew Cooper
On 27/03/17 10:06, Joshua Otto wrote:
> In the context of the live migration algorithm, the precopy iteration
> count refers to the number of page-copying iterations performed prior to
> the suspension of the guest and transmission of the final set of dirty
> pages.  Similarly, the precopy dirty threshold refers to the dirty page
> count below which we judge it more profitable to proceed to
> stop-and-copy rather than continue with the precopy.  These would be
> helpful tuning parameters to work with when migrating particularly busy
> guests, as they enable an administrator to reap the available benefits
> of the precopy algorithm (the transmission of guest pages _not_ in the
> writable working set can be completed without guest downtime) while
> reducing the total amount of time required for the migration (as
> iterations of the precopy loop that will certainly be redundant can be
> skipped in favour of an earlier suspension).
>
> To expose these tuning parameters to users:
> - introduce a new libxl API function, libxl_domain_live_migrate(),
>   taking the same parameters as libxl_domain_suspend() _and_
>   precopy_iterations and precopy_dirty_threshold parameters, and
>   consider these parameters in the precopy policy
>
>   (though a pair of new parameters on their own might not warrant an
>   entirely new API function, it is added in anticipation of a number of
>   additional migration-only parameters that would be cumbersome on the
>   whole to tack on to the existing suspend API)
>
> - switch xl migrate to the new libxl_domain_live_migrate() and add new
>   --postcopy-iterations and --postcopy-threshold parameters to pass
>   through
>
> Signed-off-by: Joshua Otto 

This will have to defer to the tools maintainers, but I purposefully
didn't expose these knobs to users when rewriting live migration,
because they cannot be meaningfully chosen by anyone outside of a
testing scenario.  (That is not to say they aren't useful for testing
purposes, but I didn't upstream my version of this patch.)

I spent quite a while wondering how best to expose these tunables in a
way that end users could sensibly use them, and the best I came up with
was this:

First, run the guest under logdirty for a period of time to establish
the working set, and how steady it is.  From this, you have a baseline
for the target threshold, and a plausible way of estimating the
downtime.  (Better yet, as XenCenter, XenServer's Windows GUI, has proved
time and time again, users love graphs!  Even if they don't necessarily
understand them.)

From this baseline, the conditions you need to care about are the rate
of convergence.  On a steady VM, you should converge asymptotically to
the measured threshold, although on 5 or fewer iterations, the
asymptotic properties don't appear cleanly.  (Of course, the larger the
VM, the more iterations, and the more likely to spot this.)

Users will either care about the migration completing successfully, or
avoiding interrupting the workload.  The majority case would be both,
but every user will have one of these two options which is more
important than the other.  As a result, there need to be some options to
cover "if $X happens, do I continue or abort".

The case where the VM becomes more busy is harder however.  For the
users which care about not interrupting the workload, there will be a
point above which they'd prefer to abort the migration rather than
continue it.  For the users which want the migration to complete, they'd
prefer to pause the VM and take a downtime hit, rather than aborting.

Therefore, you really need two thresholds; the one above which you
always abort, the one where you would normally choose to pause.  The
decision as to what to do depends on where you are between these
thresholds when the dirty state converges.  (Of course, if the VM
suddenly becomes more idle, it is sensible to continue beyond the lower
threshold, as it will reduce the downtime.)  The absolute number of
iterations on the other hand doesn't actually matter from a user's point
of view, so isn't a useful control to have.
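
To make the two-threshold idea concrete, here is a rough sketch of one possible reading of it. Everything in it is hypothetical (the enum, the struct, the decide function and the prefer_completion flag); it is not an existing libxl or libxc interface, and it deliberately ignores the open question of how to measure convergence.

```c
#include <stdbool.h>

enum migrate_action {
    MIGRATE_CONTINUE_PRECOPY,
    MIGRATE_PAUSE_AND_COPY,
    MIGRATE_ABORT
};

/* Hypothetical user-facing controls: two dirty-page thresholds. */
struct migrate_thresholds {
    unsigned long pause_below;   /* at or below this, pausing is acceptable */
    unsigned long abort_above;   /* above this, always abort */
    bool prefer_completion;      /* true: finishing matters more than downtime */
};

static enum migrate_action decide(const struct migrate_thresholds *t,
                                  unsigned long dirty_pages,
                                  bool converged)
{
    /* The workload has become too busy to migrate sensibly. */
    if (dirty_pages > t->abort_above)
        return MIGRATE_ABORT;

    /*
     * Still shrinking: keep iterating, even below pause_below, since a
     * smaller dirty set means less downtime.
     */
    if (!converged)
        return MIGRATE_CONTINUE_PRECOPY;

    /* Converged at a level where the downtime hit is acceptable. */
    if (dirty_pages <= t->pause_below)
        return MIGRATE_PAUSE_AND_COPY;

    /* Converged between the thresholds: the user's priority decides. */
    return t->prefer_completion ? MIGRATE_PAUSE_AND_COPY : MIGRATE_ABORT;
}
```

Note that an iteration count does not appear anywhere in the decision, in line with the argument above.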

Another thing to be careful with is the measure of convergence with
respect to guest busyness, and other factors influencing the absolute
iteration time, such as congestion of the network between the two
hosts.  I haven't yet come up with a sensible way of reconciling this
with the above, in a way which can be expressed as a useful set of controls.


The plan, following migration v2, was always to come back to this and
see about doing something better than the current hard coded parameters,
but I am still working on fixing migration in other areas (not having
VMs crash when moving, because they observe important differences in the
hardware).

How does your postcopy proposal influence/change the above logic?

~Andrew



[Xen-devel] [xen-unstable-smoke test] 106984: tolerable trouble: broken/fail/pass - PUSHED

2017-03-29 Thread osstest service owner
flight 106984 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106984/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm       1 build-check(1)             blocked  n/a
 test-amd64-amd64-libvirt     12 migrate-support-check      fail   never pass
 build-arm64                   5 xen-build                  fail   never pass
 build-arm64-pvops             5 kernel-build               fail   never pass
 test-armhf-armhf-xl          12 migrate-support-check      fail   never pass
 test-armhf-armhf-xl          13 saverestore-support-check  fail   never pass

version targeted for testing:
 xen  09ebe69c0c832e2cdbfc76f0338499bd8b50d9de
baseline version:
 xen  b988e88cc041f630dcfa735dcf9c895310103629

Last test of basis   106982  2017-03-29 17:01:14 Z    0 days
Testing same since   106984  2017-03-29 19:02:08 Z    0 days    1 attempts


People who touched revisions under test:
  Joshua Otto 
  Owen Smith 
  Paul Durrant 
  Stefano Stabellini 
  Stefano Stabellini 
  Wei Chen 
  Wei Liu 

jobs:
 build-amd64                                 pass
 build-arm64                                 fail
 build-armhf                                 pass
 build-amd64-libvirt                         pass
 build-arm64-pvops                           fail
 test-armhf-armhf-xl                         pass
 test-arm64-arm64-xl-xsm                     broken
 test-amd64-amd64-xl-qemuu-debianhvm-i386    pass
 test-amd64-amd64-libvirt                    pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=xen-unstable-smoke
+ revision=09ebe69c0c832e2cdbfc76f0338499bd8b50d9de
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 
09ebe69c0c832e2cdbfc76f0338499bd8b50d9de
+ branch=xen-unstable-smoke
+ revision=09ebe69c0c832e2cdbfc76f0338499bd8b50d9de
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.8-testing
+ '[' x09ebe69c0c832e2cdbfc76f0338499bd8b50d9de = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : 

Re: [Xen-devel] [PATCH v5 1/4] ring.h: introduce macros to handle monodirectional rings with multiple req sizes

2017-03-29 Thread Stefano Stabellini
On Wed, 29 Mar 2017, Jan Beulich wrote:
> >>> On 29.03.17 at 00:08,  wrote:
> > +#define DEFINE_XEN_FLEX_RING(name) \
> > +static inline RING_IDX name##_mask(RING_IDX idx, RING_IDX ring_size) \
> > +{ \
> > +    return (idx & (ring_size - 1)); \
> > +} \
> > + \
> > +static inline RING_IDX name##_mask_order(RING_IDX idx, RING_IDX ring_order) \
> > +{ \
> > +    return (idx & (XEN_FLEX_RING_SIZE(ring_order) - 1)); \
> > +} \
> 
> Do you really need both (and if you do, perhaps the latter should
> call the former)? I also find the mixture of ring_order and ring_size
> parameters of later functions a little strange.

Actually, I don't need two, I'll drop name##_mask_order. I'll change
the parameter below to be ring_size for consistency.
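
For readers following along, a small standalone sketch of the masking being discussed. ring_idx_t, SKETCH_PAGE_SHIFT, SKETCH_RING_SIZE and sketch_mask are hypothetical stand-ins for RING_IDX, XEN_PAGE_SHIFT, XEN_FLEX_RING_SIZE and name##_mask from the patch; the sketch assumes, as the patch does, that the ring size is a power of two.

```c
#include <stdint.h>

typedef uint32_t ring_idx_t;            /* stand-in for RING_IDX */

#define SKETCH_PAGE_SHIFT 12
/* Mirrors XEN_FLEX_RING_SIZE: each of the two rings gets half the area. */
#define SKETCH_RING_SIZE(order) (1UL << ((order) + SKETCH_PAGE_SHIFT - 1))

/*
 * Reduce a free-running index to an offset inside the ring. This only
 * behaves like a modulo when ring_size is a power of two.
 */
static ring_idx_t sketch_mask(ring_idx_t idx, ring_idx_t ring_size)
{
    return idx & (ring_size - 1);
}
```

With a single mask function taking ring_size, a caller that only knows the order can compute the size once with the size macro and pass it along.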


> > +static inline unsigned char *name##_get_ring_ptr(unsigned char *buf, \
> > + RING_IDX idx, \
> > + RING_IDX ring_order) \
> > +{ \
> > +    return buf + name##_mask_order(idx, ring_order); \
> 
> Please be consistent with parenthesizing the operand of return:
> The earlier two functions have an unnecessary pair of parens,
> so personally I'd prefer those to be dropped. But if you prefer to
> have them, add them everywhere.

OK


> > +static inline void name##_read_packet(const unsigned char *buf, \
> > +RING_IDX masked_prod, RING_IDX *masked_cons, \
> > +RING_IDX ring_size, void *opaque, size_t size) \
> 
> Especially with so many parameters I think some extra thought
> should be spent on their ordering: Primarily this is a memcpy()-
> like function, so I would kind of expect destination description,
> source description (each of which may require more than one
> parameter), size, auxiliary.

I'll reorder the parameters.
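
As an illustration of the memcpy()-like ordering requested above (destination, then source description, then size, then auxiliary data), here is a simplified sketch. The name sketch_read_packet and its exact parameter list are hypothetical and are not the signature from the next version of the series; unlike the patch, it omits masked_prod and assumes the caller has already checked that enough data is queued.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t ring_idx_t;   /* stand-in for RING_IDX */

/*
 * Copy "size" bytes out of a power-of-two sized ring, starting at
 * *masked_cons and wrapping around the end of the buffer if needed.
 * On return, *masked_cons points just past the consumed data.
 */
static void sketch_read_packet(void *dst,
                               const unsigned char *buf,
                               ring_idx_t *masked_cons,
                               size_t size,
                               ring_idx_t ring_size)
{
    size_t until_end = ring_size - *masked_cons;

    if (size <= until_end) {
        /* Contiguous case: one copy is enough. */
        memcpy(dst, buf + *masked_cons, size);
    } else {
        /* Wrapping case: tail of the buffer first, then its start. */
        memcpy(dst, buf + *masked_cons, until_end);
        memcpy((unsigned char *)dst + until_end, buf, size - until_end);
    }

    *masked_cons = (ring_idx_t)((*masked_cons + size) & (ring_size - 1));
}
```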


> As to the auxiliary part (especially
> ring_size) - there's no structure you could pass a pointer to,
> taking care of more than one of these, is there (struct
> name##_data_intf would at least appear to be a candidate,
> but is not always available)?

That is the problem, it is not always available. I prefer to keep them
separate.


> I'm also not really clear whether it wouldn't be better for both
> input and output to be void * (input remaining const of course).

I think it's a matter of taste: source is unsigned char* because it is
of the same type as the underlying ring buffer to read data from. I'll
leave it as is for now.


> And finally please indent function parameter declarations
> uniformly throughout the patch. If you prefer to follow the style
> of the declaration right above, then please reduce indentation 
> to four spaces (to match that of function scope local variable
> declarations).

I'll fix indentation.


> > +static inline RING_IDX name##_queued(RING_IDX prod, \
> > +RING_IDX cons, RING_IDX ring_size) \
> > +{ \
> > +    RING_IDX size; \
> > + \
> > +    if (prod == cons) \
> > +        return 0; \
> > + \
> > +    prod = name##_mask(prod, ring_size); \
> > +    cons = name##_mask(cons, ring_size); \
> > + \
> > +    if (prod == cons) \
> > +        return ring_size; \
> > + \
> > +    if (prod > cons) \
> > +        size = prod - cons; \
> > +

Re: [Xen-devel] [PATCH RFC 07/20] migration: defer precopy policy to libxl

2017-03-29 Thread Andrew Cooper
On 27/03/17 10:06, Joshua Otto wrote:
> The precopy phase of the xc_domain_save() live migration algorithm has
> historically been implemented to run until either a) (almost) no pages
> are dirty or b) some fixed, hard-coded maximum number of precopy
> iterations has been exceeded.  This policy and its implementation are
> less than ideal for a few reasons:
> - the logic of the policy is intertwined with the control flow of the
>   mechanism of the precopy stage
> - it can't take into account facts external to the immediate
>   migration context, such as interactive user input or the passage of
>   wall-clock time
> - it does not permit the user to change their mind, over time, about
>   what to do at the end of the precopy (they get an unconditional
>   transition into the stop-and-copy phase of the migration)
>
> To permit users to implement arbitrary higher-level policies governing
> when the live migration precopy phase should end, and what should be
> done next:
> - add a precopy_policy() callback to the xc_domain_save() user-supplied
>   callbacks
> - during the precopy phase of live migrations, consult this policy after
>   each batch of pages transmitted and take the dictated action, which
>   may be to a) abort the migration entirely, b) continue with the
>   precopy, or c) proceed to the stop-and-copy phase.
> - provide an implementation of the old policy as such a callback in
>   libxl and plumb it through the IPC machinery to libxc, effectively
>   maintaining the old policy for now
>
> Signed-off-by: Joshua Otto 

This patch should be split into two.  One modifying libxc to use struct
precopy_stats, and a second to wire up the RPC call.

> ---
>  tools/libxc/include/xenguest.h |  23 -
>  tools/libxc/xc_nomigrate.c |   3 +-
>  tools/libxc/xc_sr_common.h |   7 +-
>  tools/libxc/xc_sr_save.c   | 194 ++---
>  tools/libxl/libxl_dom_save.c   |  20 
>  tools/libxl/libxl_save_callout.c   |   3 +-
>  tools/libxl/libxl_save_helper.c|   7 +-
>  tools/libxl/libxl_save_msgs_gen.pl |   4 +-
>  8 files changed, 189 insertions(+), 72 deletions(-)
>
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index aa8cc8b..30ffb6f 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -39,6 +39,14 @@
>   */
>  struct xenevtchn_handle;
>  
> +/* For save's precopy_policy(). */
> +struct precopy_stats
> +{
> +unsigned iteration;
> +unsigned total_written;
> +long dirty_count; /* -1 if unknown */

total_written and dirty_count are liable to be equal, so having them as
different widths of integer clearly can't be correct.

> +};
> +
>  /* callbacks provided by xc_domain_save */
>  struct save_callbacks {
>  /* Called after expiration of checkpoint interval,
> @@ -46,6 +54,17 @@ struct save_callbacks {
>   */
>  int (*suspend)(void* data);
>  
> +/* Called after every batch of page data sent during the precopy phase of a
> + * live migration to ask the caller what to do next based on the current
> + * state of the precopy migration.
> + */
> +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely and
> +                                        * tidy up. */
> +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
> +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
> +                                        * remaining dirty pages. */
> +int (*precopy_policy)(struct precopy_stats stats, void *data);

Structures shouldn't be passed by value like this, as the compiler has
to do a lot of memcpy() work to make it happen.  You should pass by
const pointer, as (as far as I can tell), they are strictly read-only to
the implementation of this hook?
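
A sketch of what the two suggestions so far might look like together: counters of matching width, and a hook taking the stats by const pointer. All names ending in _sketch or starting with SKETCH_, the simple_policy_limits struct and its fields are hypothetical and only loosely modelled on the patch; the policy itself is just the "few dirty pages or too many iterations" rule from the commit message, with made-up limits supplied by the caller.

```c
#include <stddef.h>

#define SKETCH_POLICY_ABORT            (-1)
#define SKETCH_POLICY_CONTINUE_PRECOPY 0
#define SKETCH_POLICY_STOP_AND_COPY    1

struct precopy_stats_sketch {
    unsigned int iteration;
    unsigned long total_written;   /* same width as dirty_count */
    long dirty_count;              /* -1 if unknown */
};

/* The hook only reads the stats, so it takes them by const pointer. */
typedef int (*precopy_policy_fn)(const struct precopy_stats_sketch *stats,
                                 void *data);

/* Hypothetical caller-supplied limits, passed through the data pointer. */
struct simple_policy_limits {
    unsigned int max_iterations;
    long dirty_threshold;
};

static int simple_precopy_policy(const struct precopy_stats_sketch *stats,
                                 void *data)
{
    const struct simple_policy_limits *lim = data;

    if (lim == NULL)
        return SKETCH_POLICY_ABORT;

    /* Few enough dirty pages left: suspend and send the remainder. */
    if (stats->dirty_count >= 0 && stats->dirty_count < lim->dirty_threshold)
        return SKETCH_POLICY_STOP_AND_COPY;

    /* Iteration budget used up: stop iterating regardless. */
    if (stats->iteration >= lim->max_iterations)
        return SKETCH_POLICY_STOP_AND_COPY;

    return SKETCH_POLICY_CONTINUE_PRECOPY;
}
```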

> +
>  /* Called after the guest's dirty pages have been
>   *  copied into an output buffer.
>   * Callback function resumes the guest & the device model,
> @@ -100,8 +119,8 @@ typedef enum {
>   *doesn't use checkpointing
>   * @return 0 on success, -1 on failure
>   */
> -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
> -   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
> +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> +   uint32_t flags /* XCFLAGS_xxx */,
> struct save_callbacks* callbacks, int hvm,
> xc_migration_stream_t stream_type, int recv_fd);

It would be cleaner for existing callers, and to extend in the future,
to encapsulate all of these parameters in a struct domain_save_params
and pass it by pointer to here.

That way, we'd avoid the situation we currently have where some
information is passed in bitfields in a single parameter, whereas other
booleans are passed as integers.

The hvm parameter specifically is useless, and can be removed by

Re: [Xen-devel] [PATCH v5 1/4] ring.h: introduce macros to handle monodirectional rings with multiple req sizes

2017-03-29 Thread Stefano Stabellini
On Wed, 29 Mar 2017, Jan Beulich wrote:
> >>> On 29.03.17 at 00:08,  wrote:
> > --- a/xen/include/public/io/ring.h
> > +++ b/xen/include/public/io/ring.h
> > @@ -27,6 +27,21 @@
> >  #ifndef __XEN_PUBLIC_IO_RING_H__
> >  #define __XEN_PUBLIC_IO_RING_H__
> >  
> > +/*
> > + * When #include'ing this header, you need to provide the following
> > + * declaration upfront:
> > + * - standard integers types (uint8_t, uint16_t, etc)
> > + * They are provided by stdint.h of the standard headers.
> > + *
> > + * In addition, if you intend to use the FLEX macros, you also need to
> > + * provide:
> > + * - size_t
> > + * - memcpy
> > + * - grant_ref_t
> > + * These declarations are provided by string.h of the standard headers,
> > + * and grant_table.h from the Xen public headers.
> > + */
> 
> Btw, there's another difference only implied so far: The integer
> types indeed need to be made available up front (as spelled out).
> The others can as well be made available after ring.h inclusion,
> as long as that happens before invoking any of the macros.

OK



Re: [Xen-devel] [GSoc] Adding Floating Point support to Mini-OS

2017-03-29 Thread Samuel Thibault
Hello,

Felix Schmoll, on mer. 29 mars 2017 20:53:14 +0200, wrote:
> -While implementing our own kernel last semester, my team-mate and I
> came to believe that pusha/popa were faster than pushing/popping the
> individual registers, since it is just a single command. The Mini-OS
> kernel however does the latter. Is that a conscious performance trade-off
> for something, or were we just under a misconception, in that it
> compiles to the same thing in the end?

That's very old code :) I guess if there had been any performance trade,
since that was more than a decade ago, the trade is moot with nowadays'
processor architectures being very different :)

I guess one argument could be to allow flexibility for other register
order, or something like that. Probably not something strong anyway, so
I'd say don't believe there was much thinking there :)

> -Lazy floating point register saving is similar to Copy-on-write, is that 
> correct?

Mmm, not so exactly, depends how you understand "similar" and
"copy-on-write" :)

It's true that it makes sense to compare with copy-on-write, in the
sense that as long as the next thread does not touch FP stuff, one
can keep the previous thread FP state as it is, leading to noticeable
optimization of loading/storing those 512bit-ish registers :)

I however don't think you want to try to implement the "copy"
optimization part of memory copy-on-write, in the sense that when you
create a new thread, better just copy the FP state from the creator, and
not try to share the state like copy-on-write does with memory between
processes.  The context switch cost is most of what you'll want to
optimize.
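
To illustrate the lazy scheme in plain C, here is a toy simulation. It is only a sketch: all names (fp_state, thread_sketch, hw_fpu, fp_trap and so on) are hypothetical, the "hardware" is an ordinary variable, and the fpu_enabled flag stands in for whatever mechanism makes the first FP instruction after a switch trap. It is not Mini-OS code.

```c
#include <stdbool.h>
#include <stddef.h>

struct fp_state { double regs[8]; };        /* toy FP register file */

struct thread_sketch {
    struct fp_state saved_fp;               /* state while not on the FPU */
};

static struct fp_state hw_fpu;              /* simulated hardware registers */
static struct thread_sketch *fpu_owner;     /* whose state is live in hw_fpu */
static struct thread_sketch *current_thread;
static bool fpu_enabled;                    /* false models "FP use traps" */
static unsigned int fp_saves;               /* number of state saves done */

/* Context switch: do NOT save or restore FP state here. */
static void context_switch(struct thread_sketch *next)
{
    current_thread = next;
    /* FP stays usable only if the live state already belongs to next. */
    fpu_enabled = (fpu_owner == next);
}

/* Trap handler: runs on the first FP use after a switch. */
static void fp_trap(void)
{
    if (fpu_owner != NULL && fpu_owner != current_thread) {
        fpu_owner->saved_fp = hw_fpu;       /* save the old owner lazily */
        fp_saves++;
    }
    hw_fpu = current_thread->saved_fp;      /* load our own state */
    fpu_owner = current_thread;
    fpu_enabled = true;
}

static void use_fp(double v)
{
    if (!fpu_enabled)
        fp_trap();
    hw_fpu.regs[0] = v;
}

static double read_fp(void)
{
    if (!fpu_enabled)
        fp_trap();
    return hw_fpu.regs[0];
}
```

The saving shows up when a thread that never touches FP runs in between: switching to it and back costs no FP save or restore at all.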

> -There is nothing preventing me from using some floating-point library
> for the user-space test program, right?

Sure.  You just run the risk of pulling stuff which would require more
OS support from mini-os than currently implemented :)

Samuel



[Xen-devel] [GSoc] Adding Floating Point support to Mini-OS

2017-03-29 Thread Felix Schmoll
Hi,

while looking at this some more I came to the following
questions/assumptions, so I'd be grateful if you could shortly address them:

-While implementing our own kernel last semester, my team-mate and I came
to believe that pusha/popa were faster than pushing/popping the individual
registers, since it is just a single command. The Mini-OS kernel however
does the latter. Is that a conscious performance trade-off for something, or
were we just under a misconception, in that it compiles to the same thing
in the end?

-Lazy floating point register saving is similar to Copy-on-write, is that
correct?

-There is nothing preventing me from using some floating-point library for
the user-space test program, right?

I'd also appreciate it if you could have a quick glance at my updated
proposal (on the GSoC portal) and give me some more feedback on it.

Thanks
Felix


Re: [Xen-devel] [PATCH RFC 07/20] migration: defer precopy policy to libxl

2017-03-29 Thread Jennifer Herbert

Hi,

I would like to encourage this patch - as I have use for it outside
of your postcopy work.

Some things people will comment on:
You've used 'unsigned' without the int keyword, which people don't like.
Also, on line 324 you're missing a space between 'if (' and
'ctx->save.policy_decision'.


Also, I'm not a fan of your CONSULT_POLICY macro, which you've defined at
an odd point in your function, and which I think could be done more elegantly.
Worst of all ... it's a macro, which I think should generally be avoided
unless there is little choice.  I'm sure you could write a helper function to
replace this.
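
Something like the following might do. This is only a sketch, since I'm
guessing at what the macro needs: consult_policy(), precopy_policy_fn,
struct simple_policy and simple_precopy_policy() are names I made up, and
the thresholds are arbitrary. Only struct precopy_stats and the XGS_POLICY_*
values come from your patch.

```c
/* Mirrors the definitions the patch adds to xenguest.h. */
struct precopy_stats {
    unsigned int iteration;
    unsigned int total_written;
    long dirty_count; /* -1 if unknown */
};

#define XGS_POLICY_ABORT            (-1)
#define XGS_POLICY_CONTINUE_PRECOPY   0
#define XGS_POLICY_STOP_AND_COPY      1

typedef int (*precopy_policy_fn)(struct precopy_stats stats, void *data);

/*
 * Hypothetical helper replacing the macro: asks the policy what to do and
 * stores the answer in *decision.  Returns nonzero when the precopy loop
 * should end (abort or stop-and-copy), zero when it should carry on.
 */
static int consult_policy(precopy_policy_fn policy, void *data,
                          const struct precopy_stats *stats, int *decision)
{
    *decision = policy(*stats, data);

    return *decision != XGS_POLICY_CONTINUE_PRECOPY;
}

/* Illustrative policy approximating the old fixed limits. */
struct simple_policy {
    unsigned int max_iters; /* stop after this many iterations */
    long min_dirty;         /* stop once fewer pages than this are dirty */
};

static int simple_precopy_policy(struct precopy_stats stats, void *data)
{
    const struct simple_policy *p = data;

    if ( stats.dirty_count >= 0 && stats.dirty_count < p->min_dirty )
        return XGS_POLICY_STOP_AND_COPY;

    if ( stats.iteration >= p->max_iters )
        return XGS_POLICY_STOP_AND_COPY;

    return XGS_POLICY_CONTINUE_PRECOPY;
}
```

The caller can then write "if ( consult_policy(...) ) break;" at each
point where the macro is currently expanded, and the control flow stays
visible in the function itself.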


Cheers,

-jenny

On 27/03/17 10:06, Joshua Otto wrote:

The precopy phase of the xc_domain_save() live migration algorithm has
historically been implemented to run until either a) (almost) no pages
are dirty or b) some fixed, hard-coded maximum number of precopy
iterations has been exceeded.  This policy and its implementation are
less than ideal for a few reasons:
- the logic of the policy is intertwined with the control flow of the
   mechanism of the precopy stage
- it can't take into account facts external to the immediate
   migration context, such as interactive user input or the passage of
   wall-clock time
- it does not permit the user to change their mind, over time, about
   what to do at the end of the precopy (they get an unconditional
   transition into the stop-and-copy phase of the migration)

To permit users to implement arbitrary higher-level policies governing
when the live migration precopy phase should end, and what should be
done next:
- add a precopy_policy() callback to the xc_domain_save() user-supplied
   callbacks
- during the precopy phase of live migrations, consult this policy after
   each batch of pages transmitted and take the dictated action, which
   may be to a) abort the migration entirely, b) continue with the
   precopy, or c) proceed to the stop-and-copy phase.
- provide an implementation of the old policy as such a callback in
   libxl and plumb it through the IPC machinery to libxc, effectively
   maintaining the old policy for now

Signed-off-by: Joshua Otto 
---
  tools/libxc/include/xenguest.h |  23 -
  tools/libxc/xc_nomigrate.c |   3 +-
  tools/libxc/xc_sr_common.h |   7 +-
  tools/libxc/xc_sr_save.c   | 194 ++---
  tools/libxl/libxl_dom_save.c   |  20 
  tools/libxl/libxl_save_callout.c   |   3 +-
  tools/libxl/libxl_save_helper.c|   7 +-
  tools/libxl/libxl_save_msgs_gen.pl |   4 +-
  8 files changed, 189 insertions(+), 72 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index aa8cc8b..30ffb6f 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -39,6 +39,14 @@
   */
  struct xenevtchn_handle;
  
+/* For save's precopy_policy(). */
+struct precopy_stats
+{
+unsigned iteration;
+unsigned total_written;
+long dirty_count; /* -1 if unknown */
+};
+
  /* callbacks provided by xc_domain_save */
  struct save_callbacks {
  /* Called after expiration of checkpoint interval,
@@ -46,6 +54,17 @@ struct save_callbacks {
   */
  int (*suspend)(void* data);
  
+/* Called after every batch of page data sent during the precopy phase of a
+ * live migration to ask the caller what to do next based on the current
+ * state of the precopy migration.
+ */
+#define XGS_POLICY_ABORT  (-1) /* Abandon the migration entirely and
+* tidy up. */
+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
+#define XGS_POLICY_STOP_AND_COPY1  /* Immediately suspend and transmit the
+* remaining dirty pages. */
+int (*precopy_policy)(struct precopy_stats stats, void *data);
+
  /* Called after the guest's dirty pages have been
   *  copied into an output buffer.
   * Callback function resumes the guest & the device model,
@@ -100,8 +119,8 @@ typedef enum {
   *doesn't use checkpointing
   * @return 0 on success, -1 on failure
   */
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
-   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
+   uint32_t flags /* XCFLAGS_xxx */,
 struct save_callbacks* callbacks, int hvm,
 xc_migration_stream_t stream_type, int recv_fd);
  
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index 15c838f..2af64e4 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -20,8 +20,7 @@
  #include 
  #include 
  
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
-   uint32_t max_factor, uint32_t flags,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, 

Re: [Xen-devel] [PATCH v4 1/8] xen: import ring.h from xen

2017-03-29 Thread Stefano Stabellini
On Wed, 29 Mar 2017, Paolo Bonzini wrote:
> On 29/03/2017 01:54, Stefano Stabellini wrote:
> >>> I understand your point of view, and honestly it wouldn't be a problem
> >>> doing it the way you suggested either. However, I think that going
> >>> forward it will be less of a maintenance pain to keep ring.h in sync,
> >>> compared to maintaining a versioned build dependency between Xen and
> >>> QEMU for the compilation of one PV backend. We do have version checks
> >>> in QEMU for Xen compatibility, but not for PV backends or the xenpv
> >>> machine yet.
> >> For the pvUSB backend I just used a mandatory macro from the header for
> >> the #ifdef. The backend will signal support when it was defined during
> >> build and will refuse initialization otherwise. Xen tools are able to
> >> recognize qemu support of the backend by looking into Xenstore.
> > 
> > What do you think of the following:
> 
> Let's just copy the header...

It's settled.

Thanks,

Stefano



Re: [Xen-devel] [PATCH] Implement new hypercall to return domain id

2017-03-29 Thread Wei Liu
For all other people who happen to see this patch: this isn't meant to
be applied.

Cc Juergen as well.

On Wed, Mar 29, 2017 at 08:07:39PM +0200, Felix Schmoll wrote:
> Minimal implementation of a new hypercall that returns the domain
> id of the invoking domain with adjustments in libxc.
> 

I think this patch looks good.

If you feel like it, a more advanced challenge would be to pass a
userspace buffer down to Xen and have the hypervisor write something
back. Some new APIs are needed but there are plenty of examples in libxc.

You don't have to do it, and you don't need to post a patch. It's just
something you might be interested in trying in case you have time.

Wei.



[Xen-devel] [xen-unstable-smoke test] 106982: tolerable trouble: broken/fail/pass - PUSHED

2017-03-29 Thread osstest service owner
flight 106982 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106982/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm       1 build-check(1)            blocked  n/a
 test-amd64-amd64-libvirt     12 migrate-support-check     fail   never pass
 build-arm64                   5 xen-build                 fail   never pass
 build-arm64-pvops             5 kernel-build              fail   never pass
 test-armhf-armhf-xl          12 migrate-support-check     fail   never pass
 test-armhf-armhf-xl          13 saverestore-support-check fail   never pass

version targeted for testing:
 xen  b988e88cc041f630dcfa735dcf9c895310103629
baseline version:
 xen  68a08e12c44435eb86600072b9e725e2387ce163

Last test of basis   106974  2017-03-28 15:01:49 Z    1 days
Testing same since   106982  2017-03-29 17:01:14 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 

jobs:
 build-amd64                                                  pass
 build-arm64                                                  fail
 build-armhf                                                  pass
 build-amd64-libvirt                                          pass
 build-arm64-pvops                                            fail
 test-armhf-armhf-xl                                          pass
 test-arm64-arm64-xl-xsm                                      broken
 test-amd64-amd64-xl-qemuu-debianhvm-i386                     pass
 test-amd64-amd64-libvirt                                     pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=xen-unstable-smoke
+ revision=b988e88cc041f630dcfa735dcf9c895310103629
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
++++ getconfig Repos
++++ perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke b988e88cc041f630dcfa735dcf9c895310103629
+ branch=xen-unstable-smoke
+ revision=b988e88cc041f630dcfa735dcf9c895310103629
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
++++ getconfig Repos
++++ perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.8-testing
+ '[' xb988e88cc041f630dcfa735dcf9c895310103629 = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : 

Re: [Xen-devel] [PATCH] Enable compiling with gcc tracing

2017-03-29 Thread Wei Liu
On Wed, Mar 29, 2017 at 08:16:02PM +0200, Felix Schmoll wrote:
> Make minimal adjustments in order to enable the compilation of the
> xen source-code with gcc-6's -fsanitize-coverage=trace-pc option.
> 
> Due to a bug in Xen's build-system the flag for the compiler has
> to be handed in via the command line, i.e. for compiling one would
> use:
> 
>   make CC=
> 
> This is an experimental patch as in a final version you would not
> want all files to be compiled with this option by default.
> 
> Signed-off-by: Felix Schmoll 

Have you tried booting Xen with this patch applied and trace-pc enabled?

> ---
>  xen/Rules.mk| 1 +
>  xen/common/kernel.c | 2 ++
>  xen/include/xen/hypercall.h | 2 ++
>  3 files changed, 5 insertions(+)
> 
> diff --git a/xen/Rules.mk b/xen/Rules.mk
> index 77bcd44922..254cc4381e 100644
> --- a/xen/Rules.mk
> +++ b/xen/Rules.mk
> @@ -46,6 +46,7 @@ else
>  CFLAGS += -O2 -fomit-frame-pointer
>  endif
>  
> +CFLAGS += -fsanitize-coverage=trace-pc
>  CFLAGS += -nostdinc -fno-builtin -fno-common
>  CFLAGS += -Werror -Wredundant-decls -Wno-pointer-arith
>  CFLAGS += -pipe -g -D__XEN__ -include $(BASEDIR)/include/xen/config.h
> diff --git a/xen/common/kernel.c b/xen/common/kernel.c
> index 84618715dc..77b22effb3 100644
> --- a/xen/common/kernel.c
> +++ b/xen/common/kernel.c
> @@ -238,6 +238,8 @@ void __init do_initcalls(void)
>  
>  # define DO(fn) long do_##fn
>  
> +void __sanitizer_cov_trace_pc(void) { return; }
> +

IIRC this is going to recurse until stack overflows, right?

What I actually want you to do is to add a new file and hook it up in
the build system.

And maybe if you feel like it, start looking at actually putting
something inside the trace_pc function.

Hint, you can get hold of PC with  __builtin_return_address(0).
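
To give an idea of the shape I have in mind, here is a rough, untested
sketch of such a separate file. The buffer, its size and the names
trace_buf and trace_count are made up for illustration; only the hook name
and __builtin_return_address(0) are fixed by the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical storage: names and size are illustrative, not Xen code. */
#define TRACE_BUF_ENTRIES 1024

static uintptr_t trace_buf[TRACE_BUF_ENTRIES];
static size_t trace_count;

/*
 * The compiler inserts calls to this hook into code built with
 * -fsanitize-coverage=trace-pc.  This file itself should be built
 * without that flag, so that the hook does not end up calling itself.
 */
void __sanitizer_cov_trace_pc(void)
{
    /* Address the hook will return to, i.e. a PC inside the traced code. */
    uintptr_t pc = (uintptr_t)__builtin_return_address(0);

    /* Record the PC; silently drop entries once the buffer is full. */
    if ( trace_count < TRACE_BUF_ENTRIES )
        trace_buf[trace_count++] = pc;
}
```

How the recorded PCs get exported to a guest is a separate question and
is left out here.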

>  #endif
>  
>  /*
> diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
> index cc99aea57d..12517b5e90 100644
> --- a/xen/include/xen/hypercall.h
> +++ b/xen/include/xen/hypercall.h
> @@ -19,6 +19,8 @@
>  #include 
>  #include 
>  
> +extern void __sanitizer_cov_trace_pc(void);
> +
>  extern long
>  do_sched_op(
>  int cmd,
> -- 
> 2.11.0
> 



Re: [Xen-devel] [GSoC] GSoC Introduction : Fuzzing Xen hypercall interface

2017-03-29 Thread Felix Schmoll
2017-03-29 17:54 GMT+02:00 Wei Liu :

> On Wed, Mar 29, 2017 at 04:24:15PM +0200, Felix Schmoll wrote:
> > Hi,
> >
> > here the final patch for the domain_id:
>
> Please have a look at
>
> https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches
>
> And follow the instructions to submit patches.
>

Hi,

was that right this time?

It didn't seem to make sense to add this to the patch, so I'm appending the
program I used for testing the hypercall patch here just for completeness:

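#include <errno.h>
#include <stdio.h>
#include <xenctrl.h>

int main(void) {
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);

    /* Bail out early if the interface could not be opened. */
    if (!xch) {
        perror("xc_interface_open");
        return 1;
    }

    int ver = xc_version(xch, XENVER_version, NULL);

    /* Major version is in the upper 16 bits, minor in the lower 16. */
    printf("Xen Version %d.%d\n", ver >> 16, ver & 0xffff);

    int domid = xc_domid(xch);

    printf("Xen domain: %d\n", domid);

    if(domid == -1) {
        printf("errno=%d (%s)\n", errno, xc_strerror(xch, errno));
    }

    xc_interface_close(xch);

    return 0;
}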

Felix


[Xen-devel] [PATCH] Enable compiling with gcc tracing

2017-03-29 Thread Felix Schmoll
Make minimal adjustments in order to enable the compilation of the
xen source-code with gcc-6's -fsanitize-coverage=trace-pc option.

Due to a bug in Xen's build-system the flag for the compiler has
to be handed in via the command line, i.e. for compiling one would
use:

make CC=

This is an experimental patch as in a final version you would not
want all files to be compiled with this option by default.

Signed-off-by: Felix Schmoll 
---
 xen/Rules.mk| 1 +
 xen/common/kernel.c | 2 ++
 xen/include/xen/hypercall.h | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/xen/Rules.mk b/xen/Rules.mk
index 77bcd44922..254cc4381e 100644
--- a/xen/Rules.mk
+++ b/xen/Rules.mk
@@ -46,6 +46,7 @@ else
 CFLAGS += -O2 -fomit-frame-pointer
 endif
 
+CFLAGS += -fsanitize-coverage=trace-pc
 CFLAGS += -nostdinc -fno-builtin -fno-common
 CFLAGS += -Werror -Wredundant-decls -Wno-pointer-arith
 CFLAGS += -pipe -g -D__XEN__ -include $(BASEDIR)/include/xen/config.h
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 84618715dc..77b22effb3 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -238,6 +238,8 @@ void __init do_initcalls(void)
 
 # define DO(fn) long do_##fn
 
+void __sanitizer_cov_trace_pc(void) { return; }
+
 #endif
 
 /*
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index cc99aea57d..12517b5e90 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -19,6 +19,8 @@
 #include 
 #include 
 
+extern void __sanitizer_cov_trace_pc(void);
+
 extern long
 do_sched_op(
 int cmd,
-- 
2.11.0




[Xen-devel] [PATCH] Implement new hypercall to return domain id

2017-03-29 Thread Felix Schmoll
Minimal implementation of a new hypercall that returns the domain
id of the invoking domain with adjustments in libxc.

Signed-off-by: Felix Schmoll 
---
 tools/libxc/include/xenctrl.h | 1 +
 tools/libxc/xc_private.c  | 6 ++
 xen/arch/arm/traps.c  | 1 +
 xen/arch/x86/hvm/hypercall.c  | 1 +
 xen/arch/x86/hypercall.c  | 1 +
 xen/arch/x86/pv/hypercall.c   | 1 +
 xen/common/kernel.c   | 6 ++
 xen/include/public/xen.h  | 1 +
 xen/include/xen/hypercall.h   | 3 +++
 9 files changed, 21 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 2d97d36c38..1e152c8a07 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1569,6 +1569,7 @@ int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
 int xc_version(xc_interface *xch, int cmd, void *arg);
+int xc_domid(xc_interface *xch);
 
 int xc_flask_op(xc_interface *xch, xen_flask_op_t *op);
 
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index 72e6242417..37b11e41a9 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -530,6 +530,12 @@ int xc_version(xc_interface *xch, int cmd, void *arg)
 return rc;
 }
 
+int xc_domid(xc_interface *xch)
+{
+return xencall0(xch->xcall, __HYPERVISOR_domain_id);
+}
+
+
 unsigned long xc_make_page_below_4G(
 xc_interface *xch, uint32_t domid, unsigned long mfn)
 {
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 614501f761..eddb264f2d 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1297,6 +1297,7 @@ static arm_hypercall_t arm_hypercall_table[] = {
 HYPERCALL(platform_op, 1),
 HYPERCALL_ARM(vcpu_op, 3),
 HYPERCALL(vm_assist, 2),
+HYPERCALL(domain_id, 0),
 };
 
 #ifndef NDEBUG
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index e7238ce293..3d541e01e1 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -132,6 +132,7 @@ static const hypercall_table_t hvm_hypercall_table[] = {
 COMPAT_CALL(mmuext_op),
 HYPERCALL(xenpmu_op),
 COMPAT_CALL(dm_op),
+HYPERCALL(domain_id),
 HYPERCALL(arch_1)
 };
 
diff --git a/xen/arch/x86/hypercall.c b/xen/arch/x86/hypercall.c
index e30181817a..184741bf16 100644
--- a/xen/arch/x86/hypercall.c
+++ b/xen/arch/x86/hypercall.c
@@ -67,6 +67,7 @@ const hypercall_args_t hypercall_args_table[NR_hypercalls] =
 ARGS(tmem_op, 1),
 ARGS(xenpmu_op, 2),
 ARGS(dm_op, 3),
+ARGS(domain_id, 0),
 ARGS(mca, 1),
 ARGS(arch_1, 1),
 };
diff --git a/xen/arch/x86/pv/hypercall.c b/xen/arch/x86/pv/hypercall.c
index 9d29d2f088..f12314b5ca 100644
--- a/xen/arch/x86/pv/hypercall.c
+++ b/xen/arch/x86/pv/hypercall.c
@@ -79,6 +79,7 @@ static const hypercall_table_t pv_hypercall_table[] = {
 #endif
 HYPERCALL(xenpmu_op),
 COMPAT_CALL(dm_op),
+HYPERCALL(domain_id),
 HYPERCALL(mca),
 HYPERCALL(arch_1),
 };
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 84618715dc..5107aacd06 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -431,6 +431,12 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return -ENOSYS;
 }
 
+DO(domain_id)(void)
+{
+struct domain *d = current->domain;
+return d->domain_id;
+}
+
 DO(nmi_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 struct xennmi_callback cb;
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 91ba8bb48e..3a8c4af281 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -121,6 +121,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_xc_reserved_op   39 /* reserved for XenClient */
 #define __HYPERVISOR_xenpmu_op40
 #define __HYPERVISOR_dm_op41
+#define __HYPERVISOR_domain_id42
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0   48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index cc99aea57d..5c7bc6233e 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -83,6 +83,9 @@ do_xen_version(
 XEN_GUEST_HANDLE_PARAM(void) arg);
 
 extern long
+do_domain_id(void);
+
+extern long
 do_console_io(
 int cmd,
 int count,
-- 
2.11.0




Re: [Xen-devel] [PATCH RFC 10/20] libxc/xc_sr_save.c: initialise rec.data before free()

2017-03-29 Thread Wei Liu
On Tue, Mar 28, 2017 at 08:59:09PM +0100, Andrew Cooper wrote:
> On 27/03/17 10:06, Joshua Otto wrote:
> > colo_merge_secondary_dirty_bitmap() unconditionally free()s the .data
> > member of its local xc_sr_record structure rec on its exit path.
> > However, if the initial call to read_record() fails then this member is
> > uninitialised.  Initialise it.
> >
> > Signed-off-by: Joshua Otto 
> 
> Reviewed-by: Andrew Cooper 
> 
> This bugfix should be taken ASAP, and needs backporting to Xen 4.7 and 4.8

Acked + applied.
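
For anyone following along, the pattern being fixed looks roughly like
this. It is a stand-alone sketch: struct record, read_record_failing() and
merge_bitmap_sketch() are stand-ins invented for illustration, not the
libxc code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for xc_sr_record; only the shape matters here. */
struct record {
    uint32_t type;
    uint32_t length;
    void *data;
};

/* Hypothetical reader that fails before ever writing to *rec. */
static int read_record_failing(struct record *rec)
{
    (void)rec;
    return -1;
}

static int merge_bitmap_sketch(void)
{
    struct record rec = { 0, 0, NULL }; /* data starts out as NULL */
    int rc = read_record_failing(&rec);

    if ( rc )
        goto err;

    /* ... rec.data would be processed here ... */
    rc = 0;

 err:
    free(rec.data); /* free(NULL) is a no-op, so the early exit is safe */
    return rc;
}
```

Without the initialiser, the free() on the error path would be handed
whatever happened to be on the stack.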

> 
> > ---
> >  tools/libxc/xc_sr_save.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> > index ac97d93..6acc8d3 100644
> > --- a/tools/libxc/xc_sr_save.c
> > +++ b/tools/libxc/xc_sr_save.c
> > @@ -681,7 +681,7 @@ static int send_memory_live(struct xc_sr_context *ctx)
> >  static int colo_merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
> >  {
> >  xc_interface *xch = ctx->xch;
> > -struct xc_sr_record rec;
> > +struct xc_sr_record rec = { 0, 0, NULL };
> >  uint64_t *pfns = NULL;
> >  uint64_t pfn;
> >  unsigned count, i;
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [libvirt bisection] complete build-i386-libvirt

2017-03-29 Thread osstest service owner
branch xen-unstable
xenbranch xen-unstable
job build-i386-libvirt
testid libvirt-build

Tree: libvirt git://libvirt.org/libvirt.git
Tree: libvirt_gnulib git://git.sv.gnu.org/gnulib.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  libvirt git://libvirt.org/libvirt.git
  Bug introduced:  6760cc4bfddb84a8ab33e8b008c0440013c17499
  Bug not present: 2902771fa0de27a8963f949db2240bce52b6999b
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/106983/


  commit 6760cc4bfddb84a8ab33e8b008c0440013c17499
  Author: John Ferlan 
  Date:   Fri Mar 24 12:25:27 2017 -0400
  
  logical: Need to overwrite/clear more than just first 512 bytes
  
  https://bugzilla.redhat.com/show_bug.cgi?id=1430679
  
  As it turns out some file headers (e.g. ext4) may be larger/longer than
  the 512 bytes of zeros being written prior to a pvcreate, so let's write
  out 2048 bytes similar to how the pvcreate sources would peek at the first
  4 sectors of the device.
  
  Make sure there is at enough bytes on the device to clear before doing
  doing the clear - just to be sure.


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/libvirt/build-i386-libvirt.libvirt-build.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/libvirt/build-i386-libvirt.libvirt-build 
--summary-out=tmp/106983.bisection-summary --basis-template=106829 
--blessings=real,real-bisect libvirt build-i386-libvirt libvirt-build
Searching for failure / basis pass:
 106952 fail [host=baroque1] / 106931 [host=huxelrebe1] 106924 
[host=huxelrebe1] 106906 [host=chardonnay0] 106883 [host=elbling1] 106855 
[host=elbling1] 106829 [host=huxelrebe1] 106800 [host=chardonnay1] 106755 
[host=chardonnay1] 106707 [host=huxelrebe1] 106678 [host=chardonnay1] 106650 
[host=huxelrebe1] 106628 [host=elbling1] 106608 [host=huxelrebe1] 106594 
[host=huxelrebe1] 106583 [host=italia0] 106562 [host=chardonnay0] 106543 
[host=huxelrebe1] 106510 [host=huxelrebe1] 106483 [host=rimava0] 106473 
[host=elbling1] 106434 [host=rimava0] 106394 [host=elbling1] 106352 
[host=italia0] 106226 [host=huxelrebe0] 106101 [host=huxelrebe1] 105973 
[host=huxelrebe1] 105938 [host=baroque0] 105921 [host=italia0] 105902 
[host=nocera1] 105895 [host=fiano0] 105870 [host=nobling0] 105805 
[host=italia0] 105785 [host=baroque0] 105759 [host=huxelrebe1] 105720 ok.
Failure / basis pass flights: 106952 / 105720
(tree with no url: minios)
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: libvirt git://libvirt.org/libvirt.git
Tree: libvirt_gnulib git://git.sv.gnu.org/gnulib.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest ecc3a63bf2e998d4bf9e63d5260a8adcd8288150 
94386a13667c645fd42544a7fd302c39fcdf 
8051789e982499050680a26febeada7467e18a8d 
e97832ec6b2a7ddd48b8e6d1d848ffdfee6a31c7 
5b08f85689a8479f947be74563fe993875b9caa9
Basis pass 5620c609596cdfb3b809afab1b37b7f00fe831a1 
94386a13667c645fd42544a7fd302c39fcdf 
b669e922b37b8957248798a5eb7aa96a666cd3fe 
728e90b41d46c1c1c210ac496204efd51936db75 
63e1d01b8fd948b3e0fa3beea494e407668aa43b
Generating revisions with ./adhoc-revtuple-generator  
git://libvirt.org/libvirt.git#5620c609596cdfb3b809afab1b37b7f00fe831a1-ecc3a63bf2e998d4bf9e63d5260a8adcd8288150
 
git://git.sv.gnu.org/gnulib.git#94386a13667c645fd42544a7fd302c39fcdf-94386a13667c645fd42544a7fd302c39fcdf
 
git://xenbits.xen.org/qemu-xen-traditional.git#b669e922b37b8957248798a5eb7aa96a666cd3fe-8051789e982499050680a26febeada7467e18a8d
 
git://xenbits.xen.org/qemu-xen.git#728e90b41d46c1c1c210ac496204efd51936db75-e97832ec6b2a7ddd48b8e6d1d848ffdfee6a31c7
 
git://xenbits.xen.org/xen.git#63e1d01b8fd948b3e0fa3beea494e407668aa43b-5b08f85689a8479f947be74563fe993875b9caa9
Loaded 10252 nodes in revision graph
Searching for test results:
 105720 pass 5620c609596cdfb3b809afab1b37b7f00fe831a1 
94386a13667c645fd42544a7fd302c39fcdf 
b669e922b37b8957248798a5eb7aa96a666cd3fe 
728e90b41d46c1c1c210ac496204efd51936db75 
63e1d01b8fd948b3e0fa3beea494e407668aa43b
 105759 [host=huxelrebe1]
 105805 [host=italia0]
 105785 [host=baroque0]
 105895 [host=fiano0]
 105870 [host=nobling0]
 105902 [host=nocera1]
 105921 [host=italia0]
 105938 [host=baroque0]
 105973 [host=huxelrebe1]
 106101 [host=huxelrebe1]
 106226 [host=huxelrebe0]
 106394 [host=elbling1]
 106352 [host=italia0]
 106483 [host=rimava0]
 106434 [host=rimava0]
 106473 [host=elbling1]
 106510 [host=huxelrebe1]
 106608 [host=huxelrebe1]
 106543 [host=huxelrebe1]
 106594 [host=huxelrebe1]
 106562 [host=chardonnay0]
 106595 

Re: [Xen-devel] [PATCH v1 9/9] mm: Make sure pages are scrubbed

2017-03-29 Thread Andrew Cooper
On 29/03/17 17:45, Wei Liu wrote:
> On Wed, Mar 29, 2017 at 12:35:25PM -0400, Boris Ostrovsky wrote:
>> On 03/29/2017 12:25 PM, Wei Liu wrote:
>>> On Fri, Mar 24, 2017 at 01:05:04PM -0400, Boris Ostrovsky wrote:
>>>> +static void check_one_page(struct page_info *pg)
>>>> +{
>>>> +mfn_t mfn = _mfn(page_to_mfn(pg));
>>>> +uint64_t *ptr;
>>>> +
>>>> +ptr  = map_domain_page(mfn);
>>>> +ASSERT(*ptr != PAGE_POISON);
>>> Should be ASSERT(*ptr == 0) here.
>> We can't assume it will be zero --- see scrub_one_page().
> It's still possible a value is leaked from the guest, and that value has
> rather high probability to be != PAGE_POISON.
>
> In that case there should be an ifdef to match the debug and non-debug
> builds.

The only sensible thing to do is check that the entire page is zeroes. 
This is a debug option after all.

We don't want to waste time poisoning zero pages we hand back to the
free pool.
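
Roughly along these lines (an untested sketch; page_is_zero() and
SKETCH_PAGE_SIZE are made-up names, and the 4096-byte page size is an
assumption for illustration):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed page size, for illustration only */

/* Returns true only if every 64-bit word of the mapped page is zero. */
static bool page_is_zero(const uint64_t *ptr)
{
    size_t i;

    for ( i = 0; i < SKETCH_PAGE_SIZE / sizeof(*ptr); i++ )
        if ( ptr[i] != 0 )
            return false;

    return true;
}
```

check_one_page() would then assert on page_is_zero(ptr) instead of
looking at the first word only.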

~Andrew



[Xen-devel] [PATCH v4] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-29 Thread Jennifer Herbert
From: Jennifer Herbert 

This new lib devicemodel call allows multiple extents of pages to be
marked as modified in a single call.  This is something needed for a
use case I'm working on.

The xen side of the modified_memory call has been modified to accept
an array of extents.  The devicemodel library either provides an array
of length 1, to support the original library function, or, via a second
function, accepts an array supplied by the caller.
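
As a rough illustration of the data layout (not the code in this patch):
each extent is a (first_pfn, nr) pair, and an array of them describes a
discontiguous set of pages. struct extent, extent_is_valid() and
count_extent_pages() below are stand-ins made up for this sketch; the range
check is modelled on the one in the existing modified_memory().

```c
#include <stdint.h>

/* Local stand-in mirroring the two fields of an extent in this patch. */
struct extent {
    uint64_t first_pfn;
    uint32_t nr;
};

/*
 * Nonzero if the extent is non-empty, does not wrap around, and ends at
 * or below max_gpfn.
 */
static int extent_is_valid(const struct extent *e, uint64_t max_gpfn)
{
    uint64_t last_pfn;

    if ( e->nr == 0 )
        return 0;

    last_pfn = e->first_pfn + e->nr - 1;

    return e->first_pfn <= last_pfn && last_pfn <= max_gpfn;
}

/* Total pages covered by the array, or -1 if any extent is invalid. */
static int64_t count_extent_pages(const struct extent *extents,
                                  uint32_t nr_extents, uint64_t max_gpfn)
{
    int64_t total = 0;
    uint32_t i;

    for ( i = 0; i < nr_extents; i++ )
    {
        if ( !extent_is_valid(&extents[i], max_gpfn) )
            return -1;
        total += extents[i].nr;
    }

    return total;
}
```

The single-extent call is then simply the case of an array of length 1.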

Signed-off-by: Jennifer Herbert 
---
Changes as discussed on V3.

Cc: Jan Beulich 
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Andrew Cooper 
Cc: Paul Durrant 
---
 tools/libs/devicemodel/core.c   |   30 --
 tools/libs/devicemodel/include/xendevicemodel.h |   19 +++-
 xen/arch/x86/hvm/dm.c   |  124 ---
 xen/include/public/hvm/dm_op.h  |   19 +++-
 4 files changed, 140 insertions(+), 52 deletions(-)

diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index a85cb49..f9e37a5 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -434,22 +434,36 @@ int xendevicemodel_track_dirty_vram(
  dirty_bitmap, (size_t)(nr + 7) / 8);
 }
 
-int xendevicemodel_modified_memory(
-xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn,
-uint32_t nr)
+int xendevicemodel_modified_memory_bulk(
+xendevicemodel_handle *dmod, domid_t domid,
+struct xen_dm_op_modified_memory_extent *extents, uint32_t nr)
 {
 struct xen_dm_op op;
-struct xen_dm_op_modified_memory *data;
+struct xen_dm_op_modified_memory *header;
+size_t extents_size = nr * sizeof(struct xen_dm_op_modified_memory_extent);
 
 memset(&op, 0, sizeof(op));
 
 op.op = XEN_DMOP_modified_memory;
-data = &op.u.modified_memory;
+header = &op.u.modified_memory;
 
-data->first_pfn = first_pfn;
-data->nr = nr;
+header->nr_extents = nr;
+header->pfns_processed = 0;
 
-return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
+return xendevicemodel_op(dmod, domid, 2, &op, sizeof(op),
+ extents, extents_size);
+}
+
+int xendevicemodel_modified_memory(
+xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn,
+uint32_t nr)
+{
+struct xen_dm_op_modified_memory_extent extent;
+
+extent.first_pfn = first_pfn;
+extent.nr = nr;
+
+return xendevicemodel_modified_memory_bulk(dmod, domid, &extent, 1);
 }
 
 int xendevicemodel_set_mem_type(
diff --git a/tools/libs/devicemodel/include/xendevicemodel.h b/tools/libs/devicemodel/include/xendevicemodel.h
index b3f600e..9c62bf9 100644
--- a/tools/libs/devicemodel/include/xendevicemodel.h
+++ b/tools/libs/devicemodel/include/xendevicemodel.h
@@ -236,8 +236,8 @@ int xendevicemodel_track_dirty_vram(
 uint32_t nr, unsigned long *dirty_bitmap);
 
 /**
- * This function notifies the hypervisor that a set of domain pages
- * have been modified.
+ * This function notifies the hypervisor that a set of contiguous
+ * domain pages have been modified.
  *
  * @parm dmod a handle to an open devicemodel interface.
  * @parm domid the domain id to be serviced
@@ -250,6 +250,21 @@ int xendevicemodel_modified_memory(
 uint32_t nr);
 
 /**
+ * This function notifies the hypervisor that a set of discontiguous
+ * domain pages have been modified.
+ *
+ * @parm dmod a handle to an open devicemodel interface.
+ * @parm domid the domain id to be serviced
+ * @parm extents an array of extent structs, which each hold
+ a start_pfn and nr (number of pfns).
+ * @parm nr the number of extents in the array
+ * @return 0 on success, -1 on failure.
+ */
+int xendevicemodel_modified_memory_bulk(
+xendevicemodel_handle *dmod, domid_t domid,
+struct xen_dm_op_modified_memory_extent extents[], uint32_t nr);
+
+/**
  * This function notifies the hypervisor that a set of domain pages
  * are to be treated in a specific way. (See the definition of
  * hvmmem_type_t).
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 2122c45..c4fa704 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -118,57 +118,108 @@ static int set_isa_irq_level(struct domain *d, uint8_t 
isa_irq,
 return 0;
 }
 
-static int modified_memory(struct domain *d,
-   struct xen_dm_op_modified_memory *data)
+
+int copy_extent_from_guest_array(struct xen_dm_op_modified_memory_extent* 
extent,
+   xen_dm_op_buf_t* buf, unsigned int index)
+
 {
-xen_pfn_t last_pfn = data->first_pfn + data->nr - 1;
-unsigned int iter = 0;
-int rc = 0;
+if ( (buf->size / sizeof(struct xen_dm_op_modified_memory_extent)) <=
+ index )
+return -EINVAL;
 
-if ( (data->first_pfn > last_pfn) ||
- (last_pfn > domain_get_maximum_gpfn(d)) )
+if ( 
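
The bounds check that opens copy_extent_from_guest_array() above can be tried in isolation. What follows is a minimal user-space sketch, not the hypervisor code: demo_extent, demo_buf and demo_copy_extent are hypothetical stand-ins, the struct layout is an assumption, and a plain memcpy() takes the place of the guest-copy primitive.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct xen_dm_op_modified_memory_extent. */
struct demo_extent {
    uint32_t nr;        /* number of pfns in this extent */
    uint32_t pad;
    uint64_t first_pfn; /* first pfn of the extent */
};

/* Hypothetical stand-in for xen_dm_op_buf_t: a pointer plus a byte count. */
struct demo_buf {
    const void *h;
    size_t size;
};

/* Copy extent number `index` out of `buf`, refusing out-of-range indices. */
static int demo_copy_extent(struct demo_extent *extent,
                            const struct demo_buf *buf, unsigned int index)
{
    assert(extent != NULL && buf != NULL);

    /* Only whole extents count, so a trailing partial one is rejected too. */
    if ( (buf->size / sizeof(*extent)) <= index )
        return -EINVAL;

    memcpy(extent, (const char *)buf->h + (size_t)index * sizeof(*extent),
           sizeof(*extent));
    return 0;
}
```

Dividing the buffer size by the element size before comparing means an index is accepted only if the complete extent lies inside the buffer.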

Re: [Xen-devel] [PATCH v1 9/9] mm: Make sure pages are scrubbed

2017-03-29 Thread Wei Liu
On Wed, Mar 29, 2017 at 12:35:25PM -0400, Boris Ostrovsky wrote:
> On 03/29/2017 12:25 PM, Wei Liu wrote:
> > On Fri, Mar 24, 2017 at 01:05:04PM -0400, Boris Ostrovsky wrote:
> >> +static void check_one_page(struct page_info *pg)
> >> +{
> >> +mfn_t mfn = _mfn(page_to_mfn(pg));
> >> +uint64_t *ptr;
> >> +
> >> +ptr  = map_domain_page(mfn);
> >> +ASSERT(*ptr != PAGE_POISON);
> > Should be ASSERT(*ptr == 0) here.
> 
> We can't assume it will be zero --- see scrub_one_page().

It's still possible a value is leaked from the guest, and that value has
rather high probability to be != PAGE_POISON.

In that case there should be an ifdef to match the debug and non-debug
builds.
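
One possible reading of that suggestion, as a user-space sketch only: the check compares against whatever value the scrubber is expected to have written in the current build flavour. Everything named demo_* or DEMO_* is hypothetical, including the 0xc2 fill byte and the DEMO_DEBUG_BUILD switch; the real scrub_one_page() and PAGE_POISON may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE  4096u
#define DEMO_SCRUB_BYTE 0xc2   /* assumed debug-build fill pattern */

/* Scrub a page: pattern fill in debug builds, zero fill otherwise. */
static void demo_scrub_page(uint8_t *page)
{
    assert(page != NULL);
#ifdef DEMO_DEBUG_BUILD
    memset(page, DEMO_SCRUB_BYTE, DEMO_PAGE_SIZE);
#else
    memset(page, 0, DEMO_PAGE_SIZE);
#endif
}

/* Check that every byte holds the value the scrubber writes in this build. */
static bool demo_page_is_scrubbed(const uint8_t *page)
{
#ifdef DEMO_DEBUG_BUILD
    const uint8_t expect = DEMO_SCRUB_BYTE;
#else
    const uint8_t expect = 0;
#endif
    size_t i;

    for ( i = 0; i < DEMO_PAGE_SIZE; i++ )
        if ( page[i] != expect )
            return false;

    return true;
}
```

Unlike a "!= PAGE_POISON" test on the first word, this form would also catch a leaked guest value that merely happens to differ from the poison marker.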

Wei.

> 
> -boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 9/9] mm: Make sure pages are scrubbed

2017-03-29 Thread Boris Ostrovsky
On 03/29/2017 12:25 PM, Wei Liu wrote:
> On Fri, Mar 24, 2017 at 01:05:04PM -0400, Boris Ostrovsky wrote:
>> +static void check_one_page(struct page_info *pg)
>> +{
>> +mfn_t mfn = _mfn(page_to_mfn(pg));
>> +uint64_t *ptr;
>> +
>> +ptr  = map_domain_page(mfn);
>> +ASSERT(*ptr != PAGE_POISON);
> Should be ASSERT(*ptr == 0) here.

We can't assume it will be zero --- see scrub_one_page().

-boris



Re: [Xen-devel] [PATCH v1 9/9] mm: Make sure pages are scrubbed

2017-03-29 Thread Wei Liu
On Fri, Mar 24, 2017 at 01:05:04PM -0400, Boris Ostrovsky wrote:
> +static void check_one_page(struct page_info *pg)
> +{
> +mfn_t mfn = _mfn(page_to_mfn(pg));
> +uint64_t *ptr;
> +
> +ptr  = map_domain_page(mfn);
> +ASSERT(*ptr != PAGE_POISON);

Should be ASSERT(*ptr == 0) here.



Re: [Xen-devel] [PATCH v2 for-4.9] x86/mm: Drop MEM_LOG() and correct some printed information

2017-03-29 Thread Wei Liu
On Wed, Mar 29, 2017 at 04:38:43PM +0100, Andrew Cooper wrote:
> MEM_LOG() is just a thin wrapper around gdprintk(), obscuring some of the
> common information.  Inline it, and take the opportunity to correct some of
> the printked information.
> 
> Some corrections, each where appropriate:
>  * Correction of pfn/mfn terms and consistent use of PRI_pfn/mfn
>  * s!I/O!MMIO!
>  * Consistently represent domains using d%d notation
>  * Use 0x prefix for otherwise unqualified hex numbers
>  * Remove "ptwr_emulate:" prefix, as the embedded __func__ is already clear
>  * Provide more useful slot information
>  * Delete some not-very-helpful lines entirely
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Wei Liu 



Re: [Xen-devel] [GSoC] GSoC Introduction : Fuzzing Xen hypercall interface

2017-03-29 Thread Wei Liu
On Wed, Mar 29, 2017 at 04:24:15PM +0200, Felix Schmoll wrote:
> Hi,
> 
> here the final patch for the domain_id:

Please have a look at

https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches

And follow the instructions to submit patches.



Re: [Xen-devel] [PATCH RFC] x86/emulate: implement hvmemul_cmpxchg() with an actual CMPXCHG

2017-03-29 Thread Razvan Cojocaru
On 03/29/2017 06:04 PM, Razvan Cojocaru wrote:
> On 03/29/2017 05:00 PM, Razvan Cojocaru wrote:
>> On 03/29/2017 04:55 PM, Jan Beulich wrote:
>> On 28.03.17 at 12:50,  wrote:
 On 03/28/2017 01:47 PM, Jan Beulich wrote:
 On 28.03.17 at 12:27,  wrote:
>> On 03/28/2017 01:03 PM, Jan Beulich wrote:
>> On 28.03.17 at 11:14,  wrote:
 I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a
 failed CMPXCHG should happen just once, with the proper registers and ZF
 set. The guest surely expects neither that the instruction resume until
 it succeeds, nor that some hidden loop goes on for an indeterminate
 amount of time until a CMPXCHG succeeds.
>>>
>>> The guest doesn't observe the CMPXCHG failing - RETRY leads to
>>> the instruction being restarted instead of completed.
>>
>> Indeed, but it works differently with hvm_emulate_one_vm_event() where
>> RETRY currently would have the instruction be re-executed (properly
>> re-executed, not just re-emulated) by the guest.
>
> Right - see my other reply to Andrew: The function likely would
> need to tell apart guest CMPXCHG uses from us using the insn to
> carry out the write by some other one. That may involve
> adjustments to the memory write logic in x86_emulate() itself, as
> the late failure of the comparison then would also need to be
> communicated back (via ZF clear) to the guest.

 Exactly, it would require quite some reworking of x86_emulate().
>>>
>>> I had imagined it to be less intrusive (outside of x86_emulate()),
>>> but I've now learned why Andrew was able to get rid of
>>> X86EMUL_CMPXCHG_FAILED - the apparently intended behavior
>>> was never implemented. Attached a first take at it, which has
>>> seen smoke testing, but nothing more. The way it ends up being
>>> I don't think this can reasonably be considered for 4.9 at this
>>> point in time. (Also Cc-ing Tim for the shadow code changes,
>>> even if this isn't really a proper patch submission.)
>>
>> Thanks! I'll give a spin with a modified version of my CMPXCHG patch as
>> soon as possible.
> 
> With the attached patch with hvmemul_cmpxchg() now returning
> X86EMUL_CMPXCHG_FAILED if __cmpxchg() fails my (32-bit) Windows 7 guest
> gets stuck at the "Starting Windows" screen.

And again this change:

if ( __cmpxchg(map, old, new, bytes) != old )
{
    memcpy(p_old, map, bytes);
    rc = X86EMUL_CMPXCHG_FAILED;
}

i.e. doing the accumulator <- destination part of a failed CMPXCHG which
might be missing from your patch leads me again to BSODs. I'm not sure
if __cmpxchg() should work differently and do this atomically, or if
this should be done in x86_emulate() and it's not, or if it is done
there somewhere I've missed in the first patch.
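
For reference, the architectural behaviour being discussed (on failure the accumulator receives the destination value and ZF is cleared, all as one atomic step) can be sketched in portable C11. This is only an illustration using atomic_compare_exchange_strong(), not the Xen __cmpxchg() helper; demo_cmpxchg_result and demo_cmpxchg32 are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What the guest observes after a 32-bit CMPXCHG. */
struct demo_cmpxchg_result {
    uint32_t accumulator; /* unchanged on success, destination value on failure */
    bool zf;              /* set on success, clear on failure */
};

/*
 * Compare *dest with `accumulator`; if equal, store `src`.  On failure the
 * C11 primitive writes the observed value back into the "expected" operand
 * in the same atomic operation, which mirrors the accumulator update.
 */
static struct demo_cmpxchg_result
demo_cmpxchg32(_Atomic uint32_t *dest, uint32_t accumulator, uint32_t src)
{
    struct demo_cmpxchg_result r;

    assert(dest != NULL);
    r.accumulator = accumulator;
    r.zf = atomic_compare_exchange_strong(dest, &r.accumulator, src);

    return r;
}
```

A separate memcpy() after a failed compare, as in the snippet above, re-reads memory a second time, so on SMP the value copied may differ from the one the comparison actually saw.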


Thanks,
Razvan




Re: [Xen-devel] [RFC PATCH 11/24] ARM: vGICv3: handle virtual LPI pending and property tables

2017-03-29 Thread Andre Przywara
Hi,

On 29/10/16 01:39, Stefano Stabellini wrote:
> On Wed, 28 Sep 2016, Andre Przywara wrote:
>> Allow a guest to provide the address and size for the memory regions
>> it has reserved for the GICv3 pending and property tables.
>> We sanitise the various fields of the respective redistributor
>> registers and map those pages into Xen's address space to have easy
>> access.
>>
>> Signed-off-by: Andre Przywara 
>> ---
>> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
>> index e9b6490..8fe8386 100644
>> --- a/xen/arch/arm/vgic-v3.c
>> +++ b/xen/arch/arm/vgic-v3.c


>> +reg = v->domain->arch.vgic.rdist_propbase;
>> +vgic_reg64_update(&reg, r, info);
>> +reg = sanitize_propbaser(reg);
>> +v->domain->arch.vgic.rdist_propbase = reg;
>>  
>> +nr_pages = BIT((v->domain->arch.vgic.rdist_propbase & 0x1f) + 1) - 
>> 8192;
>> +nr_pages = DIV_ROUND_UP(nr_pages, PAGE_SIZE);
> 
> Do we need to set an upper limit on nr_pages? We don't really want to
> allow (2^0x1f)/4096 pages, right?

Why not? This is the virtual property table, and the *guest* provides
the memory. We just comply here and map it. I don't see any issue.
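
To put numbers on this, the size computation from the quoted hunk can be worked through in user space. A rough sketch under assumptions: demo_proptable_pages is a hypothetical helper, 4K pages are assumed, and the guard for ID spaces with no LPIs is my addition, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_LPI_BASE  8192u   /* LPI numbers start at 8192 */

/*
 * Number of pages covered by the property table, one byte per LPI.
 * The low five bits of the register hold (number of ID bits - 1).
 */
static uint64_t demo_proptable_pages(uint64_t propbaser)
{
    uint64_t nr_ids = UINT64_C(1) << ((propbaser & 0x1f) + 1);
    uint64_t bytes;

    /* Added guard (assumption): no LPIs fit into such a small ID space. */
    if ( nr_ids <= DEMO_LPI_BASE )
        return 0;

    bytes = nr_ids - DEMO_LPI_BASE;

    return (bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}
```

With the field at its maximum of 0x1f this yields 2^32 IDs and 1048574 pages, that is just under 4GB of guest-provided memory to map.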

[  ]

>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
>> index b961551..4d9304f 100644
>> --- a/xen/arch/arm/vgic.c
>> +++ b/xen/arch/arm/vgic.c
>> @@ -488,6 +488,10 @@ struct pending_irq *lpi_to_pending(struct vcpu *v, 
>> unsigned int lpi,
>>  empty->pirq.irq = lpi;
>>  }
>>  
>> +/* Update the enabled status */
>> +if ( gicv3_lpi_is_enabled(v->domain, lpi) )
>> +set_bit(GIC_IRQ_GUEST_ENABLED, &empty->pirq.status);
> 
> Where is the GIC_IRQ_GUEST_ENABLED unset?

In the patch where the INV command is emulated. This is how
enabling/disabling LPI works: Software (the guest here) sets the bit in
the property table and issues an ITS command to notify the ITS
(emulation) about it.
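
The property table lookup this relies on can be sketched as below, together with the range check asked about further down in the review. This is a user-space illustration only: the demo_* names are hypothetical, the enable bit is assumed to be bit 0, and the 0xfc priority mask and the 8192 offset are taken from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LPI_BASE         8192u
#define DEMO_LPI_PROP_ENABLED 0x01u  /* assumed: bit 0 is the enable bit */
#define DEMO_LPI_PROP_PRIO    0xfcu  /* priority lives in bits [7:2] */

/* Hypothetical view of a mapped property table: one byte per LPI. */
struct demo_proptable {
    const uint8_t *bytes;
    uint32_t nr_lpis;
};

static bool demo_lpi_valid(const struct demo_proptable *t, uint32_t lpi)
{
    assert(t != NULL);
    return lpi >= DEMO_LPI_BASE && (lpi - DEMO_LPI_BASE) < t->nr_lpis;
}

/* An out-of-range LPI is simply reported as disabled. */
static bool demo_lpi_is_enabled(const struct demo_proptable *t, uint32_t lpi)
{
    return demo_lpi_valid(t, lpi) &&
           (t->bytes[lpi - DEMO_LPI_BASE] & DEMO_LPI_PROP_ENABLED);
}

/* Returns the priority, or -1 if the LPI lies outside the table. */
static int demo_lpi_priority(const struct demo_proptable *t, uint32_t lpi)
{
    if ( !demo_lpi_valid(t, lpi) )
        return -1;

    return t->bytes[lpi - DEMO_LPI_BASE] & DEMO_LPI_PROP_PRIO;
}
```

Since the guest owns the table, a cached enabled state only changes when the table is read again, which is why the corresponding bit gets cleared as part of the INV emulation.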

>>  return &empty->pirq;
>>  }
>>  
>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
>> index ae8a9de..0cd3500 100644
>> --- a/xen/include/asm-arm/domain.h
>> +++ b/xen/include/asm-arm/domain.h
>> @@ -109,6 +109,8 @@ struct arch_domain
>>  } *rdist_regions;
>>  int nr_regions; /* Number of rdist regions */
>>  uint32_t rdist_stride;  /* Re-Distributor stride */
>> +uint64_t rdist_propbase;
>> +uint8_t *proptable;
> 
> Do we need to keep both rdist_propbase and proptable? It is easy to go
> from proptable to rdist_propbase and I guess it is not an operation that
> is done often? If so, we could save some memory and remove it.

The code has changed meanwhile, so this does not apply directly anymore,
but just to make sure:
We need rdist_propbase separately, because a guest can happily set and
change it as often as it wants before enabling LPIs. We shouldn't (and
we don't) allocate memory now (and so set proptable) until the LPIs get
enabled.

>>  #endif
>>  } vgic;
>>  
>> @@ -247,7 +249,10 @@ struct arch_vcpu
>>  
>>  /* GICv3: redistributor base and flags for this vCPU */
>>  paddr_t rdist_base;
>> -#define VGIC_V3_RDIST_LAST  (1 << 0)/* last vCPU of the rdist */
>> +#define VGIC_V3_RDIST_LAST  (1 << 0)/* last vCPU of the rdist */
>> +#define VGIC_V3_LPIS_ENABLED(1 << 1)
>> +uint64_t rdist_pendbase;
>> +unsigned long *pendtable;
> 
> Same here.

And the same rationale applies here.

Fixed / addressed the rest.

Cheers,
Andre.

>>  uint8_t flags;
>>  struct list_head pending_lpi_list;
>>  } vgic;
>> diff --git a/xen/include/asm-arm/gic-its.h b/xen/include/asm-arm/gic-its.h
>> index 1f881c0..3b2e5c0 100644
>> --- a/xen/include/asm-arm/gic-its.h
>> +++ b/xen/include/asm-arm/gic-its.h
>> @@ -139,7 +139,11 @@ int gicv3_lpi_drop_host_lpi(struct host_its *its,
>>  
>>  static inline int gicv3_lpi_get_priority(struct domain *d, uint32_t lpi)
>>  {
>> -return GIC_PRI_IRQ;
>> +return d->arch.vgic.proptable[lpi - 8192] & 0xfc;
> 
> Please #define 0xfc. Do we need to check for lpi overflows? As in lpi
> numbers larger than proptable size?
> 
> 
>> +}
>> +static inline bool gicv3_lpi_is_enabled(struct domain *d, uint32_t lpi)
>> +{
>> +return d->arch.vgic.proptable[lpi - 8192] & LPI_PROP_ENABLED;
>>  }
>>  
>>  #else
>> @@ -185,6 +189,10 @@ static inline int gicv3_lpi_get_priority(struct domain 
>> *d, uint32_t lpi)
>>  {
>>  return GIC_PRI_IRQ;
>>  }
>> +static inline bool gicv3_lpi_is_enabled(struct domain *d, uint32_t lpi)
>> +{
>> +return false;
>> +}
>>  
>>  #endif /* CONFIG_HAS_ITS */
>>  
>> diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
>> index 4e29ba6..2b216cc 100644
>> --- a/xen/include/asm-arm/vgic.h
>> +++ b/xen/include/asm-arm/vgic.h
>> @@ -285,6 +285,9 @@ VGIC_REG_HELPERS(32, 0x3);
>>  
>>  #undef VGIC_REG_HELPERS
>>  
>> +void *map_guest_pages(struct domain *d, paddr_t guest_addr, int nr_pages);

Re: [Xen-devel] [PATCH v3] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-29 Thread Jennifer Herbert

On 29/03/17 15:54, Jan Beulich wrote:

On 28.03.17 at 15:18,  wrote:

@@ -441,13 +481,8 @@ static int dm_op(domid_t domid,
   struct xen_dm_op_modified_memory *data =
   _memory;
   
-const_op = false;

-
-rc = -EINVAL;
-if ( data->pad )
-break;
-
-rc = modified_memory(d, data);
+rc = modified_memory(d, data, [1]);
+const_op = (rc != 0);

Isn't this wrong now, i.e. don't you need to copy back the
header now in all cases?

I only define what I'll set nr_extents to in case of error, and of
course opaque
is opaque.

Well, but you do need the opaque value for the continuation,
don't you? In which case you need to also write back on
-ERESTART. And as you say you need to write back in case
of error. So I'd expect

const_op = !rc;



Quite right, I see your point now - I didn't notice I'd inverted the logic.

-jenny




[Xen-devel] [PATCH v2 for-4.9] x86/mm: Drop MEM_LOG() and correct some printed information

2017-03-29 Thread Andrew Cooper
MEM_LOG() is just a thin wrapper around gdprintk(), obscuring some of the
common information.  Inline it, and take the opportunity to correct some of
the printked information.

Some corrections, each where appropriate:
 * Correction of pfn/mfn terms and consistent use of PRI_pfn/mfn
 * s!I/O!MMIO!
 * Consistently represent domains using d%d notation
 * Use 0x prefix for otherwise unqualified hex numbers
 * Remove "ptwr_emulate:" prefix, as the embedded __func__ is already clear
 * Provide more useful slot information
 * Delete some not-very-helpful lines entirely

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Julien Grall 

v2:
 * Further adjustments and deletions
---
 xen/arch/x86/mm.c | 303 ++
 1 file changed, 166 insertions(+), 137 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4dbd24f..22e4af1 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -127,8 +127,6 @@
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
 l1_fixmap[L1_PAGETABLE_ENTRIES];
 
-#define MEM_LOG(_f, _a...) gdprintk(XENLOG_WARNING , _f "\n" , ## _a)
-
 /*
  * PTE updates can be done with ordinary writes except:
  *  1. Debug builds get extra checking by using CMPXCHG[8B].
@@ -707,7 +705,8 @@ static int get_page_from_pagenr(unsigned long page_nr, 
struct domain *d)
 
 if ( unlikely(!mfn_valid(_mfn(page_nr))) || unlikely(!get_page(page, d)) )
 {
-MEM_LOG("Could not get page ref for pfn %lx", page_nr);
+gdprintk(XENLOG_WARNING,
+ "Could not get page ref for mfn %"PRI_mfn"\n", page_nr);
 return 0;
 }
 
@@ -771,7 +770,8 @@ get_##level##_linear_pagetable( 
\
 \
 if ( (level##e_get_flags(pde) & _PAGE_RW) ) \
 {   \
-MEM_LOG("Attempt to create linear p.t. with write perms");  \
+gdprintk(XENLOG_WARNING,\
+ "Attempt to create linear p.t. with write perms\n");   \
 return 0;   \
 }   \
 \
@@ -892,7 +892,8 @@ get_page_from_l1e(
 
 if ( unlikely(l1f & l1_disallow_mask(l1e_owner)) )
 {
-MEM_LOG("Bad L1 flags %x", l1f & l1_disallow_mask(l1e_owner));
+gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
+ l1f & l1_disallow_mask(l1e_owner));
 return -EINVAL;
 }
 
@@ -913,8 +914,9 @@ get_page_from_l1e(
 {
 if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */
 {
-MEM_LOG("Non-privileged (%u) attempt to map I/O space %08lx", 
-pg_owner->domain_id, mfn);
+gdprintk(XENLOG_WARNING,
+ "d%d non-privileged attempt to map MMIO space 
%"PRI_mfn"\n",
+ pg_owner->domain_id, mfn);
 return -EPERM;
 }
 return -EINVAL;
@@ -925,9 +927,10 @@ get_page_from_l1e(
 {
 if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */
 {
-MEM_LOG("Dom%u attempted to map I/O space %08lx in dom%u to 
dom%u",
-curr->domain->domain_id, mfn, pg_owner->domain_id,
-l1e_owner->domain_id);
+gdprintk(XENLOG_WARNING,
+ "d%d attempted to map MMIO space %"PRI_mfn" in d%d to 
d%d\n",
+ curr->domain->domain_id, mfn, pg_owner->domain_id,
+ l1e_owner->domain_id);
 return -EPERM;
 }
 return -EINVAL;
@@ -998,9 +1001,10 @@ get_page_from_l1e(
 if ( (real_pg_owner == NULL) || (pg_owner == l1e_owner) ||
  xsm_priv_mapping(XSM_TARGET, pg_owner, real_pg_owner) )
 {
-MEM_LOG("pg_owner %d l1e_owner %d, but real_pg_owner %d",
-pg_owner->domain_id, l1e_owner->domain_id,
-real_pg_owner?real_pg_owner->domain_id:-1);
+gdprintk(XENLOG_WARNING,
+ "pg_owner d%d l1e_owner d%d, but real_pg_owner d%d\n",
+ pg_owner->domain_id, l1e_owner->domain_id,
+ real_pg_owner ? real_pg_owner->domain_id : -1);
 goto could_not_pin;
 }
 pg_owner = real_pg_owner;
@@ -1019,7 +1023,7 @@ get_page_from_l1e(
 ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner));
 if ( write && !get_page_type(page, PGT_writable_page) )
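
Two of the conventions listed in the commit message (d%d for domains, a PRI_mfn-style macro for frame numbers) can be tried out with a small user-space sketch. DEMO_PRI_mfn and demo_format_mmio_warning are hypothetical analogues written for illustration; the real PRI_mfn definition and gdprintk() are not reproduced here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical analogue of a PRI_mfn-style format macro. */
#define DEMO_PRI_mfn "05" PRIx64

/* Format a warning in the style the patch switches to; returns its length. */
static int demo_format_mmio_warning(char *buf, size_t len,
                                    int domid, uint64_t mfn)
{
    assert(buf != NULL && len > 0);

    return snprintf(buf, len,
                    "d%d non-privileged attempt to map MMIO space %"
                    DEMO_PRI_mfn "\n",
                    domid, mfn);
}
```

Keeping the width and conversion in one macro means every message prints frame numbers the same way, and the explicit "\n" replaces the one MEM_LOG() used to append.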
 

Re: [Xen-devel] [PATCH RFC] x86/emulate: implement hvmemul_cmpxchg() with an actual CMPXCHG

2017-03-29 Thread Razvan Cojocaru
On 03/29/2017 05:00 PM, Razvan Cojocaru wrote:
> On 03/29/2017 04:55 PM, Jan Beulich wrote:
> On 28.03.17 at 12:50,  wrote:
>>> On 03/28/2017 01:47 PM, Jan Beulich wrote:
>>> On 28.03.17 at 12:27,  wrote:
> On 03/28/2017 01:03 PM, Jan Beulich wrote:
> On 28.03.17 at 11:14,  wrote:
>>> I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a
>>> failed CMPXCHG should happen just once, with the proper registers and ZF
>>> set. The guest surely expects neither that the instruction resume until
>>> it succeeds, nor that some hidden loop goes on for an indeterminate
>>> amount of time until a CMPXCHG succeeds.
>>
>> The guest doesn't observe the CMPXCHG failing - RETRY leads to
>> the instruction being restarted instead of completed.
>
> Indeed, but it works differently with hvm_emulate_one_vm_event() where
> RETRY currently would have the instruction be re-executed (properly
> re-executed, not just re-emulated) by the guest.

 Right - see my other reply to Andrew: The function likely would
 need to tell apart guest CMPXCHG uses from us using the insn to
 carry out the write by some other one. That may involve
 adjustments to the memory write logic in x86_emulate() itself, as
 the late failure of the comparison then would also need to be
 communicated back (via ZF clear) to the guest.
>>>
>>> Exactly, it would require quite some reworking of x86_emulate().
>>
>> I had imagined it to be less intrusive (outside of x86_emulate()),
>> but I've now learned why Andrew was able to get rid of
>> X86EMUL_CMPXCHG_FAILED - the apparently intended behavior
>> was never implemented. Attached a first take at it, which has
>> seen smoke testing, but nothing more. The way it ends up being
>> I don't think this can reasonably be considered for 4.9 at this
>> point in time. (Also Cc-ing Tim for the shadow code changes,
>> even if this isn't really a proper patch submission.)
> 
> Thanks! I'll give a spin with a modified version of my CMPXCHG patch as
> soon as possible.

With the attached patch with hvmemul_cmpxchg() now returning
X86EMUL_CMPXCHG_FAILED if __cmpxchg() fails my (32-bit) Windows 7 guest
gets stuck at the "Starting Windows" screen. Its state appears to be:

# ./xenctx -a 3
cs:eip: 0008:8bcd85d6
flags: 00200246 cid i z p
ss:esp: 0010:82736b9c
eax:    ebx: 84f3a678   ecx: 84ee2610   edx: 001eb615
esi: 40008000   edi: 82739d20   ebp: 82736c20
 ds: 0023es: 0023fs: 0030gs: 

cr0: 8001003b
cr2: 8fd94000
cr3: 00185000
cr4: 000406f9

dr0: 
dr1: 
dr2: 
dr3: 
dr6: fffe0ff0
dr7: 0400
Code (instr addr 8bcd85d6)
47 fc 83 c7 14 4e 75 ef 5f 5e c3 cc cc cc cc cc cc 8b ff fb f4  cc
cc cc cc cc 8b ff 55 8b ec

# ./xenctx -a 3
cs:eip: 0008:8bcd85d6
flags: 00200246 cid i z p
ss:esp: 0010:82736b9c
eax:    ebx: 84f3a678   ecx: 84ee2610   edx: 002ca60d
esi: 40008000   edi: 82739d20   ebp: 82736c20
 ds: 0023es: 0023fs: 0030gs: 

cr0: 8001003b
cr2: 8fd94000
cr3: 00185000
cr4: 000406f9

dr0: 
dr1: 
dr2: 
dr3: 
dr6: fffe0ff0
dr7: 0400
Code (instr addr 8bcd85d6)
47 fc 83 c7 14 4e 75 ef 5f 5e c3 cc cc cc cc cc cc 8b ff fb f4  cc
cc cc cc cc 8b ff 55 8b ec

This only happens in SMP scenarios (my guest had 10 VCPUs for easy
reproduction). With a single VCPU, the guest booted fine. So something
somehow is still not right when a CMPXCHG fails in a race-type situation
(unless something's obviously wrong with my patch, but I don't see it).


Thanks,
Razvan

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 2d92957..b946ef7 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1029,6 +1030,77 @@ static int hvmemul_wbinvd_discard(
 return X86EMUL_OKAY;
 }
 
+static int hvmemul_vaddr_to_mfn(
+unsigned long addr,
+mfn_t *mfn,
+uint32_t pfec,
+struct x86_emulate_ctxt *ctxt)
+{
+paddr_t gpa = addr & ~PAGE_MASK;
+struct page_info *page;
+p2m_type_t p2mt;
+unsigned long gfn;
+struct vcpu *curr = current;
+struct hvm_emulate_ctxt *hvmemul_ctxt =
+container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+
+gfn = paging_gva_to_gfn(curr, addr, &pfec);
+
+if ( gfn == gfn_x(INVALID_GFN) )
+{
+pagefault_info_t pfinfo = {};
+
+if ( ( pfec & PFEC_page_paged ) || ( pfec & PFEC_page_shared ) )
+return X86EMUL_RETRY;
+
+pfinfo.linear = addr;
+pfinfo.ec = pfec;
+
+x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
+return X86EMUL_EXCEPTION;
+}
+
+gpa |= (paddr_t)gfn << PAGE_SHIFT;
+
+/*
+ * No need to do the P2M lookup for internally handled MMIO, 

Re: [Xen-devel] [PATCH v3] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-29 Thread Jan Beulich
>>> On 29.03.17 at 16:35,  wrote:
> On 29/03/17 11:38, Jan Beulich wrote:
> On 28.03.17 at 15:18,  wrote:
>>> @@ -441,13 +481,8 @@ static int dm_op(domid_t domid,
>>>   struct xen_dm_op_modified_memory *data =
>>>   &op.u.modified_memory;
>>>   
>>> -const_op = false;
>>> -
>>> -rc = -EINVAL;
>>> -if ( data->pad )
>>> -break;
>>> -
>>> -rc = modified_memory(d, data);
>>> +rc = modified_memory(d, data, [1]);
>>> +const_op = (rc != 0);
>> Isn't this wrong now, i.e. don't you need to copy back the
>> header now in all cases?
> 
> I only define what I'll set nr_extents to in case of error, and of 
> course opaque
> is opaque.

Well, but you do need the opaque value for the continuation,
don't you? In which case you need to also write back on
-ERESTART. And as you say you need to write back in case
of error. So I'd expect

   const_op = !rc;

> By only writing back on error, I hoped to improve efficiency for the 
> common case,
> (especially for existing use with calls of one extent).  (I know it's 
> only a small difference)
> If you want me to write back - what do you want me to write back for 
> success?

Right, avoiding to write something useless is sensible. If anything,
the original value of nr_extents would make sense to be written
back, but that value was long lost by that time.
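
The copy-back rule being asked for (write the header back on -ERESTART and on error, skip it on plain success) comes down to a single predicate. A minimal sketch, assuming a two-field header; demo_header, demo_finish and the DEMO_ERESTART value are hypothetical placeholders rather than the real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ERESTART 85   /* placeholder value for illustration */

/* Hypothetical header: extent count plus continuation state. */
struct demo_header {
    uint32_t nr_extents;
    uint32_t opaque;
};

/*
 * Finish an operation that returned `rc`.  The guest copy is left alone
 * only when rc == 0; continuations and errors both need the update.
 * Returns true if the header was written back.
 */
static bool demo_finish(struct demo_header *guest,
                        const struct demo_header *local, int rc)
{
    bool const_op = !rc;

    assert(guest != NULL && local != NULL);

    if ( !const_op )
        *guest = *local;

    return !const_op;
}
```

With the inverted form, const_op = (rc != 0), the continuation state would be dropped exactly when it is needed.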

Jan




[Xen-devel] [PATCH] docs: Update xen-tscmode.pod.7 to reflect default TSC mode changes

2017-03-29 Thread Boris Ostrovsky
A number of changes have been made to how we determine whether TSC
is emulated (e.g. commit 4fc380ac0077 ("x86/time: don't use virtual TSC
if host and guest frequencies are equal")).

Update the man page to reflect those changes.

Signed-off-by: Boris Ostrovsky 
Suggested-by: Olaf Hering 
---
 docs/man/xen-tscmode.pod.7 | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7
index 0da57e5..0f93453 100644
--- a/docs/man/xen-tscmode.pod.7
+++ b/docs/man/xen-tscmode.pod.7
@@ -203,12 +203,12 @@ The default mode (tsc_mode==0) checks TSC-safeness of the 
underlying
 hardware on which the virtual machine is launched.  If it is
 TSC-safe, rdtsc will execute at hardware speed; if it is not, rdtsc
 will be emulated.  Once a virtual machine is save/restored or migrated,
-however, there are two possibilities:  For a paravirtualized (PV) domain,
-TSC will always be emulated.  For a fully-virtualized (HVM) domain,
-TSC remains native IF the source physical machine and target physical machine
-have the same TSC frequency; else TSC is emulated.  Note that, though
-emulated, the "apparent" TSC frequency will be the TSC frequency
-of the initial physical machine, even after migration.
+however, there are two possibilities: TSC remains native IF the source
+physical machine and target physical machine have the same TSC frequency
+(or, for HVM/PVH guests, if TSC scaling support is available); else TSC
+is emulated.  Note that, though emulated, the "apparent" TSC frequency
+will be the TSC frequency of the initial physical machine, even after
+migration.
 
 For environments where both TSC-safeness AND highest performance
 even across migration is a requirement, application code can be specially
-- 
2.7.4
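
The post-migration rule described by the updated man page text can be written as a small predicate. This is a simplified sketch of the documented behaviour for the default mode only, not the hypervisor logic; the demo_* names and the kHz units are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_guest_type { DEMO_GUEST_PV, DEMO_GUEST_HVM, DEMO_GUEST_PVH };

/*
 * After save/restore or migration in the default mode: does TSC stay native?
 * Native if both hosts run at the same frequency, or, for HVM/PVH guests,
 * if the target offers TSC scaling.  Otherwise it is emulated.
 */
static bool demo_tsc_stays_native(enum demo_guest_type type,
                                  uint64_t src_khz, uint64_t dst_khz,
                                  bool tsc_scaling)
{
    assert(src_khz != 0 && dst_khz != 0);

    if ( src_khz == dst_khz )
        return true;

    if ( type != DEMO_GUEST_PV && tsc_scaling )
        return true;

    return false;
}
```

Note that PV guests no longer get unconditional emulation here: with matching frequencies they stay native as well.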




[Xen-devel] [PATCH v3 3/8] x86/irq: rename NR_HVM_IRQS and break it's dependency on VIOAPIC_NUM_PINS

2017-03-29 Thread Roger Pau Monne
Rename it to NR_HVM_DOMU_IRQS, and get its value from the size of the DomU vIO
APIC redirection table.

Signed-off-by: Roger Pau Monné 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
Changes since v2:
 - New in this version.

NB: this patch makes it easier to get rid of VIOAPIC_NUM_PINS in later patches.
---
 xen/arch/x86/physdev.c   | 6 --
 xen/drivers/passthrough/io.c | 2 +-
 xen/include/xen/hvm/irq.h| 4 ++--
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 6c15f9bf49..eec4a41231 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -32,6 +32,8 @@ static int physdev_hvm_map_pirq(
 {
 int ret = 0;
 
+ASSERT(!is_hardware_domain(d));
+
 spin_lock(&d->event_lock);
 switch ( type )
 {
@@ -39,7 +41,7 @@ static int physdev_hvm_map_pirq(
 const struct hvm_irq_dpci *hvm_irq_dpci;
 unsigned int machine_gsi = 0;
 
-if ( *index < 0 || *index >= NR_HVM_IRQS )
+if ( *index < 0 || *index >= NR_HVM_DOMU_IRQS )
 {
 ret = -EINVAL;
 break;
@@ -52,7 +54,7 @@ static int physdev_hvm_map_pirq(
 {
 const struct hvm_girq_dpci_mapping *girq;
 
-BUILD_BUG_ON(ARRAY_SIZE(hvm_irq_dpci->girq) < NR_HVM_IRQS);
+BUILD_BUG_ON(ARRAY_SIZE(hvm_irq_dpci->girq) < NR_HVM_DOMU_IRQS);
 list_for_each_entry ( girq,
   _irq_dpci->girq[*index],
   list )
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index f48eb31420..83e096131e 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -330,7 +330,7 @@ int pt_irq_create_bind(
 spin_unlock(&d->event_lock);
 return -ENOMEM;
 }
-for ( i = 0; i < NR_HVM_IRQS; i++ )
+for ( i = 0; i < NR_HVM_DOMU_IRQS; i++ )
 INIT_LIST_HEAD(&hvm_irq_dpci->girq[i]);
 
 hvm_domain_irq(d)->dpci = hvm_irq_dpci;
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index d3f8623c0c..f04125248e 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -76,13 +76,13 @@ struct hvm_girq_dpci_mapping {
 #define NR_ISAIRQS  16
 #define NR_LINK 4
 #if defined(CONFIG_X86)
-# define NR_HVM_IRQS VIOAPIC_NUM_PINS
+# define NR_HVM_DOMU_IRQS ARRAY_SIZE(((struct hvm_hw_vioapic *)0)->redirtbl)
 #endif
 
 /* Protected by domain's event_lock */
 struct hvm_irq_dpci {
 /* Guest IRQ to guest device/intx mapping. */
-struct list_head girq[NR_HVM_IRQS];
+struct list_head girq[NR_HVM_DOMU_IRQS];
 /* Record of mapped ISA IRQs */
 DECLARE_BITMAP(isairq_map, NR_ISAIRQS);
 /* Record of mapped Links */
-- 
2.11.0 (Apple Git-81)
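
The ARRAY_SIZE(((struct ... *)0)->member) idiom used for NR_HVM_DOMU_IRQS above may look alarming, so here is a stand-alone sketch of why it is fine: sizeof does not evaluate its operand, so the null pointer is never dereferenced. The demo_* types and the table size of 48 are arbitrary choices for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical structure owning the table whose length we want to reuse. */
struct demo_vioapic {
    uint64_t redirtbl[48];
};

/* Derive the count from the member itself; nothing is read at run time. */
#define DEMO_NR_IRQS DEMO_ARRAY_SIZE(((struct demo_vioapic *)0)->redirtbl)

/* The result is an integer constant expression, so it can size an array. */
struct demo_dpci {
    int girq[DEMO_NR_IRQS];
};

static_assert(DEMO_NR_IRQS == 48, "count follows the table definition");
```

The benefit is that the count follows the table definition automatically, so no separate constant has to be kept in sync.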




[Xen-devel] [PATCH v3 4/8] x86/hvm: convert gsi_assert_count into a variable size array

2017-03-29 Thread Roger Pau Monne
Rearrange the fields of hvm_irq so that gsi_assert_count can be converted into
a variable size array and add a new field to account the number of GSIs.

Due to these changes the irq member in the hvm_domain struct also needs to
become a pointer set at runtime.

Signed-off-by: Roger Pau Monné 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
Changes since v2:
 - Parenthesize is_hvm_pv_evtchn_domain argument.
 - Add a build BUG_ON to make sure DomU number of IRQs covers the ISA range at
   least.
 - Add an ASSERT to make sure nr_gsis covers the ISA range (those two are
   identical now, but that's going to change when Dom0 introduces a variable
   number of GSIs, hence the ASSERT and the BUILD_BUG_ON).
 - Don't expand the ASSERTs in the irq assert/deassert routines (the above
   additions already cover those).
 - Use %2 as format specifier to print the GSIs assert count (the existing code
   has been left as-is, using %2.2 instead).
---
 xen/arch/x86/hvm/hvm.c   | 14 +-
 xen/arch/x86/hvm/irq.c   | 19 ++-
 xen/include/asm-x86/domain.h |  2 +-
 xen/include/asm-x86/hvm/domain.h |  2 +-
 xen/include/asm-x86/hvm/irq.h| 28 +++-
 5 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 98dede20db..6c3c944abd 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -620,11 +620,19 @@ int hvm_domain_initialise(struct domain *d)
 d->arch.hvm_domain.params = xzalloc_array(uint64_t, HVM_NR_PARAMS);
 d->arch.hvm_domain.io_handler = xzalloc_array(struct hvm_io_handler,
   NR_IO_HANDLERS);
+d->arch.hvm_domain.irq = xzalloc_bytes(hvm_irq_size(NR_HVM_DOMU_IRQS));
+
 rc = -ENOMEM;
-if ( !d->arch.hvm_domain.pl_time ||
+if ( !d->arch.hvm_domain.pl_time || !d->arch.hvm_domain.irq ||
  !d->arch.hvm_domain.params  || !d->arch.hvm_domain.io_handler )
 goto fail1;
 
+/* Set the default number of GSIs */
+hvm_domain_irq(d)->nr_gsis = NR_HVM_DOMU_IRQS;
+
+BUILD_BUG_ON(NR_HVM_DOMU_IRQS < NR_ISAIRQS);
+ASSERT(hvm_domain_irq(d)->nr_gsis >= NR_ISAIRQS);
+
 /* need link to containing domain */
 d->arch.hvm_domain.pl_time->domain = d;
 
@@ -681,6 +689,7 @@ int hvm_domain_initialise(struct domain *d)
 xfree(d->arch.hvm_domain.io_handler);
 xfree(d->arch.hvm_domain.params);
 xfree(d->arch.hvm_domain.pl_time);
+xfree(d->arch.hvm_domain.irq);
  fail0:
 hvm_destroy_cacheattr_region_list(d);
 return rc;
@@ -727,6 +736,9 @@ void hvm_domain_destroy(struct domain *d)
 xfree(d->arch.hvm_domain.pl_time);
 d->arch.hvm_domain.pl_time = NULL;
 
+xfree(d->arch.hvm_domain.irq);
+d->arch.hvm_domain.irq = NULL;
+
 list_for_each_safe ( ioport_list, tmp,
  >arch.hvm_domain.g2m_ioport_list )
 {
diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index c2951ccf8a..00713257c9 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -69,6 +69,7 @@ static void __hvm_pci_intx_assert(
 return;
 
 gsi = hvm_pci_intx_gsi(device, intx);
+ASSERT(gsi < hvm_irq->nr_gsis);
 if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
 assert_gsi(d, gsi);
 
@@ -99,6 +100,7 @@ static void __hvm_pci_intx_deassert(
 return;
 
 gsi = hvm_pci_intx_gsi(device, intx);
+ASSERT(gsi < hvm_irq->nr_gsis);
 --hvm_irq->gsi_assert_count[gsi];
 
 link= hvm_pci_intx_link(device, intx);
@@ -363,7 +365,7 @@ void hvm_set_callback_via(struct domain *d, uint64_t via)
 {
 case HVMIRQ_callback_gsi:
 gsi = hvm_irq->callback_via.gsi = (uint8_t)via;
-if ( (gsi == 0) || (gsi >= ARRAY_SIZE(hvm_irq->gsi_assert_count)) )
+if ( (gsi == 0) || (gsi >= hvm_irq->nr_gsis) )
 hvm_irq->callback_via_type = HVMIRQ_callback_none;
 else if ( hvm_irq->callback_via_asserted &&
   (hvm_irq->gsi_assert_count[gsi]++ == 0) )
@@ -419,9 +421,9 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
 if ( unlikely(v->mce_pending) )
 return hvm_intack_mce;
 
-if ( (plat->irq.callback_via_type == HVMIRQ_callback_vector)
+if ( (plat->irq->callback_via_type == HVMIRQ_callback_vector)
  && vcpu_info(v, evtchn_upcall_pending) )
-return hvm_intack_vector(plat->irq.callback_via.vector);
+return hvm_intack_vector(plat->irq->callback_via.vector);
 
 if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output )
 return hvm_intack_pic(0);
@@ -495,7 +497,7 @@ static void irq_dump(struct domain *d)
(uint32_t) hvm_irq->isa_irq.pad[0], 
hvm_irq->pci_link.route[0], hvm_irq->pci_link.route[1],
hvm_irq->pci_link.route[2], hvm_irq->pci_link.route[3]);
-for ( i = 0 ; i < VIOAPIC_NUM_PINS; i += 8 )
+for ( i = 0; i < 

[Xen-devel] [PATCH v3 7/8] x86/ioapic: add prototype for io_apic_gsi_base to io_apic.h

2017-03-29 Thread Roger Pau Monne
So that the function can be called from other files without adding prototypes
to each of them.

Signed-off-by: Roger Pau Monné 
Acked-by: Jan Beulich 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
Changes since v1:
 - Add io_ prefix to avoid confusion.
 - Make the parameter unsigned.
---
 xen/arch/x86/io_apic.c| 4 +---
 xen/arch/x86/mpparse.c| 2 +-
 xen/include/asm-x86/io_apic.h | 3 +++
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 24ee431b00..d18046067c 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2274,8 +2274,6 @@ static int ioapic_physbase_to_id(unsigned long physbase)
 return -EINVAL;
 }
 
-unsigned apic_gsi_base(int apic);
-
 static int apic_pin_2_gsi_irq(int apic, int pin)
 {
 int idx;
@@ -2286,7 +2284,7 @@ static int apic_pin_2_gsi_irq(int apic, int pin)
 idx = find_irq_entry(apic, pin, mp_INT);
 
 return idx >= 0 ? pin_2_irq(idx, apic, pin)
-: apic_gsi_base(apic) + pin;
+: io_apic_gsi_base(apic) + pin;
 }
 
 int ioapic_guest_read(unsigned long physbase, unsigned int reg, u32 *pval)
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index 1eb7c99ea7..efcbc6115d 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -913,7 +913,7 @@ unsigned __init highest_gsi(void)
return res;
 }
 
-unsigned apic_gsi_base(int apic)
+unsigned int io_apic_gsi_base(unsigned int apic)
 {
return mp_ioapic_routing[apic].gsi_base;
 }
diff --git a/xen/include/asm-x86/io_apic.h b/xen/include/asm-x86/io_apic.h
index 225edd63b2..8029c8f400 100644
--- a/xen/include/asm-x86/io_apic.h
+++ b/xen/include/asm-x86/io_apic.h
@@ -127,6 +127,9 @@ struct __packed IO_APIC_route_entry {
 /* I/O APIC entries */
 extern struct mpc_config_ioapic mp_ioapics[MAX_IO_APICS];
 
+/* Base GSI for this IO APIC */
+unsigned int io_apic_gsi_base(unsigned int apic);
+
 /* Only need to remap ioapic RTE (reg: 10~3Fh) */
 #define ioapic_reg_remapped(reg) (iommu_intremap && ((reg) >= 0x10))
 
-- 
2.11.0 (Apple Git-81)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 5/8] x86/vioapic: allow the vIO APIC to have a variable number of pins

2017-03-29 Thread Roger Pau Monne
Although it's still always set to VIOAPIC_NUM_PINS (48).

Add a new field to the hvm_ioapic struct to contain the number of pins (number
of IO redirection table entries) and turn the redirection table into a variable
sized array.
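
As a rough illustration of the approach described above (not the code from
this patch), a variable-sized redirection table can be modelled with a
flexible array member whose allocation size is computed with offsetof. All
demo_* names below are hypothetical and exist only for this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vIO APIC with a variable number of pins. */
struct demo_vioapic {
    uint32_t nr_pins;      /* number of redirection table entries */
    uint64_t redirtbl[];   /* flexible array member, sized at allocation */
};

/* Bytes needed for a structure holding 'cnt' redirection entries. */
static size_t demo_vioapic_size(uint32_t cnt)
{
    return offsetof(struct demo_vioapic, redirtbl) + cnt * sizeof(uint64_t);
}

/* Allocate a zeroed structure and mask every pin, as a reset would. */
static struct demo_vioapic *demo_vioapic_alloc(uint32_t nr_pins)
{
    struct demo_vioapic *v;
    uint32_t i;

    assert(nr_pins > 0);

    v = calloc(1, demo_vioapic_size(nr_pins));
    if ( v == NULL )
        return NULL;

    v->nr_pins = nr_pins;
    for ( i = 0; i < nr_pins; i++ )
        v->redirtbl[i] = UINT64_C(1) << 16; /* mask bit of an entry */

    return v;
}

/* Bounds-checked read against the per-instance pin count. */
static int demo_vioapic_read(const struct demo_vioapic *v, uint32_t pin,
                             uint64_t *val)
{
    if ( pin >= v->nr_pins )
        return -1;

    *val = v->redirtbl[pin];
    return 0;
}
```

The point of the sketch is that every bound check uses the per-instance
nr_pins rather than a compile-time constant, which is the same substitution
the hunks below make for VIOAPIC_NUM_PINS.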

Signed-off-by: Roger Pau Monné 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
Changes since v2:
 - Undefine VIOAPIC_NUM_PINS for hypervisor code.

Changes since v1:
 - Almost completely reworked due to previous changes.
---
 xen/arch/x86/hvm/vioapic.c | 23 ---
 xen/include/asm-x86/hvm/vioapic.h  |  4 +++-
 xen/include/public/arch-x86/hvm/save.h |  2 ++
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 3e92947abf..6bc8dbdd42 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -53,7 +53,7 @@ static uint32_t vioapic_read_indirect(const struct 
hvm_vioapic *vioapic)
 case VIOAPIC_REG_VERSION:
 result = ((union IO_APIC_reg_01){
   .bits = { .version = VIOAPIC_VERSION_ID,
-.entries = VIOAPIC_NUM_PINS - 1 }
+.entries = vioapic->nr_pins - 1 }
   }).raw;
 break;
 
@@ -73,7 +73,7 @@ static uint32_t vioapic_read_indirect(const struct 
hvm_vioapic *vioapic)
 uint32_t redir_index = (vioapic->ioregsel - VIOAPIC_REG_RTE0) >> 1;
 uint64_t redir_content;
 
-if ( redir_index >= VIOAPIC_NUM_PINS )
+if ( redir_index >= vioapic->nr_pins )
 {
 gdprintk(XENLOG_WARNING, "apic_mem_readl:undefined ioregsel %x\n",
  vioapic->ioregsel);
@@ -197,7 +197,7 @@ static void vioapic_write_indirect(
 HVM_DBG_LOG(DBG_LEVEL_IOAPIC, "rte[%02x].%s = %08x",
 redir_index, vioapic->ioregsel & 1 ? "hi" : "lo", val);
 
-if ( redir_index >= VIOAPIC_NUM_PINS )
+if ( redir_index >= vioapic->nr_pins )
 {
 gdprintk(XENLOG_WARNING, "vioapic_write_indirect "
  "error register %x\n", vioapic->ioregsel);
@@ -368,7 +368,7 @@ void vioapic_irq_positive_edge(struct domain *d, unsigned 
int irq)
 
 HVM_DBG_LOG(DBG_LEVEL_IOAPIC, "irq %x", irq);
 
-ASSERT(irq < VIOAPIC_NUM_PINS);
+ASSERT(irq < vioapic->nr_pins);
 ASSERT(spin_is_locked(&d->arch.hvm_domain.irq_lock));
 
 ent = &vioapic->redirtbl[irq];
@@ -397,7 +397,7 @@ void vioapic_update_EOI(struct domain *d, u8 vector)
 
 spin_lock(&d->arch.hvm_domain.irq_lock);
 
-for ( gsi = 0; gsi < VIOAPIC_NUM_PINS; gsi++ )
+for ( gsi = 0; gsi < vioapic->nr_pins; gsi++ )
 {
 ent = &vioapic->redirtbl[gsi];
 if ( ent->fields.vector != vector )
@@ -431,6 +431,9 @@ static int ioapic_save(struct domain *d, 
hvm_domain_context_t *h)
 if ( !has_vioapic(d) )
 return 0;
 
+if ( s->nr_pins != ARRAY_SIZE(s->domU.redirtbl) )
+return -EOPNOTSUPP;
+
 return hvm_save_entry(IOAPIC, 0, h, &s->domU);
 }
 
@@ -441,6 +444,9 @@ static int ioapic_load(struct domain *d, 
hvm_domain_context_t *h)
 if ( !has_vioapic(d) )
 return -ENODEV;
 
+if ( s->nr_pins != ARRAY_SIZE(s->domU.redirtbl) )
+return -EOPNOTSUPP;
+
 return hvm_load_entry(IOAPIC, h, &s->domU);
 }
 
@@ -449,14 +455,16 @@ HVM_REGISTER_SAVE_RESTORE(IOAPIC, ioapic_save, 
ioapic_load, 1, HVMSR_PER_DOM);
 void vioapic_reset(struct domain *d)
 {
 struct hvm_vioapic *vioapic = domain_vioapic(d);
+uint32_t nr_pins = vioapic->nr_pins;
 int i;
 
 if ( !has_vioapic(d) )
 return;
 
-memset(vioapic, 0, sizeof(*vioapic));
+memset(vioapic, 0, hvm_vioapic_size(nr_pins));
 vioapic->domain = d;
-for ( i = 0; i < VIOAPIC_NUM_PINS; i++ )
+vioapic->nr_pins = nr_pins;
+for ( i = 0; i < nr_pins; i++ )
 vioapic->redirtbl[i].fields.mask = 1;
 vioapic->base_address = VIOAPIC_DEFAULT_BASE_ADDRESS;
 }
@@ -471,6 +479,7 @@ int vioapic_init(struct domain *d)
 return -ENOMEM;
 
 d->arch.hvm_domain.vioapic->domain = d;
+domain_vioapic(d)->nr_pins = ARRAY_SIZE(domain_vioapic(d)->domU.redirtbl);
 vioapic_reset(d);
 
 register_mmio_handler(d, &vioapic_mmio_ops);
diff --git a/xen/include/asm-x86/hvm/vioapic.h 
b/xen/include/asm-x86/hvm/vioapic.h
index ab7be9e741..711f294fbe 100644
--- a/xen/include/asm-x86/hvm/vioapic.h
+++ b/xen/include/asm-x86/hvm/vioapic.h
@@ -49,12 +49,14 @@
 
 struct hvm_vioapic {
 struct domain *domain;
+uint32_t nr_pins;
 union {
-XEN_HVM_VIOAPIC(, VIOAPIC_NUM_PINS);
+XEN_HVM_VIOAPIC(, 0);
 struct hvm_hw_vioapic domU;
 };
 };
 
+#define hvm_vioapic_size(cnt) offsetof(struct hvm_vioapic, redirtbl[cnt])
 #define domain_vioapic(d) ((d)->arch.hvm_domain.vioapic)
 #define vioapic_domain(v) ((v)->domain)
 
diff --git a/xen/include/public/arch-x86/hvm/save.h 
b/xen/include/public/arch-x86/hvm/save.h
index ab848f6467..816973b9c2 

[Xen-devel] [PATCH v3 6/8] x86/vioapic: introduce support for multiple vIO APICS

2017-03-29 Thread Roger Pau Monne
Add support for multiple vIO APICs on the same domain, thus turning
d->arch.hvm_domain.vioapic into an array of vIO APIC control structures.

Note that this functionality is not exposed to unprivileged guests, and will
only be used by PVHv2 Dom0.
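
To make the GSI numbering concrete, here is a minimal sketch of mapping a
GSI to the owning vIO APIC and the pin within it, assuming GSIs are assigned
contiguously in array order. It follows the shape of gsi_vioapic() and
base_gsi() in the diff below, but the demo_* types and names are
hypothetical and simplified:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced vIO APIC: only the pin count matters here. */
struct demo_vioapic {
    unsigned int nr_pins;
};

/*
 * Find the vIO APIC that owns 'gsi'. On success return its index in the
 * array and store the pin number within that vIO APIC in *pin; return -1
 * if the GSI is beyond the last pin of the last vIO APIC.
 */
static int demo_gsi_vioapic(const struct demo_vioapic *apics,
                            unsigned int nr_vioapics, unsigned int gsi,
                            unsigned int *pin)
{
    unsigned int i, base_gsi = 0;

    for ( i = 0; i < nr_vioapics; i++ )
    {
        if ( gsi >= base_gsi && gsi < base_gsi + apics[i].nr_pins )
        {
            *pin = gsi - base_gsi;
            return (int)i;
        }

        base_gsi += apics[i].nr_pins;
    }

    return -1;
}

/* First GSI handled by the vIO APIC at index 'idx'. */
static unsigned int demo_base_gsi(const struct demo_vioapic *apics,
                                  unsigned int nr_vioapics, unsigned int idx)
{
    unsigned int i, base_gsi = 0;

    assert(idx < nr_vioapics);

    for ( i = 0; i < idx; i++ )
        base_gsi += apics[i].nr_pins;

    return base_gsi;
}
```

With three vIO APICs of 24, 32 and 16 pins, GSI 24 would resolve to pin 0 of
the second vIO APIC, and the third one would start at GSI 56.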

Signed-off-by: Roger Pau Monné 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
Changes since v2:
 - More constify.
 - gsi_vioapic should return the pin and not the base GSI.
 - Change pin_gsi to base_gsi, and make it return the base vIO APIC GSI.
 - Make sure base_gsi doesn't overrun the array.
 - Add ASSERTs to make sure DomUs don't use more than one vIO APIC.

Changes since v1:
 - Constify some parameters.
 - Make gsi_vioapic return the base GSI of the IO APIC.
 - Add checks to pt_irq_vector in order to prevent dereferencing NULL.
---
 xen/arch/x86/hvm/dom0_build.c |   2 +-
 xen/arch/x86/hvm/vioapic.c| 218 --
 xen/arch/x86/hvm/vlapic.c |   2 +-
 xen/arch/x86/hvm/vpt.c|  25 -
 xen/include/asm-x86/hvm/domain.h  |   3 +-
 xen/include/asm-x86/hvm/vioapic.h |   4 +-
 6 files changed, 188 insertions(+), 66 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 5576db4ee8..daa791d3f4 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -729,7 +729,7 @@ static int __init pvh_setup_acpi_madt(struct domain *d, 
paddr_t *addr)
 io_apic = (void *)(madt + 1);
 io_apic->header.type = ACPI_MADT_TYPE_IO_APIC;
 io_apic->header.length = sizeof(*io_apic);
-io_apic->id = domain_vioapic(d)->id;
+io_apic->id = domain_vioapic(d, 0)->id;
 io_apic->address = VIOAPIC_DEFAULT_BASE_ADDRESS;
 
 x2apic = (void *)(io_apic + 1);
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 6bc8dbdd42..990ad707ec 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -42,7 +42,57 @@
 /* HACK: Route IRQ0 only to VCPU0 to prevent time jumps. */
 #define IRQ0_SPECIAL_ROUTING 1
 
-static void vioapic_deliver(struct hvm_vioapic *vioapic, int irq);
+static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int irq);
+
+static struct hvm_vioapic *addr_vioapic(const struct domain *d,
+unsigned long addr)
+{
+unsigned int i;
+
+for ( i = 0; i < d->arch.hvm_domain.nr_vioapics; i++ )
+{
+struct hvm_vioapic *vioapic = domain_vioapic(d, i);
+
+if ( addr >= vioapic->base_address &&
+ addr < vioapic->base_address + VIOAPIC_MEM_LENGTH )
+return vioapic;
+}
+
+return NULL;
+}
+
+struct hvm_vioapic *gsi_vioapic(const struct domain *d, unsigned int gsi,
+unsigned int *pin)
+{
+unsigned int i, base_gsi = 0;
+
+for ( i = 0; i < d->arch.hvm_domain.nr_vioapics; i++ )
+{
+struct hvm_vioapic *vioapic = domain_vioapic(d, i);
+
+if ( gsi >= base_gsi && gsi < base_gsi + vioapic->nr_pins )
+{
+*pin = gsi - base_gsi;
+return vioapic;
+}
+
+base_gsi += vioapic->nr_pins;
+}
+
+return NULL;
+}
+
+static unsigned int base_gsi(const struct domain *d,
+ const struct hvm_vioapic *vioapic)
+{
+const struct hvm_vioapic *tmp;
+unsigned int base_gsi = 0, nr_vioapics = d->arch.hvm_domain.nr_vioapics;
+
+for ( tmp = domain_vioapic(d, 0); --nr_vioapics && tmp != vioapic; tmp++ )
+base_gsi += tmp->nr_pins;
+
+return base_gsi;
+}
 
 static uint32_t vioapic_read_indirect(const struct hvm_vioapic *vioapic)
 {
@@ -94,11 +144,14 @@ static int vioapic_read(
 struct vcpu *v, unsigned long addr,
 unsigned int length, unsigned long *pval)
 {
-const struct hvm_vioapic *vioapic = domain_vioapic(v->domain);
+const struct hvm_vioapic *vioapic;
 uint32_t result;
 
 HVM_DBG_LOG(DBG_LEVEL_IOAPIC, "addr %lx", addr);
 
+vioapic = addr_vioapic(v->domain, addr);
+ASSERT(vioapic);
+
 switch ( addr & 0xff )
 {
 case VIOAPIC_REG_SELECT:
@@ -126,6 +179,7 @@ static void vioapic_write_redirent(
 struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 union vioapic_redir_entry *pent, ent;
 int unmasked = 0;
+unsigned int gsi = base_gsi(d, vioapic) + idx;
 
 spin_lock(&d->arch.hvm_domain.irq_lock);
 
@@ -149,7 +203,7 @@ static void vioapic_write_redirent(
 
 *pent = ent;
 
-if ( idx == 0 )
+if ( gsi == 0 )
 {
 vlapic_adjust_i8259_target(d);
 }
@@ -165,7 +219,7 @@ static void vioapic_write_redirent(
 
 spin_unlock(&d->arch.hvm_domain.irq_lock);
 
-if ( idx == 0 || unmasked )
+if ( gsi == 0 || unmasked )
 pt_may_unmask_irq(d, NULL);
 }
 
@@ -215,7 +269,10 @@ static int vioapic_write(
 struct vcpu *v, unsigned long addr,
 unsigned int length, unsigned long val)
 {
-struct hvm_vioapic *vioapic = domain_vioapic(v->domain);
+

[Xen-devel] [PATCH v3 8/8] x86/vioapic: allow PVHv2 Dom0 to have more than one IO APIC

2017-03-29 Thread Roger Pau Monne
The base address, id and number of pins of the vIO APICs exposed to PVHv2 Dom0
are the same as the values found on bare metal.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/x86/hvm/dom0_build.c | 33 -
 xen/arch/x86/hvm/hvm.c|  8 +---
 xen/arch/x86/hvm/vioapic.c| 29 +++--
 3 files changed, 40 insertions(+), 30 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index daa791d3f4..db9be87612 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -681,12 +681,7 @@ static int __init pvh_setup_acpi_madt(struct domain *d, 
paddr_t *addr)
 max_vcpus = dom0_max_vcpus();
 /* Calculate the size of the crafted MADT. */
 size = sizeof(*madt);
-/*
- * FIXME: the current vIO-APIC code just supports one IO-APIC instance
- * per domain. This must be fixed in order to provide the same amount of
- * IO APICs as available on bare metal.
- */
-size += sizeof(*io_apic);
+size += sizeof(*io_apic) * nr_ioapics;
 size += sizeof(*intsrcovr) * acpi_intr_overrides;
 size += sizeof(*nmisrc) * acpi_nmi_sources;
 size += sizeof(*x2apic) * max_vcpus;
@@ -716,23 +711,19 @@ static int __init pvh_setup_acpi_madt(struct domain *d, 
paddr_t *addr)
  */
 madt->header.revision = min_t(unsigned char, table->revision, 4);
 
-/*
- * Setup the IO APIC entry.
- * FIXME: the current vIO-APIC code just supports one IO-APIC instance
- * per domain. This must be fixed in order to provide the same amount of
- * IO APICs as available on bare metal, and with the same IDs as found in
- * the native IO APIC MADT entries.
- */
-if ( nr_ioapics > 1 )
-printk("WARNING: found %d IO APICs, Dom0 will only have access to 1 
emulated IO APIC\n",
-   nr_ioapics);
+/* Setup the IO APIC entries. */
 io_apic = (void *)(madt + 1);
-io_apic->header.type = ACPI_MADT_TYPE_IO_APIC;
-io_apic->header.length = sizeof(*io_apic);
-io_apic->id = domain_vioapic(d, 0)->id;
-io_apic->address = VIOAPIC_DEFAULT_BASE_ADDRESS;
+for ( i = 0; i < nr_ioapics; i++ )
+{
+io_apic->header.type = ACPI_MADT_TYPE_IO_APIC;
+io_apic->header.length = sizeof(*io_apic);
+io_apic->id = domain_vioapic(d, i)->id;
+io_apic->address = domain_vioapic(d, i)->base_address;
+io_apic->global_irq_base = io_apic_gsi_base(i);
+io_apic++;
+}
 
-x2apic = (void *)(io_apic + 1);
+x2apic = (void *)io_apic;
 for ( i = 0; i < max_vcpus; i++ )
 {
 x2apic->header.type = ACPI_MADT_TYPE_LOCAL_X2APIC;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 6c3c944abd..9a9732b308 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -595,6 +595,7 @@ static int hvm_print_line(
 
 int hvm_domain_initialise(struct domain *d)
 {
+unsigned int nr_gsis;
 int rc;
 
 if ( !hvm_enabled )
@@ -616,19 +617,20 @@ int hvm_domain_initialise(struct domain *d)
 if ( rc != 0 )
 goto fail0;
 
+nr_gsis = is_hardware_domain(d) ? nr_irqs_gsi : NR_HVM_DOMU_IRQS;
 d->arch.hvm_domain.pl_time = xzalloc(struct pl_time);
 d->arch.hvm_domain.params = xzalloc_array(uint64_t, HVM_NR_PARAMS);
 d->arch.hvm_domain.io_handler = xzalloc_array(struct hvm_io_handler,
   NR_IO_HANDLERS);
-d->arch.hvm_domain.irq = xzalloc_bytes(hvm_irq_size(NR_HVM_DOMU_IRQS));
+d->arch.hvm_domain.irq = xzalloc_bytes(hvm_irq_size(nr_gsis));
 
 rc = -ENOMEM;
 if ( !d->arch.hvm_domain.pl_time || !d->arch.hvm_domain.irq ||
  !d->arch.hvm_domain.params  || !d->arch.hvm_domain.io_handler )
 goto fail1;
 
-/* Set the default number of GSIs */
-hvm_domain_irq(d)->nr_gsis = NR_HVM_DOMU_IRQS;
+/* Set the number of GSIs */
+hvm_domain_irq(d)->nr_gsis = nr_gsis;
 
 BUILD_BUG_ON(NR_HVM_DOMU_IRQS < NR_ISAIRQS);
 ASSERT(hvm_domain_irq(d)->nr_gsis >= NR_ISAIRQS);
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 990ad707ec..02ae58ba5f 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -534,10 +534,20 @@ void vioapic_reset(struct domain *d)
 memset(vioapic, 0, hvm_vioapic_size(nr_pins));
 for ( j = 0; j < nr_pins; j++ )
 vioapic->redirtbl[j].fields.mask = 1;
-ASSERT(!i);
-vioapic->base_address = VIOAPIC_DEFAULT_BASE_ADDRESS +
-VIOAPIC_MEM_LENGTH * i;
-vioapic->id = i;
+
+if ( !is_hardware_domain(d) )
+{
+ASSERT(!i);
+vioapic->base_address = VIOAPIC_DEFAULT_BASE_ADDRESS +
+VIOAPIC_MEM_LENGTH * i;
+vioapic->id = i;
+   

[Xen-devel] [PATCH v3 2/8] x86/hvm: introduce hvm_domain_irq macro

2017-03-29 Thread Roger Pau Monne
Introduce a macro to get a pointer to the hvm_irq for an HVM domain. No
functional change.
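
A minimal sketch of why such an accessor helps, assuming the macro simply
takes the address of an embedded structure (the real definition lives in the
irq.h hunk of this patch; the demo_* names here are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced IRQ state and domain structures. */
struct demo_irq {
    unsigned int nr_gsis;
};

struct demo_domain {
    struct demo_irq irq;   /* embedded storage in this sketch */
};

/*
 * Single accessor: callers never spell out the field path, so a later
 * change of the storage (for example to a dynamically allocated pointer)
 * only needs to touch this one definition.
 */
#define demo_domain_irq(d) (&(d)->irq)

static unsigned int demo_nr_gsis(struct demo_domain *d)
{
    assert(d != NULL);

    return demo_domain_irq(d)->nr_gsis;
}
```

If the field were later turned into a pointer, the macro body would become
((d)->irq) and demo_nr_gsis() would compile unchanged.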

Signed-off-by: Roger Pau Monné 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Kevin Tian 
---
Changes since v2:
 - Switch d->arch.hvm_domain.irq.dpci accesses to use the macro also.
---
NB: this is a pre-patch in order to make patch #3 smaller.
---
 xen/arch/x86/hvm/hvm.c|  2 +-
 xen/arch/x86/hvm/irq.c| 30 +++---
 xen/arch/x86/hvm/vioapic.c|  4 ++--
 xen/arch/x86/hvm/vlapic.c |  6 +++---
 xen/arch/x86/physdev.c|  2 +-
 xen/drivers/passthrough/io.c  |  8 
 xen/drivers/passthrough/pci.c |  2 +-
 xen/drivers/passthrough/vtd/x86/vtd.c |  2 +-
 xen/include/asm-x86/hvm/irq.h |  1 +
 9 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index b6c5c9bf8d..98dede20db 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -457,7 +457,7 @@ void hvm_migrate_pirqs(struct vcpu *v)
 {
 struct domain *d = v->domain;
 
-if ( !iommu_enabled || !d->arch.hvm_domain.irq.dpci )
+if ( !iommu_enabled || !hvm_domain_irq(d)->dpci )
return;
 
 spin_lock(&d->event_lock);
diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index a774ed7450..c2951ccf8a 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -60,7 +60,7 @@ static void deassert_irq(struct domain *d, unsigned isa_irq)
 static void __hvm_pci_intx_assert(
 struct domain *d, unsigned int device, unsigned int intx)
 {
-struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 unsigned int gsi, link, isa_irq;
 
 ASSERT((device <= 31) && (intx <= 3));
@@ -90,7 +90,7 @@ void hvm_pci_intx_assert(
 static void __hvm_pci_intx_deassert(
 struct domain *d, unsigned int device, unsigned int intx)
 {
-struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 unsigned int gsi, link, isa_irq;
 
 ASSERT((device <= 31) && (intx <= 3));
@@ -119,7 +119,7 @@ void hvm_pci_intx_deassert(
 void hvm_isa_irq_assert(
 struct domain *d, unsigned int isa_irq)
 {
-struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 unsigned int gsi = hvm_isa_irq_to_gsi(isa_irq);
 
 ASSERT(isa_irq <= 15);
@@ -136,7 +136,7 @@ void hvm_isa_irq_assert(
 void hvm_isa_irq_deassert(
 struct domain *d, unsigned int isa_irq)
 {
-struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 unsigned int gsi = hvm_isa_irq_to_gsi(isa_irq);
 
 ASSERT(isa_irq <= 15);
@@ -153,7 +153,7 @@ void hvm_isa_irq_deassert(
 static void hvm_set_callback_irq_level(struct vcpu *v)
 {
 struct domain *d = v->domain;
-struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 unsigned int gsi, pdev, pintx, asserted;
 
 ASSERT(v->vcpu_id == 0);
@@ -201,7 +201,7 @@ static void hvm_set_callback_irq_level(struct vcpu *v)
 void hvm_maybe_deassert_evtchn_irq(void)
 {
 struct domain *d = current->domain;
-struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 
 if ( hvm_irq->callback_via_asserted &&
  !vcpu_info(d->vcpu[0], evtchn_upcall_pending) )
@@ -230,7 +230,7 @@ void hvm_assert_evtchn_irq(struct vcpu *v)
 
 int hvm_set_pci_link_route(struct domain *d, u8 link, u8 isa_irq)
 {
-struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 u8 old_isa_irq;
 int i;
 
@@ -323,7 +323,7 @@ int hvm_inject_msi(struct domain *d, uint64_t addr, 
uint32_t data)
 
 void hvm_set_callback_via(struct domain *d, uint64_t via)
 {
-struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 unsigned int gsi=0, pdev=0, pintx=0;
 uint8_t via_type;
 
@@ -486,7 +486,7 @@ void arch_evtchn_inject(struct vcpu *v)
 
 static void irq_dump(struct domain *d)
 {
-struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 int i; 
 printk("Domain %d:\n", d->domain_id);
 printk("PCI 0x%16.16"PRIx64"%16.16"PRIx64
@@ -541,7 +541,7 @@ __initcall(dump_irq_info_key_init);
 
 static int irq_save_pci(struct domain *d, hvm_domain_context_t *h)
 {
-struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 unsigned int asserted, pdev, pintx;
 int rc;
 
@@ -573,7 +573,7 @@ static int irq_save_pci(struct domain *d, 
hvm_domain_context_t *h)
 
 static int irq_save_isa(struct domain *d, hvm_domain_context_t *h)
 {
-struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 
 /* Save ISA 

[Xen-devel] [PATCH v3 1/8] x86/vioapic: expand hvm_vioapic to contain vIO APIC internal state

2017-03-29 Thread Roger Pau Monne
This is required in order to have a variable number of vIO APIC pins, instead
of the current fixed value (48). Note that this patch only expands the fields
of the hvm_vioapic struct, without actually introducing any new fields or
functionality.

The reason to expand the hvm_vioapic structure instead of the hvm_hw_vioapic
one is that the variable number of pins functionality is only going to be used
by the hardware domain, so no modifications are needed to the save format.
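
The "define plus unnamed struct" arrangement mentioned in the changelog below
can be sketched roughly as follows. This is an illustration under
assumptions, not the patch's code: the field list is shortened and every
DEMO_/demo_ name is hypothetical. It relies on C11 anonymous struct and union
members:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: declares a struct (named or unnamed) with 'cnt' pins. */
#define DEMO_VIOAPIC(name, cnt)   \
    struct name {                 \
        uint64_t base_address;    \
        uint32_t ioregsel;        \
        uint32_t id;              \
        uint64_t redirtbl[cnt];   \
    }

/* Fixed-size layout, as a save record would use. */
DEMO_VIOAPIC(demo_hw_vioapic, 48);

/* Internal state: the same fields, reachable directly or through 'hw'. */
struct demo_vioapic {
    void *owner;
    union {
        DEMO_VIOAPIC(, 48);          /* unnamed: fields accessed directly */
        struct demo_hw_vioapic hw;   /* named view used for save/restore */
    };
};

/* Copy the internal state out as a save record. */
static struct demo_hw_vioapic demo_to_record(const struct demo_vioapic *v)
{
    /* Both views must place each field at the same offset. */
    assert(offsetof(struct demo_vioapic, id) ==
           offsetof(struct demo_vioapic, hw) +
           offsetof(struct demo_hw_vioapic, id));

    return v->hw;
}
```

Because both declarations come from the same macro, the internal fields and
the save record cannot drift apart, which is presumably why no change to the
save format is needed.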

Signed-off-by: Roger Pau Monné 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
Changes since v2:
 - Change patch title.
 - Use an unnamed struct to store the vioapic state inside of hvm_vioapic.
 - Use a define to declare the hvm_hw_vioapic struct (and the equivalent
   unnamed struct inside of hvm_vioapic).
 - Remove the BUILD_BUG_ON.

Changes since v1:
 - New in this version.
---
 xen/arch/x86/hvm/vioapic.c | 39 +
 xen/include/asm-x86/hvm/vioapic.h  | 10 ---
 xen/include/public/arch-x86/hvm/save.h | 53 --
 3 files changed, 57 insertions(+), 45 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index fdbb21f097..23abdfc4c6 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -42,9 +42,9 @@
 /* HACK: Route IRQ0 only to VCPU0 to prevent time jumps. */
 #define IRQ0_SPECIAL_ROUTING 1
 
-static void vioapic_deliver(struct hvm_hw_vioapic *vioapic, int irq);
+static void vioapic_deliver(struct hvm_vioapic *vioapic, int irq);
 
-static uint32_t vioapic_read_indirect(const struct hvm_hw_vioapic *vioapic)
+static uint32_t vioapic_read_indirect(const struct hvm_vioapic *vioapic)
 {
 uint32_t result = 0;
 
@@ -94,7 +94,7 @@ static int vioapic_read(
 struct vcpu *v, unsigned long addr,
 unsigned int length, unsigned long *pval)
 {
-const struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);
+const struct hvm_vioapic *vioapic = domain_vioapic(v->domain);
 uint32_t result;
 
 HVM_DBG_LOG(DBG_LEVEL_IOAPIC, "addr %lx", addr);
@@ -119,7 +119,7 @@ static int vioapic_read(
 }
 
 static void vioapic_write_redirent(
-struct hvm_hw_vioapic *vioapic, unsigned int idx,
+struct hvm_vioapic *vioapic, unsigned int idx,
 int top_word, uint32_t val)
 {
 struct domain *d = vioapic_domain(vioapic);
@@ -170,7 +170,7 @@ static void vioapic_write_redirent(
 }
 
 static void vioapic_write_indirect(
-struct hvm_hw_vioapic *vioapic, uint32_t val)
+struct hvm_vioapic *vioapic, uint32_t val)
 {
 switch ( vioapic->ioregsel )
 {
@@ -215,7 +215,7 @@ static int vioapic_write(
 struct vcpu *v, unsigned long addr,
 unsigned int length, unsigned long val)
 {
-struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);
+struct hvm_vioapic *vioapic = domain_vioapic(v->domain);
 
 switch ( addr & 0xff )
 {
@@ -242,7 +242,7 @@ static int vioapic_write(
 
 static int vioapic_range(struct vcpu *v, unsigned long addr)
 {
-struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);
+struct hvm_vioapic *vioapic = domain_vioapic(v->domain);
 
 return ((addr >= vioapic->base_address &&
  (addr < vioapic->base_address + VIOAPIC_MEM_LENGTH)));
@@ -255,7 +255,7 @@ static const struct hvm_mmio_ops vioapic_mmio_ops = {
 };
 
 static void ioapic_inj_irq(
-struct hvm_hw_vioapic *vioapic,
+struct hvm_vioapic *vioapic,
 struct vlapic *target,
 uint8_t vector,
 uint8_t trig_mode,
@@ -275,7 +275,7 @@ static inline int pit_channel0_enabled(void)
 return pt_active(&current->domain->arch.vpit.pt0);
 }
 
-static void vioapic_deliver(struct hvm_hw_vioapic *vioapic, int irq)
+static void vioapic_deliver(struct hvm_vioapic *vioapic, int irq)
 {
 uint16_t dest = vioapic->redirtbl[irq].fields.dest_id;
 uint8_t dest_mode = vioapic->redirtbl[irq].fields.dest_mode;
@@ -361,7 +361,7 @@ static void vioapic_deliver(struct hvm_hw_vioapic *vioapic, 
int irq)
 
 void vioapic_irq_positive_edge(struct domain *d, unsigned int irq)
 {
-struct hvm_hw_vioapic *vioapic = domain_vioapic(d);
+struct hvm_vioapic *vioapic = domain_vioapic(d);
 union vioapic_redir_entry *ent;
 
 ASSERT(has_vioapic(d));
@@ -388,7 +388,7 @@ void vioapic_irq_positive_edge(struct domain *d, unsigned 
int irq)
 
 void vioapic_update_EOI(struct domain *d, u8 vector)
 {
-struct hvm_hw_vioapic *vioapic = domain_vioapic(d);
+struct hvm_vioapic *vioapic = domain_vioapic(d);
 struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
 union vioapic_redir_entry *ent;
 int gsi;
@@ -426,38 +426,39 @@ void vioapic_update_EOI(struct domain *d, u8 vector)
 
 static int ioapic_save(struct domain *d, hvm_domain_context_t *h)
 {
-struct hvm_hw_vioapic *s = domain_vioapic(d);
+struct hvm_vioapic *s = domain_vioapic(d);
 
 if ( !has_vioapic(d) )
 return 0;
 
-return hvm_save_entry(IOAPIC, 0, h, 

[Xen-devel] [PATCH v3 0/8] x86/vioapic: introduce support for multiple vIO APICs

2017-03-29 Thread Roger Pau Monne
Hello,

This patch series introduces support for having a variable number of entries in
vIO APICs, and also for having a variable number of vIO APICs per domain. This
functionality is not used by unprivileged guests, which are still limited to a
single IO APIC with 48 entries.

The functionality introduced is only used by PVHv2 Dom0, in order to copy the
IO APIC topology found on bare metal.

A private osstest flight is currently running against this series.

The patches can also be found in my personal git tree:

git://xenbits.xen.org/people/royger/xen.git vioapics_v3

Thanks, Roger.




Re: [Xen-devel] [PATCH v3] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-29 Thread Jennifer Herbert



On 29/03/17 11:38, Jan Beulich wrote:

On 28.03.17 at 15:18,  wrote:
Perhaps drop "already"? Personally I also wouldn't mind you dropping 
the variable altogether and using header->opaque directly, but I 
guess that's too "opaque" for your taste? 


It would make the code too opaque for my taste, so I'll just drop the 
'already' bit.



@@ -441,13 +481,8 @@ static int dm_op(domid_t domid,
  struct xen_dm_op_modified_memory *data =
  _memory;
  
-const_op = false;

-
-rc = -EINVAL;
-if ( data->pad )
-break;
-
-rc = modified_memory(d, data);
+rc = modified_memory(d, data, [1]);
+const_op = (rc != 0);

Isn't this wrong now, i.e. don't you need to copy back the
header now in all cases?


I only define what I'll set nr_extents to in case of error, and of course
opaque is opaque.
If I were to write back, I'd be writing back 0 to nr_extents, which wouldn't
really mean anything since I'm not defining the order in which I'm processing
them. In fact, the only thing it tells you is that extent 0 is the last one
processed, which I don't think is all that useful.

Ideally I'd prefer to leave it untouched on success, but since the original
value is lost on continuation, this would be more involved.

By only writing back on error, I hoped to improve efficiency for the common
case (especially for existing use with calls of one extent). (I know it's
only a small difference.)
If you want me to write back, what do you want me to write back for success?



Cheers,

-jenny




Re: [Xen-devel] [GSoC] GSoC Introduction : Fuzzing Xen hypercall interface

2017-03-29 Thread Felix Schmoll
Hi,

Here is the final patch for the domain_id hypercall:

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 2d97d36c38..1e152c8a07 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1569,6 +1569,7 @@ int xc_domctl(xc_interface *xch, struct xen_domctl
*domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);

 int xc_version(xc_interface *xch, int cmd, void *arg);
+int xc_domid(xc_interface *xch);

 int xc_flask_op(xc_interface *xch, xen_flask_op_t *op);

diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index 72e6242417..37b11e41a9 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -530,6 +530,12 @@ int xc_version(xc_interface *xch, int cmd, void *arg)
 return rc;
 }

+int xc_domid(xc_interface *xch)
+{
+return xencall0(xch->xcall, __HYPERVISOR_domain_id);
+}
+
+
 unsigned long xc_make_page_below_4G(
 xc_interface *xch, uint32_t domid, unsigned long mfn)
 {
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 614501f761..eddb264f2d 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1297,6 +1297,7 @@ static arm_hypercall_t arm_hypercall_table[] = {
 HYPERCALL(platform_op, 1),
 HYPERCALL_ARM(vcpu_op, 3),
 HYPERCALL(vm_assist, 2),
+HYPERCALL(domain_id, 0),
 };

 #ifndef NDEBUG
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index e7238ce293..3d541e01e1 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -132,6 +132,7 @@ static const hypercall_table_t hvm_hypercall_table[] = {
 COMPAT_CALL(mmuext_op),
 HYPERCALL(xenpmu_op),
 COMPAT_CALL(dm_op),
+HYPERCALL(domain_id),
 HYPERCALL(arch_1)
 };

diff --git a/xen/arch/x86/hypercall.c b/xen/arch/x86/hypercall.c
index e30181817a..184741bf16 100644
--- a/xen/arch/x86/hypercall.c
+++ b/xen/arch/x86/hypercall.c
@@ -67,6 +67,7 @@ const hypercall_args_t
hypercall_args_table[NR_hypercalls] =
 ARGS(tmem_op, 1),
 ARGS(xenpmu_op, 2),
 ARGS(dm_op, 3),
+ARGS(domain_id, 0),
 ARGS(mca, 1),
 ARGS(arch_1, 1),
 };
diff --git a/xen/arch/x86/pv/hypercall.c b/xen/arch/x86/pv/hypercall.c
index 9d29d2f088..f12314b5ca 100644
--- a/xen/arch/x86/pv/hypercall.c
+++ b/xen/arch/x86/pv/hypercall.c
@@ -79,6 +79,7 @@ static const hypercall_table_t pv_hypercall_table[] = {
 #endif
 HYPERCALL(xenpmu_op),
 COMPAT_CALL(dm_op),
+HYPERCALL(domain_id),
 HYPERCALL(mca),
 HYPERCALL(arch_1),
 };
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 84618715dc..5107aacd06 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -431,6 +431,12 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void)
arg)
 return -ENOSYS;
 }

+DO(domain_id)(void)
+{
+struct domain *d = current->domain;
+return d->domain_id;
+}
+
 DO(nmi_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 struct xennmi_callback cb;
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 91ba8bb48e..4ad62aa01b 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -121,6 +121,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_xc_reserved_op   39 /* reserved for XenClient */
 #define __HYPERVISOR_xenpmu_op40
 #define __HYPERVISOR_dm_op41
+#define __HYPERVISOR_domain_id42 /* custom hypercall */

 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0   48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index cc99aea57d..5c7bc6233e 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -83,6 +83,9 @@ do_xen_version(
 XEN_GUEST_HANDLE_PARAM(void) arg);

 extern long
+do_domain_id(void);
+
+extern long
 do_console_io(
 int cmd,
 int count,

Felix

2017-03-29 12:41 GMT+02:00 Wei Liu :

> On Wed, Mar 29, 2017 at 07:52:47AM +0200, Felix Schmoll wrote:
> > >
> > > Yes. That would be good.
> > >
> >
> > I'm free every afternoon this week (German time, I suppose you're in
> > Europe), so just let me know at least three hours in advance when you're
> > free
> > to have a chat.
> >
>
> I can do 4-5pm today and tomorrow. Please join #xendevel on freenode.
>
> Wei.
>
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] xen/arm: alternative: Register re-mapped Xen area as a temporary virtual region

2017-03-29 Thread Julien Grall

Hi Wei,

On 27/03/17 09:40, Wei Chen wrote:

Signed-off-by: Wei Chen 


Thank you for updating the commit message :). With the small change below:

Reviewed-by: Julien Grall 


---
Notes:
This bug will affect the staging, staging-4.8 and stable-4.8 source trees.

---
v2->v3
1. Fix typos.
2. Explain this bug only happening when booting Xen in commit message.

---
 xen/arch/arm/alternative.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/alternative.c b/xen/arch/arm/alternative.c
index 1d10f51..96859fc 100644
--- a/xen/arch/arm/alternative.c
+++ b/xen/arch/arm/alternative.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -154,8 +155,12 @@ static int __apply_alternatives_multi_stop(void *unused)
 int ret;
 struct alt_region region;
 mfn_t xen_mfn = _mfn(virt_to_mfn(_start));
-unsigned int xen_order = get_order_from_bytes(_end - _start);
+unsigned int xen_size = _end - _start;


I didn't notice it on the previous reviews. xen_size should technically 
be paddr_t.


This is more for consistency than a real bug, as the result of _end - _start
is unlikely ever to exceed 4GB. I think Stefano should be able to fix this on
commit, so there is no need to resend the patch.
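
To illustrate the point about the type of xen_size, here is a minimal standalone sketch (not Xen code). It assumes a 64-bit paddr_t, modelled with uint64_t, and uses uint32_t to stand in for a 32-bit unsigned int; the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Xen's paddr_t; assumed to be 64 bits wide in this sketch. */
typedef uint64_t paddr_t;

/* Size kept in the wide type: the full difference is preserved. */
static paddr_t region_size_wide(paddr_t start, paddr_t end)
{
    assert(end >= start);
    return end - start;
}

/* Size stored in a 32-bit type: the result wraps modulo 2^32. */
static uint32_t region_size_narrow(paddr_t start, paddr_t end)
{
    assert(end >= start);
    return (uint32_t)(end - start);
}
```

For any image smaller than 4GB both functions agree, which is why this is a consistency issue rather than a practical bug.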


Cheers,

--
Julien Grall



Re: [Xen-devel] [PATCH RFC] x86/emulate: implement hvmemul_cmpxchg() with an actual CMPXCHG

2017-03-29 Thread Razvan Cojocaru
On 03/29/2017 04:55 PM, Jan Beulich wrote:
 On 28.03.17 at 12:50,  wrote:
>> On 03/28/2017 01:47 PM, Jan Beulich wrote:
>> On 28.03.17 at 12:27,  wrote:
 On 03/28/2017 01:03 PM, Jan Beulich wrote:
 On 28.03.17 at 11:14,  wrote:
>> I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a
>> failed CMPXCHG should happen just once, with the proper registers and ZF
>> set. The guest surely expects neither that the instruction resume until
>> it succeeds, nor that some hidden loop goes on for an indeterminate
>> amount of time until a CMPXCHG succeeds.
>
> The guest doesn't observe the CMPXCHG failing - RETRY leads to
> the instruction being restarted instead of completed.

 Indeed, but it works differently with hvm_emulate_one_vm_event() where
 RETRY currently would have the instruction be re-executed (properly
 re-executed, not just re-emulated) by the guest.
>>>
>>> Right - see my other reply to Andrew: The function likely would
>>> need to tell apart guest CMPXCHG uses from us using the insn to
>>> carry out the write by some other one. That may involve
>>> adjustments to the memory write logic in x86_emulate() itself, as
>>> the late failure of the comparison then would also need to be
>>> communicated back (via ZF clear) to the guest.
>>
>> Exactly, it would require quite some reworking of x86_emulate().
> 
> I had imagined it to be less intrusive (outside of x86_emulate()),
> but I've now learned why Andrew was able to get rid of
> X86EMUL_CMPXCHG_FAILED - the apparently intended behavior
> was never implemented. Attached a first take at it, which has
> seen smoke testing, but nothing more. The way it ends up being
> I don't think this can reasonably be considered for 4.9 at this
> point in time. (Also Cc-ing Tim for the shadow code changes,
> even if this isn't really a proper patch submission.)

I have this xenstored-related error when trying to build the latest
staging, not sure who this should be forwarded to (hopefully I'm not
spamming):

make -C xenstored install
make[6]: Entering directory `/home/red/work/xen.git/tools/ocaml/xenstored'
rm -f paths.ml.tmp;  printf "let %s = \"%s\";;\n" sbindir /usr/sbin
>>paths.ml.tmp;  printf "let %s = \"%s\";;\n" bindir /usr/bin
>>paths.ml.tmp;  printf "let %s = \"%s\";;\n" libexec /usr/lib/xen
>>paths.ml.tmp;  printf "let %s = \"%s\";;\n" libexec_bin
/usr/lib/xen/bin >>paths.ml.tmp;  printf "let %s = \"%s\";;\n" libdir
/usr/lib >>paths.ml.tmp;  printf "let %s = \"%s\";;\n" sharedir
/usr/share >>paths.ml.tmp;  printf "let %s = \"%s\";;\n" xenfirmwaredir
/usr/lib/xen/boot >>paths.ml.tmp;  printf "let %s = \"%s\";;\n"
xen_config_dir /etc/xen >>paths.ml.tmp;  printf "let %s = \"%s\";;\n"
xen_script_dir /etc/xen/scripts >>paths.ml.tmp;  printf "let %s =
\"%s\";;\n" xen_lock_dir /var/lock >>paths.ml.tmp;  printf "let %s =
\"%s\";;\n" xen_run_dir /var/run/xen >>paths.ml.tmp;  printf "let %s =
\"%s\";;\n" xen_paging_dir /var/lib/xen/xenpaging >>paths.ml.tmp;
printf "let %s = \"%s\";;\n" xen_dump_dir /var/lib/xen/dump
>>paths.ml.tmp;  printf "let %s = \"%s\";;\n" xen_log_dir /var/log/xen
>>paths.ml.tmp;  printf "let %s = \"%s\";;\n" xen_lib_dir /var/lib/xen
>>paths.ml.tmp;  printf "let %s = \"%s\";;\n" xen_run_stored
/var/run/xenstored >>paths.ml.tmp;  if ! cmp -s paths.ml.tmp paths.ml;
then mv -f paths.ml.tmp paths.ml; else rm -f paths.ml.tmp; fi
rm -f _paths.h.tmp;  echo "#define sbindir \"/usr/sbin\""
>>_paths.h.tmp;  echo "#define bindir \"/usr/bin\"" >>_paths.h.tmp;
echo "#define LIBEXEC \"/usr/lib/xen\"" >>_paths.h.tmp;  echo "#define
LIBEXEC_BIN \"/usr/lib/xen/bin\"" >>_paths.h.tmp;  echo "#define libdir
\"/usr/lib\"" >>_paths.h.tmp;  echo "#define SHAREDIR \"/usr/share\""
>>_paths.h.tmp;  echo "#define XENFIRMWAREDIR \"/usr/lib/xen/boot\""
>>_paths.h.tmp;  echo "#define XEN_CONFIG_DIR \"/etc/xen\""
>>_paths.h.tmp;  echo "#define XEN_SCRIPT_DIR \"/etc/xen/scripts\""
>>_paths.h.tmp;  echo "#define XEN_LOCK_DIR \"/var/lock\""
>>_paths.h.tmp;  echo "#define XEN_RUN_DIR \"/var/run/xen\""
>>_paths.h.tmp;  echo "#define XEN_PAGING_DIR
\"/var/lib/xen/xenpaging\"" >>_paths.h.tmp;  echo "#define XEN_DUMP_DIR
\"/var/lib/xen/dump\"" >>_paths.h.tmp;  echo "#define XEN_LOG_DIR
\"/var/log/xen\"" >>_paths.h.tmp;  echo "#define XEN_LIB_DIR
\"/var/lib/xen\"" >>_paths.h.tmp;  echo "#define XEN_RUN_STORED
\"/var/run/xenstored\"" >>_paths.h.tmp;  if ! cmp -s _paths.h.tmp
_paths.h; then mv -f _paths.h.tmp _paths.h; else rm -f _paths.h.tmp; fi
 MLOPTstore.cmx
File "store.ml", line 1:
Error: The files perms.cmi and define.cmi make inconsistent assumptions
   over interface Define
make[6]: *** [store.cmx] Error 2

This happens on "make dist".


Thanks,
Razvan



Re: [Xen-devel] [PATCH v3 0/4] xenstore: rework of transaction handling

2017-03-29 Thread Juergen Gross
Cc-ing Julien, as this series is meant for 4.9.

Juergen

On 28/03/17 18:26, Juergen Gross wrote:
> Rework the transaction handling of xenstored to no longer raise
> conflicts so often.
> 
> This series has been sent for pre-review to some reviewers before as the
> series is related to XSA 206 which has been disclosed only today. So V1
> and V2 have been non-public in order to speed up review process without
> disclosing the XSA.
> 
> Changes in V3:
> - don't always return EAGAIN in case of a failed transaction:
>   it can be ENOMEM or ENOSPC, too.
> 
> Changes in V2:
> - Rebase on top of those patches
> - split patch 1 in two patches as suggested by Ian
> 
> Juergen Gross (4):
>   xenstore: let write_node() and some callers return errno
>   xenstore: undo function rename
>   xenstore: rework of transaction handling
>   xenstore: cleanup tdb.c
> 
>  tools/xenstore/tdb.c   | 439 
> +
>  tools/xenstore/tdb.h   |  22 --
>  tools/xenstore/xenstored_core.c| 173 ++---
>  tools/xenstore/xenstored_core.h|  17 +-
>  tools/xenstore/xenstored_domain.c  |  24 +-
>  tools/xenstore/xenstored_domain.h  |   2 +-
>  tools/xenstore/xenstored_transaction.c | 429 ++--
>  tools/xenstore/xenstored_transaction.h |  18 +-
>  8 files changed, 481 insertions(+), 643 deletions(-)
> 




Re: [Xen-devel] [PATCH] x86/mm: Drop MEM_LOG() and correct some printed information

2017-03-29 Thread Andrew Cooper
On 29/03/17 15:01, Jan Beulich wrote:
 On 29.03.17 at 15:50,  wrote:
>> On 29/03/17 14:06, Jan Beulich wrote:
>> On 29.03.17 at 14:29,  wrote:
 @@ -1068,10 +1073,10 @@ get_page_from_l1e(
  return 0;
  
   could_not_pin:
 -MEM_LOG("Error getting mfn %lx (pfn %lx) from L1 entry %" PRIpte
 -" for l1e_owner=%d, pg_owner=%d",
 -mfn, get_gpfn_from_mfn(mfn),
 -l1e_get_intpte(l1e), l1e_owner->domain_id, 
 pg_owner->domain_id);
 +gdprintk(XENLOG_WARNING, "Error getting mfn %" PRI_mfn " (pfn %" 
 PRI_pfn
 + ") from L1 entry %" PRIpte " for l1e_owner d%d, pg_owner 
 d%d",
 + mfn, get_gpfn_from_mfn(mfn),
 + l1e_get_intpte(l1e), l1e_owner->domain_id, 
 pg_owner->domain_id);
>>> Especially here the wrapping of the format string is rather
>>> unfortunate. Didn't we agree to allow format strings to exceed
>>> the 80 column restriction anyway?
>> It is split at a formatting boundary, which doesn't affect grep-ability.
>>
>> Putting this all on one line is 123 characters, which IMO is too long.
> Hmm, you're right, 123 seems a little excessive.
>
 @@ -1388,7 +1398,7 @@ static int alloc_l1_table(struct page_info *page)
  return 0;
  
   fail:
 -MEM_LOG("Failure in alloc_l1_table: entry %d", i);
 +gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: entry %d\n", i);
>>> %u (or even %03x; same in alloc_l[234]_table())
>> Actually, "slot %#x" would be clearer here.  I thought I fixed the 0x
>> prefix in alloc_l[]_table(), and I am not sure the leading zeroes are
>> helpful.
> I'm not too fussed about the leading zeros, but I do actively
> dislike 0x prefixes except when a message mixes hex and dec
> numbers.

Mixed hex and dec is obviously a problem, but it is also very much a
problem when it isn't clear from context how a number is formatted.
*fn are all uniformly formatted as hex everywhere, whereas
slot/entry could easily be either.
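
As a rough illustration of the three candidate formats discussed here (%u, %03x and %#x), the following standalone sketch formats the same slot number each way with snprintf(). The enum and function names are invented for the example and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical selector for the formats under discussion. */
enum slot_style { SLOT_DEC, SLOT_HEX_PADDED, SLOT_HEX_PREFIXED };

/* Format a page table slot number into buf using the chosen style. */
static int format_slot(char *buf, size_t len, enum slot_style style,
                       unsigned int slot)
{
    switch ( style )
    {
    case SLOT_DEC:
        return snprintf(buf, len, "entry %u", slot);
    case SLOT_HEX_PADDED:
        return snprintf(buf, len, "entry %03x", slot);
    case SLOT_HEX_PREFIXED:
        return snprintf(buf, len, "slot %#x", slot);
    }
    return -1;
}
```

For slot 16 these give "entry 16", "entry 010" and "slot 0x10". Only the last is unambiguous without further context, since "010" could be read as decimal ten.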

>
 @@ -4459,10 +4512,11 @@ int steal_page(
  
   fail:
  spin_unlock(&d->page_alloc_lock);
 -MEM_LOG("Bad page %lx: ed=%d sd=%d caf=%08lx taf=%" PRtype_info,
 -page_to_mfn(page), d->domain_id,
 -owner ? owner->domain_id : DOMID_INVALID,
 -page->count_info, page->u.inuse.type_info);
 +gdprintk(XENLOG_WARNING, "Bad mfn %" PRI_mfn
 + ": ed=%d sd=%d caf=%08lx taf=%" PRtype_info "\n",
 + page_to_mfn(page), d->domain_id,
 + owner ? owner->domain_id : DOMID_INVALID,
 + page->count_info, page->u.inuse.type_info);
>>> Same here.
>>>
>>> Is this intended for 4.9?
>> At this point, yes.
> In which case you should Cc Julien.

Will do on v2.

~Andrew



Re: [Xen-devel] [PATCH v2] xen/arm32: Introduce alternative runtime patching

2017-03-29 Thread Julien Grall



On 29/03/17 10:28, Wei Chen wrote:

Hi Julien,

On 2017/3/29 16:40, Julien Grall wrote:

Hi Wei,

On 28/03/2017 08:23, Wei Chen wrote:

diff --git a/xen/include/asm-arm/arm32/insn.h b/xen/include/asm-arm/arm32/insn.h
new file mode 100644
index 000..4cda69e
--- /dev/null
+++ b/xen/include/asm-arm/arm32/insn.h
@@ -0,0 +1,65 @@
+/*
+  * Copyright (C) 2017 ARM Ltd.
+  *
+  * This program is free software; you can redistribute it and/or modify
+  * it under the terms of the GNU General Public License version 2 as
+  * published by the Free Software Foundation.
+  *
+  * This program is distributed in the hope that it will be useful,
+  * but WITHOUT ANY WARRANTY; without even the implied warranty of
+  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+  * GNU General Public License for more details.
+  *
+  * You should have received a copy of the GNU General Public License
+  * along with this program.  If not, see .
+  */
+#ifndef __ARCH_ARM_ARM32_INSN
+#define __ARCH_ARM_ARM32_INSN
+
+#include 
+
+#define __AARCH32_INSN_FUNCS(abbr, mask, val)   \
+static always_inline bool_t aarch32_insn_is_##abbr(uint32_t code) \
+{ \
+return (code & (mask)) == (val);  \
+}
+
+/*
+ * From ARM DDI 0406C.c Section A8.8.18 and A8.8.25. We can see that
+ * unconditional blx and conditional b have the same value field and imm
+ * length. And from ARM DDI 0406C.c Section A5.7 Table A5-23, we can see
+ * that the blx is the only one unconditional instruction has the same
+ * value with conditional branch instructions. So we define the b and blx
+ * in the same macro to check them at the same time.
+ */


I don't think this is true. The encodings are:
  - b   <cond> 1010
  - bl  <cond> 1011
  - blx 1111 101H

where <cond> != 0b1111. So both helpers (aarch32_insn_is_{b_or_blx,bl})
will recognize the blx instruction depending on the value of bit H.



I think I misunderstood the H bit. I always thought
the H bit in the ARM instruction set was 0.


Because Xen is only using ARM instructions, blx will always have H = 0.
But this is not what you described in your comment.




That's why I suggested introducing a new helper checking for blx.



I think that's not enough. The current macro masks out the condition bits,
so whatever the value of the H bit, blx will be recognized by
aarch32_insn_is_{b, bl}.

I think we should update __AARCH32_INSN_FUNCS to cover the cond
bits.

#define __UNCONDITIONAL_INSN(code)   (((code) >> 28) == 0xF)

#define __AARCH32_INSN_FUNCS(abbr, mask, val)   \
static always_inline bool_t aarch32_insn_is_##abbr(uint32_t code) \
{ \
 return !__UNCONDITIONAL_INSN(code) && (code & (mask)) == (val);   \
}

#define __AARCH32_UNCOND_INSN_FUNCS(abbr, mask, val)   \
static always_inline bool_t aarch32_insn_is_##abbr(uint32_t code) \
{ \
 return __UNCONDITIONAL_INSN(code) && (code & (mask)) == (val);   \
}

__AARCH32_UNCOND_INSN_FUNCS(blx,  0x0E00, 0x0A00)


Looking at the code, your aarch32_insn_is_* helpers are only used in
aarch32_insn_is_branch_imm. So why don't you open-code the checks in the
latter helper?
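
A minimal sketch of what open-coded checks could look like, assuming the A32 encodings from ARM DDI 0406C (b: cond 1010 imm24, bl: cond 1011 imm24, blx immediate: 1111 101H imm24). The helper names below are illustrative only and are not the ones from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits [31:28] are the condition field; 0b1111 is the unconditional space. */
static bool insn_is_unconditional(uint32_t insn)
{
    return (insn >> 28) == 0xF;
}

/* b<cond> imm24: cond != 1111, bits [27:24] == 0b1010. */
static bool insn_is_b(uint32_t insn)
{
    return !insn_is_unconditional(insn) &&
           (insn & 0x0F000000u) == 0x0A000000u;
}

/* bl<cond> imm24: cond != 1111, bits [27:24] == 0b1011. */
static bool insn_is_bl(uint32_t insn)
{
    return !insn_is_unconditional(insn) &&
           (insn & 0x0F000000u) == 0x0B000000u;
}

/* blx imm24: cond == 1111, bits [27:25] == 0b101, bit 24 is H (ignored). */
static bool insn_is_blx_imm(uint32_t insn)
{
    return (insn & 0xFE000000u) == 0xFA000000u;
}

/* Any of the three immediate branch forms. */
static bool insn_is_branch_imm(uint32_t insn)
{
    return insn_is_b(insn) || insn_is_bl(insn) || insn_is_blx_imm(insn);
}
```

With the condition field taken into account, 0xFB000000 (blx with H = 1) is no longer mistaken for bl, and 0xFA000000 (blx with H = 0) is no longer mistaken for b.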


Cheers,

--
Julien Grall



Re: [Xen-devel] [PATCH v1 6/9] spinlock: Introduce spin_lock_cb()

2017-03-29 Thread Boris Ostrovsky
On 03/29/2017 09:47 AM, Boris Ostrovsky wrote:
> On 03/29/2017 06:28 AM, Wei Liu wrote:
>> On Fri, Mar 24, 2017 at 01:05:01PM -0400, Boris Ostrovsky wrote:
>>> While waiting for a lock we may want to periodically run some
>>> code. We could use spin_trylock() but since it doesn't take lock
>>> ticket it may take a long time until the lock is taken.
>>>
>>> Add spin_lock_cb() that allows us to execute a callback while waiting.
>>> Also add spin_lock_kick() that will wake up the waiters.
>>>
>>> Signed-off-by: Boris Ostrovsky 
>>> ---
>>>  xen/common/spinlock.c  |   20 
>>>  xen/include/xen/spinlock.h |3 +++
>>>  2 files changed, 23 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
>>> index 2a06406..d1de3ca 100644
>>> --- a/xen/common/spinlock.c
>>> +++ b/xen/common/spinlock.c
>>> @@ -129,6 +129,26 @@ static always_inline u16 
>>> observe_head(spinlock_tickets_t *t)
>>>  return read_atomic(&t->head);
>>>  }
>>>  
>>> +void _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
>>> +{
>>> +spinlock_tickets_t tickets = SPINLOCK_TICKET_INC;
>>> +LOCK_PROFILE_VAR;
>>> +
>>> +check_lock(&lock->debug);
>>> +tickets.head_tail = arch_fetch_and_add(&lock->tickets.head_tail,
>>> +   tickets.head_tail);
>>> +while ( tickets.tail != observe_head(&lock->tickets) )
>>> +{
>>> +LOCK_PROFILE_BLOCK;
>>> +if ( cb )
>>> +cb(data);
>>> +arch_lock_relax();
>>> +}
>>> +LOCK_PROFILE_GOT;
>>> +preempt_disable();
>>> +arch_lock_acquire_barrier();
>>> +}
>>> +
>>>  void _spin_lock(spinlock_t *lock)
>> You should be able to use _spin_lock_cb to implement _spin_lock, right?
>
> I did consider this but decided not to do it because we'd be adding a
> few extra instructions and a call on potentially hot path.
>
> (And doing it as a #define of a spin_lock() would make things even worse).

Although declaring _spin_lock_cb() as an inline makes generated assembly
look essentially the same as with current _spin_lock() (I don't care
about the extra check inside the loop since that's a slow path).

So maybe I can indeed do this.
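
A rough standalone sketch of the idea, using C11 atomics rather than Xen's primitives. All names here (ticket_lock_t, ticket_lock_cb() and so on) are invented for the illustration, and the profiling, preemption and barrier handling of the real code is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified ticket lock: head is the ticket being served, tail the next free one. */
typedef struct {
    atomic_uint head;
    atomic_uint tail;
} ticket_lock_t;

static inline void ticket_lock_init(ticket_lock_t *l)
{
    atomic_init(&l->head, 0);
    atomic_init(&l->tail, 0);
}

/* Take a ticket, then spin until it is served, running cb (if any) while waiting. */
static inline void ticket_lock_cb(ticket_lock_t *l, void (*cb)(void *), void *data)
{
    unsigned int me = atomic_fetch_add_explicit(&l->tail, 1, memory_order_relaxed);

    while ( atomic_load_explicit(&l->head, memory_order_acquire) != me )
    {
        if ( cb )
            cb(data);
    }
}

/* The plain lock is the callback variant with no callback. */
static inline void ticket_lock(ticket_lock_t *l)
{
    ticket_lock_cb(l, NULL, NULL);
}

static inline void ticket_unlock(ticket_lock_t *l)
{
    atomic_fetch_add_explicit(&l->head, 1, memory_order_release);
}

/* Demo helper only: releases the lock on its first invocation. */
struct demo_ctx {
    ticket_lock_t *lock;
    unsigned int calls;
};

static void demo_release_once(void *data)
{
    struct demo_ctx *ctx = data;

    if ( ctx->calls++ == 0 )
        ticket_unlock(ctx->lock);
}
```

Because ticket_lock_cb() is inline and cb is a compile-time NULL in ticket_lock(), a compiler can be expected to drop the check in that caller, which matches the observation above about the generated assembly.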

-boris



Re: [Xen-devel] [PATCH] x86/mm: Drop MEM_LOG() and correct some printed information

2017-03-29 Thread Jan Beulich
>>> On 29.03.17 at 15:50,  wrote:
> On 29/03/17 14:06, Jan Beulich wrote:
> On 29.03.17 at 14:29,  wrote:
>>> @@ -1068,10 +1073,10 @@ get_page_from_l1e(
>>>  return 0;
>>>  
>>>   could_not_pin:
>>> -MEM_LOG("Error getting mfn %lx (pfn %lx) from L1 entry %" PRIpte
>>> -" for l1e_owner=%d, pg_owner=%d",
>>> -mfn, get_gpfn_from_mfn(mfn),
>>> -l1e_get_intpte(l1e), l1e_owner->domain_id, 
>>> pg_owner->domain_id);
>>> +gdprintk(XENLOG_WARNING, "Error getting mfn %" PRI_mfn " (pfn %" 
>>> PRI_pfn
>>> + ") from L1 entry %" PRIpte " for l1e_owner d%d, pg_owner d%d",
>>> + mfn, get_gpfn_from_mfn(mfn),
>>> + l1e_get_intpte(l1e), l1e_owner->domain_id, 
>>> pg_owner->domain_id);
>> Especially here the wrapping of the format string is rather
>> unfortunate. Didn't we agree to allow format strings to exceed
>> the 80 column restriction anyway?
> 
> It is split at a formatting boundary, which doesn't affect grep-ability.
> 
> Putting this all on one line is 123 characters, which IMO is too long.

Hmm, you're right, 123 seems a little excessive.

>>> @@ -1388,7 +1398,7 @@ static int alloc_l1_table(struct page_info *page)
>>>  return 0;
>>>  
>>>   fail:
>>> -MEM_LOG("Failure in alloc_l1_table: entry %d", i);
>>> +gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: entry %d\n", i);
>> %u (or even %03x; same in alloc_l[234]_table())
> 
> Actually, "slot %#x" would be clearer here.  I thought I fixed the 0x
> prefix in alloc_l[]_table(), and I am not sure the leading zeroes are
> helpful.

I'm not too fussed about the leading zeros, but I do actively
dislike 0x prefixes except when a message mixes hex and dec
numbers.

>>> @@ -4459,10 +4512,11 @@ int steal_page(
>>>  
>>>   fail:
>>>  spin_unlock(&d->page_alloc_lock);
>>> -MEM_LOG("Bad page %lx: ed=%d sd=%d caf=%08lx taf=%" PRtype_info,
>>> -page_to_mfn(page), d->domain_id,
>>> -owner ? owner->domain_id : DOMID_INVALID,
>>> -page->count_info, page->u.inuse.type_info);
>>> +gdprintk(XENLOG_WARNING, "Bad mfn %" PRI_mfn
>>> + ": ed=%d sd=%d caf=%08lx taf=%" PRtype_info "\n",
>>> + page_to_mfn(page), d->domain_id,
>>> + owner ? owner->domain_id : DOMID_INVALID,
>>> + page->count_info, page->u.inuse.type_info);
>> Same here.
>>
>> Is this intended for 4.9?
> 
> At this point, yes.

In which case you should Cc Julien.

Jan




Re: [Xen-devel] [PATCH RFC] x86/emulate: implement hvmemul_cmpxchg() with an actual CMPXCHG

2017-03-29 Thread Razvan Cojocaru
On 03/29/2017 04:55 PM, Jan Beulich wrote:
 On 28.03.17 at 12:50,  wrote:
>> On 03/28/2017 01:47 PM, Jan Beulich wrote:
>> On 28.03.17 at 12:27,  wrote:
 On 03/28/2017 01:03 PM, Jan Beulich wrote:
 On 28.03.17 at 11:14,  wrote:
>> I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a
>> failed CMPXCHG should happen just once, with the proper registers and ZF
>> set. The guest surely expects neither that the instruction resume until
>> it succeeds, nor that some hidden loop goes on for an indeterminate
>> amount of time until a CMPXCHG succeeds.
>
> The guest doesn't observe the CMPXCHG failing - RETRY leads to
> the instruction being restarted instead of completed.

 Indeed, but it works differently with hvm_emulate_one_vm_event() where
 RETRY currently would have the instruction be re-executed (properly
 re-executed, not just re-emulated) by the guest.
>>>
>>> Right - see my other reply to Andrew: The function likely would
>>> need to tell apart guest CMPXCHG uses from us using the insn to
>>> carry out the write by some other one. That may involve
>>> adjustments to the memory write logic in x86_emulate() itself, as
>>> the late failure of the comparison then would also need to be
>>> communicated back (via ZF clear) to the guest.
>>
>> Exactly, it would require quite some reworking of x86_emulate().
> 
> I had imagined it to be less intrusive (outside of x86_emulate()),
> but I've now learned why Andrew was able to get rid of
> X86EMUL_CMPXCHG_FAILED - the apparently intended behavior
> was never implemented. Attached a first take at it, which has
> seen smoke testing, but nothing more. The way it ends up being
> I don't think this can reasonably be considered for 4.9 at this
> point in time. (Also Cc-ing Tim for the shadow code changes,
> even if this isn't really a proper patch submission.)

Thanks! I'll give it a spin with a modified version of my CMPXCHG patch as
soon as possible.


Razvan



Re: [Xen-devel] [PATCH v2 03/27] ARM: GICv3 ITS: allocate device and collection table

2017-03-29 Thread Andre Przywara
Hi,

On 22/03/17 16:33, Julien Grall wrote:
[ ... ]
  gicv3_dist_init();
 +res = gicv3_its_init();
 +if ( res )
 +printk(XENLOG_WARNING "GICv3: ITS: initialization failed:
 %d\n", res);
>>>
>>> I would have expected a panic here because the ITS subsystem could be half
>>> initialized and it is not safe to continue.
>>
>> OK, let me check what actually happens here if there is no ITS ;-)
> 
> Technically, this message should not happen when there is no ITS because
> it is not mandatory to have one on the platform.
> 
> So this would be a coding error for me.

Having no ITS (node in the DT) would result in an empty host_its_list and
a "0" return, so it doesn't raise any issues.

Technically we could cope with one or all ITSes to not initialize (by
not propagating them to any guests).
But for now I can just panic here, I guess, because it should point to
some serious issue.
If there is a use case, we can always add a more relaxed behavior later.

Cheers,
Andre.



Re: [Xen-devel] [PATCH RFC] x86/emulate: implement hvmemul_cmpxchg() with an actual CMPXCHG

2017-03-29 Thread Jan Beulich
>>> On 28.03.17 at 12:50,  wrote:
> On 03/28/2017 01:47 PM, Jan Beulich wrote:
> On 28.03.17 at 12:27,  wrote:
>>> On 03/28/2017 01:03 PM, Jan Beulich wrote:
>>> On 28.03.17 at 11:14,  wrote:
> I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a
> failed CMPXCHG should happen just once, with the proper registers and ZF
> set. The guest surely expects neither that the instruction resume until
> it succeeds, nor that some hidden loop goes on for an indeterminate
> amount of time until a CMPXCHG succeeds.

 The guest doesn't observe the CMPXCHG failing - RETRY leads to
 the instruction being restarted instead of completed.
>>>
>>> Indeed, but it works differently with hvm_emulate_one_vm_event() where
>>> RETRY currently would have the instruction be re-executed (properly
>>> re-executed, not just re-emulated) by the guest.
>> 
>> Right - see my other reply to Andrew: The function likely would
>> need to tell apart guest CMPXCHG uses from us using the insn to
>> carry out the write by some other one. That may involve
>> adjustments to the memory write logic in x86_emulate() itself, as
>> the late failure of the comparison then would also need to be
>> communicated back (via ZF clear) to the guest.
> 
> Exactly, it would require quite some reworking of x86_emulate().

I had imagined it to be less intrusive (outside of x86_emulate()),
but I've now learned why Andrew was able to get rid of
X86EMUL_CMPXCHG_FAILED - the apparently intended behavior
was never implemented. Attached a first take at it, which has
seen smoke testing, but nothing more. The way it ends up being
I don't think this can reasonably be considered for 4.9 at this
point in time. (Also Cc-ing Tim for the shadow code changes,
even if this isn't really a proper patch submission.)
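
For reference, the "fail once and report it" semantics being discussed can be modelled in plain C11. This is only an analogy under stated assumptions, not emulator code: the struct and function names are made up, zf stands in for the ZF flag, and accumulator stands in for rAX.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of what a guest observes after one CMPXCHG. */
struct cmpxchg_result {
    bool zf;              /* true if the comparison matched and the store happened */
    uint64_t accumulator; /* unchanged on success, loaded from memory on failure */
};

/* A single compare-and-exchange attempt with no hidden retry loop. */
static struct cmpxchg_result cmpxchg_once(_Atomic uint64_t *mem,
                                          uint64_t expected, uint64_t desired)
{
    struct cmpxchg_result r;

    r.accumulator = expected;
    r.zf = atomic_compare_exchange_strong(mem, &r.accumulator, desired);

    return r;
}
```

On a mismatch the caller sees zf clear and the current memory value in the accumulator, and it is then up to the caller (the guest, in the discussion above) to decide whether to retry.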

Jan

x86emul: correctly handle CMPXCHG* comparison failures

If the ->cmpxchg() hook finds a mismatch, we should deal with this the
same as when the "manual" comparison reports a mismatch.

This involves reverting bfce0e62c3 ("x86/emul: Drop
X86EMUL_CMPXCHG_FAILED"), albeit with X86EMUL_CMPXCHG_FAILED now
becoming a value distinct from X86EMUL_RETRY.

In order to not leave mixed code also fully switch affected functions
from paddr_t to intpte_t.

Signed-off-by: Jan Beulich 
---
The code could be further simplified if we could rely on all
->cmpxchg() hooks always using CMPXCHG, but for now we need to cope
with them using plain write (and hence accept the double reads if
CMPXCHG is actually being used).
Note that the patch doesn't address the incorrectness of there not
being a memory write even in the comparison-failed case.

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5236,16 +5236,17 @@ static int ptwr_emulated_read(
 
 static int ptwr_emulated_update(
 unsigned long addr,
-paddr_t old,
-paddr_t val,
+intpte_t *p_old,
+intpte_t val,
 unsigned int bytes,
-unsigned int do_cmpxchg,
 struct ptwr_emulate_ctxt *ptwr_ctxt)
 {
 unsigned long mfn;
 unsigned long unaligned_addr = addr;
 struct page_info *page;
 l1_pgentry_t pte, ol1e, nl1e, *pl1e;
+intpte_t old = p_old ? *p_old : 0;
+unsigned int offset = 0;
 struct vcpu *v = current;
 struct domain *d = v->domain;
 int ret;
@@ -5259,28 +5260,30 @@ static int ptwr_emulated_update(
 }
 
 /* Turn a sub-word access into a full-word access. */
-if ( bytes != sizeof(paddr_t) )
+if ( bytes != sizeof(val) )
 {
-paddr_t  full;
-unsigned int rc, offset = addr & (sizeof(paddr_t)-1);
+intpte_t full;
+unsigned int rc;
+
+offset = addr & (sizeof(full) - 1);
 
 /* Align address; read full word. */
-addr &= ~(sizeof(paddr_t)-1);
-if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
+addr &= ~(sizeof(full) - 1);
+if ( (rc = copy_from_user(&full, (void *)addr, sizeof(full))) != 0 )
 {
 x86_emul_pagefault(0, /* Read fault. */
-   addr + sizeof(paddr_t) - rc,
+   addr + sizeof(full) - rc,
 &ptwr_ctxt->ctxt);
 return X86EMUL_EXCEPTION;
 }
 /* Mask out bits provided by caller. */
-full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8));
+full &= ~((((intpte_t)1 << (bytes * 8)) - 1) << (offset * 8));
 /* Shift the caller value and OR in the missing bits. */
-val  &= (((paddr_t)1 << (bytes*8)) - 1);
+val  &= (((intpte_t)1 << (bytes * 8)) - 1);
 val <<= (offset)*8;
 val  |= full;
 /* Also fill in missing parts of the cmpxchg old value. */
-old  &= (((paddr_t)1 << (bytes*8)) - 1);
+old  &= (((intpte_t)1 << (bytes * 8)) - 1);
 old <<= (offset)*8;
 old  |= full;
 }
@@ -5302,7 +5305,7 @@ 
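
The "turn a sub-word access into a full-word access" step in the hunk above can be sketched in isolation as follows. This is a simplified standalone model, assuming a 64-bit intpte_t (modelled with uint64_t); widen_subword() is a name invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Xen's intpte_t; assumed to be 64 bits wide in this sketch. */
typedef uint64_t intpte_t;

/*
 * Merge a 'bytes'-wide value at byte 'offset' into the full word that was
 * read from memory, keeping the bytes the caller did not provide.
 * Only valid for genuine sub-word accesses (bytes < sizeof(intpte_t)).
 */
static intpte_t widen_subword(intpte_t full, intpte_t val,
                              unsigned int offset, unsigned int bytes)
{
    intpte_t mask;

    assert(bytes > 0 && bytes < sizeof(intpte_t));
    assert(offset + bytes <= sizeof(intpte_t));

    mask = ((intpte_t)1 << (bytes * 8)) - 1;

    /* Mask out bits provided by caller. */
    full &= ~(mask << (offset * 8));
    /* Shift the caller value and OR in the missing bits. */
    val &= mask;
    val <<= offset * 8;

    return val | full;
}
```

The same merge is applied to both the new value and the cmpxchg old value, so that the comparison is done on whole words.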

Re: [Xen-devel] [PATCH v1 6/9] spinlock: Introduce spin_lock_cb()

2017-03-29 Thread Boris Ostrovsky
On 03/29/2017 06:28 AM, Wei Liu wrote:
> On Fri, Mar 24, 2017 at 01:05:01PM -0400, Boris Ostrovsky wrote:
>> While waiting for a lock we may want to periodically run some
>> code. We could use spin_trylock() but since it doesn't take lock
>> ticket it may take a long time until the lock is taken.
>>
>> Add spin_lock_cb() that allows us to execute a callback while waiting.
>> Also add spin_lock_kick() that will wake up the waiters.
>>
>> Signed-off-by: Boris Ostrovsky 
>> ---
>>  xen/common/spinlock.c  |   20 
>>  xen/include/xen/spinlock.h |3 +++
>>  2 files changed, 23 insertions(+), 0 deletions(-)
>>
>> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
>> index 2a06406..d1de3ca 100644
>> --- a/xen/common/spinlock.c
>> +++ b/xen/common/spinlock.c
>> @@ -129,6 +129,26 @@ static always_inline u16 
>> observe_head(spinlock_tickets_t *t)
>>  return read_atomic(&t->head);
>>  }
>>  
>> +void _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
>> +{
>> +spinlock_tickets_t tickets = SPINLOCK_TICKET_INC;
>> +LOCK_PROFILE_VAR;
>> +
>> +check_lock(&lock->debug);
>> +tickets.head_tail = arch_fetch_and_add(&lock->tickets.head_tail,
>> +   tickets.head_tail);
>> +while ( tickets.tail != observe_head(&lock->tickets) )
>> +{
>> +LOCK_PROFILE_BLOCK;
>> +if ( cb )
>> +cb(data);
>> +arch_lock_relax();
>> +}
>> +LOCK_PROFILE_GOT;
>> +preempt_disable();
>> +arch_lock_acquire_barrier();
>> +}
>> +
>>  void _spin_lock(spinlock_t *lock)
> You should be able to use _spin_lock_cb to implement _spin_lock, right?


I did consider this but decided not to do it because we'd be adding a
few extra instructions and a call on potentially hot path.

(And doing it as a #define of a spin_lock() would make things even worse).

-boris




Re: [Xen-devel] [PATCH] x86/mm: Drop MEM_LOG() and correct some printed information

2017-03-29 Thread Andrew Cooper
On 29/03/17 14:06, Jan Beulich wrote:
 On 29.03.17 at 14:29,  wrote:
>>  * Use 0x prefix for otherwise unqualified hex numbers.
> I'm glad this in fact refers to just a single place.
>
>> @@ -1057,10 +1062,10 @@ get_page_from_l1e(
>>  put_page_type(page);
>>  put_page(page);
>>  
>> -MEM_LOG("Error updating mappings for mfn %lx (pfn %lx,"
>> -" from L1 entry %" PRIpte ") for %d",
>> -mfn, get_gpfn_from_mfn(mfn),
>> -l1e_get_intpte(l1e), l1e_owner->domain_id);
>> +gdprintk(XENLOG_WARNING, "Error updating mappings for mfn %" 
>> PRI_mfn
>> + " (pfn %" PRI_pfn ", from L1 entry %" PRIpte ") for 
>> d%d\n",
>> + mfn, get_gpfn_from_mfn(mfn),
>> + l1e_get_intpte(l1e), l1e_owner->domain_id);
>>  return err;
>>  }
>>  }
>> @@ -1068,10 +1073,10 @@ get_page_from_l1e(
>>  return 0;
>>  
>>   could_not_pin:
>> -MEM_LOG("Error getting mfn %lx (pfn %lx) from L1 entry %" PRIpte
>> -" for l1e_owner=%d, pg_owner=%d",
>> -mfn, get_gpfn_from_mfn(mfn),
>> -l1e_get_intpte(l1e), l1e_owner->domain_id, pg_owner->domain_id);
>> +gdprintk(XENLOG_WARNING, "Error getting mfn %" PRI_mfn " (pfn %" PRI_pfn
>> + ") from L1 entry %" PRIpte " for l1e_owner d%d, pg_owner d%d",
>> + mfn, get_gpfn_from_mfn(mfn),
>> + l1e_get_intpte(l1e), l1e_owner->domain_id, 
>> pg_owner->domain_id);
> Especially here the wrapping of the format string is rather
> unfortunate. Didn't we agree to allow format strings to exceed
> the 80 column restriction anyway?

It is split at a formatting boundary, which doesn't affect grep-ability.

Putting this all on one line is 123 characters, which IMO is too long.

>
>> @@ -1388,7 +1398,7 @@ static int alloc_l1_table(struct page_info *page)
>>  return 0;
>>  
>>   fail:
>> -MEM_LOG("Failure in alloc_l1_table: entry %d", i);
>> +gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: entry %d\n", i);
> %u (or even %03x; same in alloc_l[234]_table())

Actually, "slot %#x" would be clearer here.  I thought I fixed the 0x
prefix in alloc_l[]_table(), and I am not sure the leading zeroes are
helpful.

>
>> @@ -1979,7 +1991,8 @@ static int mod_l2_entry(l2_pgentry_t *pl2e,
>>  
>>  if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
>>  {
>> -MEM_LOG("Illegal L2 update attempt in Xen-private area %p", pl2e);
>> +gdprintk(XENLOG_WARNING,
>> + "Illegal L2 update attempt in Xen-private area %p\n", 
>> pl2e);
> Could you make this message useful at once? The pointer is
> not really helpful to diagnose anything, I think. Same for
> mod_l[34]_entry() then.

Yes - can switch them to slot information.

>
>> @@ -3179,7 +3208,7 @@ long do_mmuext_op(
>>  
>>  if ( unlikely(__copy_from_guest(&op, uops, 1) != 0) )
>>  {
>> -MEM_LOG("Bad __copy_from_guest");
>> +gdprintk(XENLOG_WARNING, "Bad __copy_from_guest\n");
> I'd suggest to drop this one altogether.

Yeah - I had considered that.  Will drop.

>
>> @@ -3195,7 +3224,8 @@ long do_mmuext_op(
>>  case MMUEXT_UNPIN_TABLE:
>>  break;
>>  default:
>> -MEM_LOG("Invalid extended pt command %#x", op.cmd);
>> +gdprintk(XENLOG_WARNING,
>> + "Invalid extended pt command %#x\n", op.cmd);
> And this one too.

Ok.

>
>> @@ -3297,7 +3329,8 @@ long do_mmuext_op(
>>  page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, 
>> P2M_ALLOC);
>>  if ( unlikely(!page) )
>>  {
>> -MEM_LOG("Mfn %lx bad domain", op.arg1.mfn);
>> +gdprintk(XENLOG_WARNING,
>> + "mfn %" PRI_mfn " bad domain\n", op.arg1.mfn);
> Perhaps also include the domain which was supposedly bad?

It is not that simple.  This error message covers both the mfn being
bad, and a good mfn not belonging to the requested domain.  I will see
if I can word something appropriately.

>
>> @@ -3458,7 +3493,8 @@ long do_mmuext_op(
>>  rc = -EPERM;
>>  else if ( unlikely(!cache_flush_permitted(d)) )
>>  {
>> -MEM_LOG("Non-physdev domain tried to FLUSH_CACHE.");
>> +gdprintk(XENLOG_WARNING,
>> + "Non-physdev domain tried to FLUSH_CACHE\n");
>>  rc = -EACCES;
>>  }
>>  else
>> @@ -3484,7 +3520,8 @@ long do_mmuext_op(
>>  }
>>  else
>>  {
>> -MEM_LOG("Non-physdev domain tried to FLUSH_CACHE_GLOBAL");
>> +gdprintk(XENLOG_WARNING,
>> + "Non-physdev domain tried to 
>> FLUSH_CACHE_GLOBAL\n");
>>  rc = -EINVAL;
>>  }
>>

Re: [Xen-devel] [PATCH v9 1/5] x86/ioreq server: Release the p2m lock after mmio is handled.

2017-03-29 Thread Jan Beulich
>>> On 29.03.17 at 15:39,  wrote:
> On Tue, Mar 21, 2017 at 2:52 AM, Yu Zhang  wrote:
>> Routine hvmemul_do_io() may need to peek the p2m type of a gfn to
>> select the ioreq server. For example, operations on gfns with
>> p2m_ioreq_server type will be delivered to a corresponding ioreq
>> server, and this requires that the p2m type not be switched back
>> to p2m_ram_rw during the emulation process. To avoid this race
>> condition, we delay the release of p2m lock in hvm_hap_nested_page_fault()
>> until mmio is handled.
>>
>> Note: previously in hvm_hap_nested_page_fault(), put_gfn() was moved
>> before the handling of mmio, due to a deadlock risk between the p2m
>> lock and the event lock(in commit 77b8dfe). Later, a per-event channel
>> lock was introduced in commit de6acb7, to send events. So we do not
>> need to worry about the deadlock issue.
>>
>> Signed-off-by: Yu Zhang 
>> Reviewed-by: Jan Beulich 
> 
> Who else's ack does this need?  It seems like this is a general
> improvement that can go in without the rest of the series.

I didn't put it in on its own because it didn't really seem useful to
me without the rest of the series.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v9 1/5] x86/ioreq server: Release the p2m lock after mmio is handled.

2017-03-29 Thread George Dunlap
On Tue, Mar 21, 2017 at 2:52 AM, Yu Zhang  wrote:
> Routine hvmemul_do_io() may need to peek the p2m type of a gfn to
> select the ioreq server. For example, operations on gfns with
> p2m_ioreq_server type will be delivered to a corresponding ioreq
> server, and this requires that the p2m type not be switched back
> to p2m_ram_rw during the emulation process. To avoid this race
> condition, we delay the release of p2m lock in hvm_hap_nested_page_fault()
> until mmio is handled.
>
> Note: previously in hvm_hap_nested_page_fault(), put_gfn() was moved
> before the handling of mmio, due to a deadlock risk between the p2m
> lock and the event lock(in commit 77b8dfe). Later, a per-event channel
> lock was introduced in commit de6acb7, to send events. So we do not
> need to worry about the deadlock issue.
>
> Signed-off-by: Yu Zhang 
> Reviewed-by: Jan Beulich 

Who else's ack does this need?  It seems like this is a general
improvement that can go in without the rest of the series.

 -George


Re: [Xen-devel] [PATCH 2/2] Add vlapic timer checks

2017-03-29 Thread Andrew Cooper
On 20/03/17 14:38, Anthony PERARD wrote:
> Signed-off-by: Anthony PERARD 
> ---
>  tests/vlapic-timer/Makefile |   9 +++
>  tests/vlapic-timer/main.c   | 131 
> 
>  2 files changed, 140 insertions(+)
>  create mode 100644 tests/vlapic-timer/Makefile
>  create mode 100644 tests/vlapic-timer/main.c
>
> diff --git a/tests/vlapic-timer/Makefile b/tests/vlapic-timer/Makefile
> new file mode 100644
> index 000..01fa4ea
> --- /dev/null
> +++ b/tests/vlapic-timer/Makefile
> @@ -0,0 +1,9 @@
> +include $(ROOT)/build/common.mk
> +
> +NAME  := vlapic-timer
> +CATEGORY  := functional
> +TEST-ENVS := $(HVM_ENVIRONMENTS)

Do you really need all HVM environments?  APIC emulation doesn't have
any interaction with paging or operating modes.

Speaking of this test, I should really see about upstreaming some of my
ad-hoc lapic tests, which have existed almost as long as XTF has.  I am
also going to have to see about getting the test revision logic working,
because I'd prefer to have one "apic functionality test" than a large
number of individual tests each testing a different part of behaviour.

> +
> +obj-perenv += main.o
> +
> +include $(ROOT)/build/gen.mk
> diff --git a/tests/vlapic-timer/main.c b/tests/vlapic-timer/main.c
> new file mode 100644
> index 000..b081661
> --- /dev/null
> +++ b/tests/vlapic-timer/main.c
> @@ -0,0 +1,131 @@
> +/**
> + * @file tests/vlapic-timer/main.c
> + * @ref test-vlapic-timer - LAPIC Timer Emulation
> + *
> + * @page test-vlapic-timer LAPIC Timer Emulation
> + *
> + * Tests the behavior of the vlapic timer emulation by Xen.
> + *
> + * This tests should work on baremetal.
> + *
> + * It is testing switch between different mode, one-shot and periodic.
> + *
> + * @see tests/vlapic-timer/main.c
> + */
> +#include 

No need for this include.  It will come in via xtf.h

> +#include 
> +#include 
> +
> +const char test_title[] = "Test vlapic-timer";
> +
> +static inline void apic_write(unsigned long reg, uint32_t v)
> +{
> +*((volatile uint32_t *)(APIC_BASE + reg)) = v;
> +}
> +
> +static inline uint32_t apic_read(unsigned long reg)
> +{
> +return *((volatile uint32_t *)(APIC_BASE + reg));
> +}

These should be in a piece of common apic library, along with an
initialisation function to probe and enable the lapic.  (After all,
restricted PVH environments won't have one at all.)

> +
> +static inline void change_mode(unsigned long new_mode)
> +{
> +uint32_t lvtt;
> +
> +lvtt = apic_read(APIC_LVTT);
> +apic_write(APIC_LVTT, (lvtt & ~APIC_TIMER_MODE_MASK) | new_mode);
> +}
> +
> +void wait_until_tmcct_is_zero(uint32_t initial_count, bool stop_when_half)
> +{
> +uint32_t tmcct = apic_read(APIC_TMCCT);
> +
> +if ( tmcct )
> +{
> +while ( tmcct > (initial_count / 2) )
> +tmcct = apic_read(APIC_TMCCT);
> +
> +if ( stop_when_half )
> +return;
> +
> +/* Wait until the counter reach 0 or wrap-around */
> +while ( tmcct <= (initial_count / 2) && tmcct > 0 )
> +tmcct = apic_read(APIC_TMCCT);
> +}
> +}
> +
> +void test_main(void)
> +{
> +uint32_t tmict = 0x99;
> +
> +apic_write(APIC_TMICT, tmict);
> +/*
> + * Assuming that the initial mode is one-shot, change it to periodic. 
> TMICT
> + * should not be reset.

It is not generally a good idea to make assumptions like this.  The only
way it would be safe, is to state that one-shot is the reset value of
the APIC state.

> + */
> +change_mode(APIC_TIMER_MODE_PERIODIC);
> +if ( apic_read(APIC_TMICT) != tmict )
> +xtf_failure("Fail: TMICT value reset\n");
> +
> +/* Testing one-shot */
> +printk("Testing one-shot mode\n");
> +change_mode(APIC_TIMER_MODE_ONESHOT);
> +apic_write(APIC_TMICT, tmict);
> +if ( !apic_read(APIC_TMCCT) )
> +xtf_failure("Fail: TMCCT should have a non-zero value\n");
> +wait_until_tmcct_is_zero(tmict, false);
> +if ( apic_read(APIC_TMCCT) )
> +xtf_failure("Fail: TMCCT should have reached 0\n");

Please spread this logic out with newlines, to make it easier to read.

~Andrew

> +
> +/*
> + * Write TMICT before changing mode from one-shot to periodic TMCCT 
> should
> + * be reset to TMICT periodicly
> + */
> +apic_write(APIC_TMICT, tmict);
> +wait_until_tmcct_is_zero(tmict, true);
> +printk("Testing periodic mode\n");
> +change_mode(APIC_TIMER_MODE_PERIODIC);
> +if ( !apic_read(APIC_TMCCT) )
> +xtf_failure("Fail: TMCCT should have a non-zero value\n");
> +/*
> + * After the change of mode, the counter should not be reset and continue
> + * counting down from where it was
> + */
> +if ( apic_read(APIC_TMCCT) > (tmict / 2) )
> +xtf_failure("Fail: TMCCT should not be reset to TMICT value\n");
> +wait_until_tmcct_is_zero(tmict, false);
> +if ( apic_read(APIC_TMCCT) < (tmict / 2) )
> +

Re: [Xen-devel] [PATCH] x86/mm: Drop MEM_LOG() and correct some printed information

2017-03-29 Thread Jan Beulich
>>> On 29.03.17 at 14:29,  wrote:
>  * Use 0x prefix for otherwise unqualified hex numbers.

I'm glad this in fact refers to just a single place.

> @@ -1057,10 +1062,10 @@ get_page_from_l1e(
>  put_page_type(page);
>  put_page(page);
>  
> -MEM_LOG("Error updating mappings for mfn %lx (pfn %lx,"
> -" from L1 entry %" PRIpte ") for %d",
> -mfn, get_gpfn_from_mfn(mfn),
> -l1e_get_intpte(l1e), l1e_owner->domain_id);
> +gdprintk(XENLOG_WARNING, "Error updating mappings for mfn %" 
> PRI_mfn
> + " (pfn %" PRI_pfn ", from L1 entry %" PRIpte ") for 
> d%d\n",
> + mfn, get_gpfn_from_mfn(mfn),
> + l1e_get_intpte(l1e), l1e_owner->domain_id);
>  return err;
>  }
>  }
> @@ -1068,10 +1073,10 @@ get_page_from_l1e(
>  return 0;
>  
>   could_not_pin:
> -MEM_LOG("Error getting mfn %lx (pfn %lx) from L1 entry %" PRIpte
> -" for l1e_owner=%d, pg_owner=%d",
> -mfn, get_gpfn_from_mfn(mfn),
> -l1e_get_intpte(l1e), l1e_owner->domain_id, pg_owner->domain_id);
> +gdprintk(XENLOG_WARNING, "Error getting mfn %" PRI_mfn " (pfn %" PRI_pfn
> + ") from L1 entry %" PRIpte " for l1e_owner d%d, pg_owner d%d",
> + mfn, get_gpfn_from_mfn(mfn),
> + l1e_get_intpte(l1e), l1e_owner->domain_id, pg_owner->domain_id);

Especially here the wrapping of the format string is rather
unfortunate. Didn't we agree to allow format strings to exceed
the 80 column restriction anyway?

> @@ -1388,7 +1398,7 @@ static int alloc_l1_table(struct page_info *page)
>  return 0;
>  
>   fail:
> -MEM_LOG("Failure in alloc_l1_table: entry %d", i);
> +gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: entry %d\n", i);

%u (or even %03x; same in alloc_l[234]_table())

> @@ -1979,7 +1991,8 @@ static int mod_l2_entry(l2_pgentry_t *pl2e,
>  
>  if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
>  {
> -MEM_LOG("Illegal L2 update attempt in Xen-private area %p", pl2e);
> +gdprintk(XENLOG_WARNING,
> + "Illegal L2 update attempt in Xen-private area %p\n", pl2e);

Could you make this message useful at once? The pointer is
not really helpful to diagnose anything, I think. Same for
mod_l[34]_entry() then.

> @@ -3179,7 +3208,7 @@ long do_mmuext_op(
>  
>  if ( unlikely(__copy_from_guest(&op, uops, 1) != 0) )
>  {
> -MEM_LOG("Bad __copy_from_guest");
> +gdprintk(XENLOG_WARNING, "Bad __copy_from_guest\n");

I'd suggest to drop this one altogether.

> @@ -3195,7 +3224,8 @@ long do_mmuext_op(
>  case MMUEXT_UNPIN_TABLE:
>  break;
>  default:
> -MEM_LOG("Invalid extended pt command %#x", op.cmd);
> +gdprintk(XENLOG_WARNING,
> + "Invalid extended pt command %#x\n", op.cmd);

And this one too.

> @@ -3297,7 +3329,8 @@ long do_mmuext_op(
>  page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
>  if ( unlikely(!page) )
>  {
> -MEM_LOG("Mfn %lx bad domain", op.arg1.mfn);
> +gdprintk(XENLOG_WARNING,
> + "mfn %" PRI_mfn " bad domain\n", op.arg1.mfn);

Perhaps also include the domain which was supposedly bad?

> @@ -3458,7 +3493,8 @@ long do_mmuext_op(
>  rc = -EPERM;
>  else if ( unlikely(!cache_flush_permitted(d)) )
>  {
> -MEM_LOG("Non-physdev domain tried to FLUSH_CACHE.");
> +gdprintk(XENLOG_WARNING,
> + "Non-physdev domain tried to FLUSH_CACHE\n");
>  rc = -EACCES;
>  }
>  else
> @@ -3484,7 +3520,8 @@ long do_mmuext_op(
>  }
>  else
>  {
> -MEM_LOG("Non-physdev domain tried to FLUSH_CACHE_GLOBAL");
> +gdprintk(XENLOG_WARNING,
> + "Non-physdev domain tried to FLUSH_CACHE_GLOBAL\n");
>  rc = -EINVAL;
>  }
>  break;

I think these could also be dropped (and perhaps a few more right
below here).

> @@ -3734,7 +3779,7 @@ long do_mmu_update(
>  
>  if ( unlikely(__copy_from_guest(, ureqs, 1) != 0) )
>  {
> -MEM_LOG("Bad __copy_from_guest");
> +gdprintk(XENLOG_WARNING, "Bad __copy_from_guest\n");

And this one.

> @@ -4201,7 +4250,8 @@ static int replace_grant_va_mapping(
>  /* Delete pagetable entry. */
>  if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
>  {
> -MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
> +gdprintk(XENLOG_WARNING,
> + "Cannot delete PTE entry at %p\n", (unsigned long *)pl1e);


Re: [Xen-devel] [PATCH 1/2] Import apicdef.h from xen.git

2017-03-29 Thread Andrew Cooper
On 20/03/17 14:38, Anthony PERARD wrote:
> Have only changed the value of APIC_BASE.
>
> Signed-off-by: Anthony PERARD 
> ---
>  arch/x86/include/arch/apicdef.h | 392 
> 
>  1 file changed, 392 insertions(+)
>  create mode 100644 arch/x86/include/arch/apicdef.h
>
> diff --git a/arch/x86/include/arch/apicdef.h b/arch/x86/include/arch/apicdef.h
> new file mode 100644
> index 000..242c45e
> --- /dev/null
> +++ b/arch/x86/include/arch/apicdef.h

Other than what Roger said concerning licensing issue, I'd just
introduce the constants that you currently need.  Most of this content
will never be needed in XTF.

~Andrew


Re: [Xen-devel] [PATCH] mm: use heap macro in init_node_heap

2017-03-29 Thread Wei Liu
Cc Julien

On Wed, Mar 29, 2017 at 06:00:08AM -0600, Jan Beulich wrote:
> >>> On 29.03.17 at 13:15,  wrote:
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -574,7 +574,7 @@ static unsigned long init_node_heap(int node, unsigned 
> > long mfn,
> >  
> >  for ( i = 0; i < NR_ZONES; i++ )
> >  for ( j = 0; j <= MAX_ORDER; j++ )
> > -INIT_PAGE_LIST_HEAD(&(*_heap[node])[i][j]);
> > +INIT_PAGE_LIST_HEAD(&(heap(node, i, j)));
> 
> With the now stray pair of parentheses removed,
> Acked-by: Jan Beulich 
> 
> You should have Cc-ed Julien, assuming you want to get this in now
> rather than after 4.9 was branched off.
> 
> Jan
> 


[Xen-devel] [PATCH] x86/mm: Drop MEM_LOG() and correct some printed information

2017-03-29 Thread Andrew Cooper
MEM_LOG() is just a thin wrapper around gdprintk(), obscuring some of the
common information.  Inline it, and take the opportunity to correct some of
the printked information.

Some corrections, each where appropriate:
 * Correction of pfn/mfn terms and consistent use of PRI_pfn/mfn.
 * s!I/O!MMIO!
 * Consistently represent domains using d%d notation.
 * Use 0x prefix for otherwise unqualified hex numbers.
 * Remove "ptwr_emulate:" prefix, as the embedded __func__ is already clear.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
---
 xen/arch/x86/mm.c | 308 --
 1 file changed, 181 insertions(+), 127 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4dbd24f..06ef44c 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -127,8 +127,6 @@
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
 l1_fixmap[L1_PAGETABLE_ENTRIES];
 
-#define MEM_LOG(_f, _a...) gdprintk(XENLOG_WARNING , _f "\n" , ## _a)
-
 /*
  * PTE updates can be done with ordinary writes except:
  *  1. Debug builds get extra checking by using CMPXCHG[8B].
@@ -707,7 +705,8 @@ static int get_page_from_pagenr(unsigned long page_nr, 
struct domain *d)
 
 if ( unlikely(!mfn_valid(_mfn(page_nr))) || unlikely(!get_page(page, d)) )
 {
-MEM_LOG("Could not get page ref for pfn %lx", page_nr);
+gdprintk(XENLOG_WARNING,
+ "Could not get page ref for mfn %"PRI_mfn"\n", page_nr);
 return 0;
 }
 
@@ -771,7 +770,8 @@ get_##level##_linear_pagetable( 
\
 \
 if ( (level##e_get_flags(pde) & _PAGE_RW) ) \
 {   \
-MEM_LOG("Attempt to create linear p.t. with write perms");  \
+gdprintk(XENLOG_WARNING,\
+ "Attempt to create linear p.t. with write perms\n");   \
 return 0;   \
 }   \
 \
@@ -892,7 +892,8 @@ get_page_from_l1e(
 
 if ( unlikely(l1f & l1_disallow_mask(l1e_owner)) )
 {
-MEM_LOG("Bad L1 flags %x", l1f & l1_disallow_mask(l1e_owner));
+gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
+ l1f & l1_disallow_mask(l1e_owner));
 return -EINVAL;
 }
 
@@ -913,8 +914,9 @@ get_page_from_l1e(
 {
 if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */
 {
-MEM_LOG("Non-privileged (%u) attempt to map I/O space %08lx", 
-pg_owner->domain_id, mfn);
+gdprintk(XENLOG_WARNING,
+ "d%d non-privileged attempt to map MMIO space 
%"PRI_mfn"\n",
+ pg_owner->domain_id, mfn);
 return -EPERM;
 }
 return -EINVAL;
@@ -925,9 +927,10 @@ get_page_from_l1e(
 {
 if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */
 {
-MEM_LOG("Dom%u attempted to map I/O space %08lx in dom%u to 
dom%u",
-curr->domain->domain_id, mfn, pg_owner->domain_id,
-l1e_owner->domain_id);
+gdprintk(XENLOG_WARNING,
+ "d%d attempted to map MMIO space %"PRI_mfn" in d%d to 
d%d\n",
+ curr->domain->domain_id, mfn, pg_owner->domain_id,
+ l1e_owner->domain_id);
 return -EPERM;
 }
 return -EINVAL;
@@ -998,9 +1001,10 @@ get_page_from_l1e(
 if ( (real_pg_owner == NULL) || (pg_owner == l1e_owner) ||
  xsm_priv_mapping(XSM_TARGET, pg_owner, real_pg_owner) )
 {
-MEM_LOG("pg_owner %d l1e_owner %d, but real_pg_owner %d",
-pg_owner->domain_id, l1e_owner->domain_id,
-real_pg_owner?real_pg_owner->domain_id:-1);
+gdprintk(XENLOG_WARNING,
+ "pg_owner d%d l1e_owner d%d, but real_pg_owner d%d\n",
+ pg_owner->domain_id, l1e_owner->domain_id,
+ real_pg_owner ? real_pg_owner->domain_id : -1);
 goto could_not_pin;
 }
 pg_owner = real_pg_owner;
@@ -1019,7 +1023,7 @@ get_page_from_l1e(
 ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner));
 if ( write && !get_page_type(page, PGT_writable_page) )
 {
-MEM_LOG("Could not get page type PGT_writable_page");
+gdprintk(XENLOG_WARNING, "Could not get page type 
PGT_writable_page\n");
 goto 

[Xen-devel] [PATCH v11 5/6] VT-d: introduce update_irte to update irte safely

2017-03-29 Thread Chao Gao
We used structure assignment to update the IRTE, which is not atomic when the
whole IRTE is to be updated. This is unsafe if an interrupt arrives during the
update. Furthermore, no bug or warning is reported when this happens.

This patch introduces two variants, atomic and non-atomic, to update the
IRTE. Both variants will update the IRTE if possible. If the caller requests an
atomic update but we can't meet it, we raise a bug.

Signed-off-by: Chao Gao 
---
v11:
- Add two variant functions to update the IRTE. Call the non-atomic one for init
and clear operations. Call the atomic one for other cases.
- Add a new field to indicate whether the remap_entry associated with msi_desc
is initialized.

v10:
- rename copy_irte_to_irt to update_irte
- remove copy_from_to_irt
- change commit message and add some comments to illustrate under which
condition update_irte() is safe.

 xen/arch/x86/msi.c |  1 +
 xen/drivers/passthrough/vtd/intremap.c | 78 --
 xen/include/asm-x86/msi.h  |  1 +
 3 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 3374cd4..7ed1243 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -578,6 +578,7 @@ static struct msi_desc *alloc_msi_entry(unsigned int nr)
 entry[nr].dev = NULL;
 entry[nr].irq = -1;
 entry[nr].remap_index = -1;
+entry[nr].remap_entry_initialized = false;
 entry[nr].pi_desc = NULL;
 }
 
diff --git a/xen/drivers/passthrough/vtd/intremap.c 
b/xen/drivers/passthrough/vtd/intremap.c
index b992f23..b7f3cf1 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -169,10 +169,64 @@ bool_t __init iommu_supports_eim(void)
 return 1;
 }
 
+static void update_irte(struct iremap_entry *entry,
+const struct iremap_entry *new_ire,
+bool atomic)
+{
+if ( cpu_has_cx16 )
+{
+__uint128_t ret;
+struct iremap_entry old_ire;
+
+old_ire = *entry;
+ret = cmpxchg16b(entry, &old_ire, new_ire);
+
+/*
+ * In the above, we use cmpxchg16 to atomically update the 128-bit
+ * IRTE, and the hardware cannot update the IRTE behind us, so
+ * the return value of cmpxchg16 should be the same as old_ire.
+ * This ASSERT validate it.
+ */
+ASSERT(ret == old_ire.val);
+}
+else
+{
+/*
+ * The following code will update irte atomically if possible.
+ * If the caller requests a atomic update but we can't meet it, 
+ * a bug will be raised.
+ */
+if ( entry->lo == new_ire->lo )
+entry->hi = new_ire->hi;
+else if ( entry->hi == new_ire->hi )
+entry->lo = new_ire->lo;
+else if ( !atomic )
+{
+entry->lo = new_ire->lo;
+entry->hi = new_ire->hi;
+}
+else
+BUG();
+}
+}
+
+static inline void update_irte_non_atomic(struct iremap_entry *entry,
+  const struct iremap_entry *new_ire)
+{
+update_irte(entry, new_ire, false);
+}
+
+static inline void update_irte_atomic(struct iremap_entry *entry,
+  const struct iremap_entry *new_ire)
+{
+update_irte(entry, new_ire, true);
+}
+
+
 /* Mark specified intr remap entry as free */
 static void free_remap_entry(struct iommu *iommu, int index)
 {
-struct iremap_entry *iremap_entry = NULL, *iremap_entries;
+struct iremap_entry *iremap_entry = NULL, *iremap_entries, new_ire = { };
 struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
 
 if ( index < 0 || index > IREMAP_ENTRY_NR - 1 )
@@ -183,7 +237,7 @@ static void free_remap_entry(struct iommu *iommu, int index)
 GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
  iremap_entries, iremap_entry);
 
-memset(iremap_entry, 0, sizeof(*iremap_entry));
+update_irte_non_atomic(iremap_entry, &new_ire);
 iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
 iommu_flush_iec_index(iommu, 0, index);
 
@@ -286,6 +340,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
 int index;
 unsigned long flags;
 struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
+bool init = false;
 
 remap_rte = (struct IO_APIC_route_remap_entry *) old_rte;
 spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
@@ -296,6 +351,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
 index = alloc_remap_entry(iommu, 1);
 if ( index < IREMAP_ENTRY_NR )
 apic_pin_2_ir_idx[apic][ioapic_pin] = index;
+init = true;
 }
 
 if ( index > IREMAP_ENTRY_NR - 1 )
@@ -353,7 +409,11 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
 remap_rte->format = 1;/* indicate remap format */
 }
 
-*iremap_entry = new_ire;
+if ( init )
+

[Xen-devel] [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list

2017-03-29 Thread Chao Gao
The current VT-d PI related code may operate incorrectly in the
following scenarios:
1. When VT-d PI is enabled, we needn't migrate a pirq which is using VT-d PI
during vCPU migration. Patch [1/6] solves this by introducing a new flag to
indicate that the pt-irq is delivered through VT-d PI.

2. msi_msg_to_remap_entry() is buggy when the live IRTE is in posted format.
It wrongly inherits the 'im' field from the live IRTE but updates all the
other fields to remapping format. Patch [2/6] handles this.

3. [3/6] is a cleanup patch.

4. When a pCPU is unplugged, there might still be vCPUs on its blocking
list. Since the pCPU is offline, those vCPUs might not be woken
up again. [4/6] addresses this.

5. The IRTE is updated through structure assignment, which is unsafe in some
cases. To resolve this, Patch [5/6] provides two variants, atomic and
non-atomic, to update the IRTE. A bug is raised when we can't meet the
caller's atomic requirement.

6. We didn't change the IRTE to remapping format when the pt-irq is configured
to have multi-destination vCPUs. Patch [6/6] resolves this problem.


Chao Gao (4):
  passthrough: don't migrate pirq when it is delivered through VT-d PI
  VT-d: Introduce new fields in msi_desc to track binding with guest
interrupt
  VT-d: introduce update_irte to update irte safely
  passthrough/io: Fall back to remapping interrupt when we can't use
VT-d PI

Feng Wu (2):
  VT-d: Some cleanups
  VMX: Fixup PI descriptor when cpu is offline

 xen/arch/x86/hvm/hvm.c |   3 +
 xen/arch/x86/hvm/vmx/vmcs.c|   1 +
 xen/arch/x86/hvm/vmx/vmx.c |  70 ++
 xen/arch/x86/msi.c |   2 +
 xen/drivers/passthrough/io.c   |  71 ++
 xen/drivers/passthrough/vtd/intremap.c | 236 +++--
 xen/include/asm-x86/hvm/vmx/vmx.h  |   1 +
 xen/include/asm-x86/msi.h  |   3 +
 xen/include/xen/hvm/irq.h  |   1 +
 9 files changed, 204 insertions(+), 184 deletions(-)

-- 
1.8.3.1



[Xen-devel] [PATCH v11 6/6] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI

2017-03-29 Thread Chao Gao
The current logic of using VT-d PI is that when the guest configures the pirq's
destination vcpu to a single vcpu, the corresponding IRTE is updated to
posted format. If the destination of the pirq is multiple vcpus, we
stay in posted format. Obviously, we should fall back to remapping interrupt
when the guest wrongly configures the destination of the pirq or makes it have
multi-destination vcpus.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
Reviewed-by: Kevin Tian 
---
v11:
- move the code (one line) that allows the parameter 'vcpu' of pi_update_irte()
to be NULL to Patch [2/6].

v10:
- Newly added
 xen/drivers/passthrough/io.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 5dbfe53..93de0c2 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -412,14 +412,7 @@ int pt_irq_create_bind(
 
 /* Use interrupt posting if it is supported. */
 if ( iommu_intpost )
-{
-if ( vcpu )
-pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
-else
-dprintk(XENLOG_G_INFO,
-"%pv: deliver interrupt in remapping mode,gvec:%02x\n",
-vcpu, pirq_dpci->gmsi.gvec);
-}
+pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
 
 break;
 }
-- 
1.8.3.1



[Xen-devel] [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI

2017-03-29 Thread Chao Gao
When a vCPU is migrated to another pCPU, pt irqs bound to this vCPU also need
migration. When VT-d PI is enabled, the interrupt vector is recorded in
a main memory resident data-structure and a notification whose destination
is decided by NDST is generated. NDST is properly adjusted during vCPU
migration, so a pirq directly injected to the guest needn't be migrated.

This patch adds an indicator, @posted, to show whether the pt irq is delivered
through VT-d PI.

Signed-off-by: Chao Gao 
---
v11:
- rename the indicator to 'posted'
- move setting 'posted' field to event lock un-locked region.

v10:
- Newly added.

 xen/arch/x86/hvm/hvm.c   |  3 +++
 xen/drivers/passthrough/io.c | 62 +---
 xen/include/xen/hvm/irq.h|  1 +
 3 files changed, 16 insertions(+), 50 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0282986..2d8de16 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -438,6 +438,9 @@ static int hvm_migrate_pirq(struct domain *d, struct 
hvm_pirq_dpci *pirq_dpci,
 struct vcpu *v = arg;
 
 if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
+ (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
+ /* Needn't migrate pirq if this pirq is delivered to guest directly.*/
+ (!pirq_dpci->gmsi.posted) &&
  (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
 {
 struct irq_desc *desc =
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 080183e..d53976c 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -259,52 +259,6 @@ static struct vcpu *vector_hashing_dest(const struct 
domain *d,
 return dest;
 }
 
-/*
- * The purpose of this routine is to find the right destination vCPU for
- * an interrupt which will be delivered by VT-d posted-interrupt. There
- * are several cases as below:
- *
- * - For lowest-priority interrupts, use vector-hashing mechanism to find
- *   the destination.
- * - Otherwise, for single destination interrupt, it is straightforward to
- *   find the destination vCPU and return true.
- * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
- *   so return NULL.
- */
-static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t dest_id,
-  bool_t dest_mode, uint8_t delivery_mode,
-  uint8_t gvec)
-{
-unsigned int dest_vcpus = 0;
-struct vcpu *v, *dest = NULL;
-
-switch ( delivery_mode )
-{
-case dest_LowestPrio:
-return vector_hashing_dest(d, dest_id, dest_mode, gvec);
-case dest_Fixed:
-for_each_vcpu ( d, v )
-{
-if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, APIC_DEST_NOSHORT,
-dest_id, dest_mode) )
-continue;
-
-dest_vcpus++;
-dest = v;
-}
-
-/* For fixed mode, we only handle single-destination interrupts. */
-if ( dest_vcpus == 1 )
-return dest;
-
-break;
-default:
-break;
-}
-
-return NULL;
-}
-
 int pt_irq_create_bind(
 struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
 {
@@ -365,6 +319,7 @@ int pt_irq_create_bind(
 {
 uint8_t dest, dest_mode, delivery_mode;
 int dest_vcpu_id;
+const struct vcpu *vcpu;
 
 if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
 {
@@ -442,17 +397,24 @@ int pt_irq_create_bind(
 dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
 pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
 spin_unlock(&d->event_lock);
+
+pirq_dpci->gmsi.posted = false;
+vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
+if ( iommu_intpost && (delivery_mode == dest_LowestPrio) )
+{
+vcpu = vector_hashing_dest(d, dest, dest_mode,
+   pirq_dpci->gmsi.gvec);
+if ( vcpu )
+pirq_dpci->gmsi.posted = true;
+}
 if ( dest_vcpu_id >= 0 )
 hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
 
 /* Use interrupt posting if it is supported. */
 if ( iommu_intpost )
 {
-const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest, dest_mode,
-  delivery_mode, pirq_dpci->gmsi.gvec);
-
 if ( vcpu )
-pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
+pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
 else
 dprintk(XENLOG_G_INFO,
 "%pv: deliver interrupt in remapping mode,gvec:%02x\n",
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index d3f8623..566854a 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -63,6 +63,7 @@ struct hvm_gmsi_info {
 uint32_t gvec;
 uint32_t gflags;
 

[Xen-devel] [PATCH v11 4/6] VMX: Fixup PI descriptor when cpu is offline

2017-03-29 Thread Chao Gao
From: Feng Wu 

When a cpu goes offline, we need to move all the vcpus in its blocking
list to another online cpu. This patch handles it.

Signed-off-by: Feng Wu 
Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
Acked-by: Kevin Tian 
---
 xen/arch/x86/hvm/vmx/vmcs.c   |  1 +
 xen/arch/x86/hvm/vmx/vmx.c| 70 +++
 xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
 3 files changed, 72 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 934674c..99c77b9 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -591,6 +591,7 @@ void vmx_cpu_dead(unsigned int cpu)
 vmx_free_vmcs(per_cpu(vmxon_region, cpu));
 per_cpu(vmxon_region, cpu) = 0;
 nvmx_cpu_dead(cpu);
+vmx_pi_desc_fixup(cpu);
 }
 
 int vmx_cpu_up(void)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d201956..25f9ec9 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -199,6 +199,76 @@ static void vmx_pi_do_resume(struct vcpu *v)
 vmx_pi_unblock_vcpu(v);
 }
 
+void vmx_pi_desc_fixup(unsigned int cpu)
+{
+unsigned int new_cpu, dest;
+unsigned long flags;
+struct arch_vmx_struct *vmx, *tmp;
+spinlock_t *new_lock, *old_lock = &per_cpu(vmx_pi_blocking, cpu).lock;
+struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
+
+if ( !iommu_intpost )
+return;
+
+/*
+ * We are in the context of CPU_DEAD or CPU_UP_CANCELED notification,
+ * and it is impossible for a second CPU go down in parallel. So we
+ * can safely acquire the old cpu's lock and then acquire the new_cpu's
+ * lock after that.
+ */
+spin_lock_irqsave(old_lock, flags);
+
+list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
+{
+/*
+ * Suppress notification or we may miss an interrupt when the
+ * target cpu is dying.
+ */
+pi_set_sn(&vmx->pi_desc);
+
+/*
+ * Check whether a notification is pending before doing the
+ * movement, if that is the case we need to wake up it directly
+ * other than moving it to the new cpu's list.
+ */
+if ( pi_test_on(&vmx->pi_desc) )
+{
+list_del(&vmx->pi_blocking.list);
+vmx->pi_blocking.lock = NULL;
+vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
+}
+else
+{
+/*
+ * We need to find an online cpu as the NDST of the PI descriptor, 
it
+ * doesn't matter whether it is within the cpupool of the domain or
+ * not. As long as it is online, the vCPU will be woken up once the
+ * notification event arrives.
+ */
+new_cpu = cpumask_any(&cpu_online_map);
+new_lock = &per_cpu(vmx_pi_blocking, new_cpu).lock;
+
+spin_lock(new_lock);
+
+ASSERT(vmx->pi_blocking.lock == old_lock);
+
+dest = cpu_physical_id(new_cpu);
+write_atomic(&vmx->pi_desc.ndst,
+ x2apic_enabled ? dest : MASK_INSR(dest, 
PI_xAPIC_NDST_MASK));
+
+list_move(&vmx->pi_blocking.list,
+  &per_cpu(vmx_pi_blocking, new_cpu).list);
+vmx->pi_blocking.lock = new_lock;
+
+spin_unlock(new_lock);
+}
+
+pi_clear_sn(&vmx->pi_desc);
+}
+
+spin_unlock_irqrestore(old_lock, flags);
+}
+
 /*
  * To handle posted interrupts correctly, we need to set the following
  * state:
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h 
b/xen/include/asm-x86/hvm/vmx/vmx.h
index 2b781ab..5ead57c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -597,6 +597,7 @@ void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
 void vmx_pi_per_cpu_init(unsigned int cpu);
+void vmx_pi_desc_fixup(unsigned int cpu);
 
 void vmx_pi_hooks_assign(struct domain *d);
 void vmx_pi_hooks_deassign(struct domain *d);
-- 
1.8.3.1

