[Xen-devel] [linux-linus test] 148949: regressions - trouble: fail/pass/starved

2020-03-24 Thread osstest service owner
flight 148949 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148949/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 133580

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail REGR. vs. 133580

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 133580
 test-armhf-armhf-libvirt 14 saverestore-support-check fail  like 133580
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 133580
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail like 133580
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 133580
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 133580
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail  like 133580
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail like 133580
 test-amd64-i386-xl-pvshim 12 guest-start  fail   never pass
 test-amd64-amd64-dom0pvh-xl-intel 15 guest-saverestore fail never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-check fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail   never pass
 test-arm64-arm64-xl  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl  14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-dom0pvh-xl-amd  2 hosts-allocate   starved  n/a

version targeted for testing:
 linux                979e52ca0469fb38646bc51d26a0263a740c9f03
baseline version:
 linux                736706bee3298208343a76096370e4f6a5c55915

Last test of basis   133580  2019-03-04 19:53:09 Z  386 days
Failing since

[Xen-devel] [seabios test] 148952: regressions - FAIL

2020-03-24 Thread osstest service owner
flight 148952 seabios real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148952/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 148666

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 148666
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail like 148666
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 148666
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop  fail starved in 148666

version targeted for testing:
 seabios  de88a9628426e82f1cee4b61b06e67e6787301b1
baseline version:
 seabios  066a9956097b54530888b88ab9aa1ea02e42af5a

Last test of basis   148666  2020-03-17 13:39:45 Z    7 days
Failing since        148690  2020-03-18 06:43:59 Z    6 days    9 attempts
Testing same since   148794  2020-03-20 23:39:57 Z    4 days    5 attempts


People who touched revisions under test:
  Gerd Hoffmann 
  Matt DeVillier 
  Paul Menzel 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops                                            pass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm            pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm pass
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  pass
 test-amd64-amd64-qemuu-nested-amd                            fail
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-ws16-amd64 fail
 test-amd64-i386-xl-qemuu-ws16-amd64  fail
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict        pass
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict pass
 test-amd64-amd64-qemuu-nested-intel  fail
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit de88a9628426e82f1cee4b61b06e67e6787301b1
Author: Paul Menzel 
Date:   Wed Mar 4 14:51:27 2020 +0100

std/tcg: Replace zero-length array with flexible-array member

GCC 10 gives the warnings below:

In file included from out/ccode32flat.o.tmp.c:54:
./src/tcgbios.c: In function 'tpm20_write_EfiSpecIdEventStruct':
./src/tcgbios.c:290:30: warning: array subscript '() + 4294967295' is outside the bounds of an interior zero-length array 'struct TCG_EfiSpecIdEventAlgorithmSize[0]' [-Wzero-length-bounds]
  290 | event.hdr.digestSizes[count].algorithmId = be16_to_cpu(sel->hashAlg);
  | ~^~~
In file included from ./src/tcgbios.c:22,
 from out/ccode32flat.o.tmp.c:54:
./src/std/tcg.h:527:7: note: while referencing 'digestSizes'
  527 | } digestSizes[0];
  |   ^~~
In file included from out/ccode32flat.o.tmp.c:54:
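The fix pattern the commit title names, as a simplified self-contained
sketch: the struct below is a stand-in, not the actual src/std/tcg.h layout
(where digestSizes[0] is an *interior* zero-length array and needs more
care).

#include <stdint.h>
#include <stdlib.h>

/* Legacy GNU idiom: a zero-length array.  GCC 10's -Wzero-length-bounds
 * can warn when such an array is indexed, as in the warning above. */
struct algo_sizes_old {
    uint32_t count;
    struct { uint16_t id; uint16_t size; } entries[0];
};

/* C99 replacement: a flexible array member.  Same layout, but indexing
 * it is well-defined and warning-free. */
struct algo_sizes {
    uint32_t count;
    struct { uint16_t id; uint16_t size; } entries[];
};

int main(void)
{
    struct algo_sizes *s = malloc(sizeof(*s) + 4 * sizeof(s->entries[0]));

    if ( !s )
        return 1;
    s->count = 4;
    for ( uint32_t i = 0; i < s->count; i++ )
        s->entries[i].id = (uint16_t)i; /* no -Wzero-length-bounds here */
    free(s);
    return 0;
}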

[Xen-devel] [qemu-mainline test] 148937: regressions - trouble: fail/pass/starved

2020-03-24 Thread osstest service owner
flight 148937 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148937/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-win7-amd64 10 windows-install  fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-qemuu-nested-amd 10 debian-hvm-install  fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-freebsd10-i386 11 guest-start fail REGR. vs. 144861
 test-amd64-i386-qemuu-rhel6hvm-intel 10 redhat-install   fail REGR. vs. 144861
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install  fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ws16-amd64 10 windows-install   fail REGR. vs. 144861
 test-amd64-i386-qemuu-rhel6hvm-amd 10 redhat-install fail REGR. vs. 144861
 test-amd64-i386-libvirt-pair 21 guest-start/debian   fail REGR. vs. 144861
 test-amd64-amd64-libvirt 12 guest-start  fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-win7-amd64 10 windows-install   fail REGR. vs. 144861
 test-amd64-amd64-libvirt-xsm 12 guest-start  fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-qemuu-nested-intel 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-freebsd10-amd64 11 guest-start   fail REGR. vs. 144861
 test-amd64-amd64-libvirt-pair 21 guest-start/debian  fail REGR. vs. 144861
 test-amd64-i386-libvirt-xsm  12 guest-start  fail REGR. vs. 144861
 test-amd64-i386-libvirt  12 guest-start  fail REGR. vs. 144861
 test-arm64-arm64-libvirt-xsm 12 guest-start  fail REGR. vs. 144861
 test-armhf-armhf-libvirt 12 guest-start  fail REGR. vs. 144861
 test-amd64-amd64-libvirt-vhd 10 debian-di-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qcow2 10 debian-di-install fail REGR. vs. 144861
 test-armhf-armhf-xl-vhd  10 debian-di-install fail REGR. vs. 144861
 test-armhf-armhf-libvirt-raw 10 debian-di-install fail REGR. vs. 144861

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-dom0pvh-xl-intel 17 guest-saverestore.2 fail baseline untested
 test-amd64-i386-xl-pvshim 12 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl  14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 14 

[Xen-devel] [xen-unstable-smoke test] 148983: tolerable all pass - PUSHED

2020-03-24 Thread osstest service owner
flight 148983 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148983/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass

version targeted for testing:
 xen  0537d246f8db3ac0a1df2ce653b07e85cd887962
baseline version:
 xen  3ec1296ad3a823609eec479cb6c7ee493f6a888b

Last test of basis   148966  2020-03-24 10:00:45 Z    0 days
Testing same since   148983  2020-03-24 17:00:40 Z    0 days    1 attempts


People who touched revisions under test:
  Juergen Gross 
  Julien Grall 
  Paul Durrant 
  Tamas K Lengyel 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64            pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   3ec1296ad3..0537d246f8  0537d246f8db3ac0a1df2ce653b07e85cd887962 -> smoke



Re: [Xen-devel] Moving Forward on XenSummit

2020-03-24 Thread Rich Persaud
On Mar 24, 2020, at 14:03, George Dunlap  wrote:
> 
> I wanted to let everyone know that the XenProject is moving forward with 
> plans to hold XenSummit this year, one way or another.
> 
> There are two basic approaches the Advisory Board has been considering:  
> Postponing the event until later in the year, or holding a virtual event 
> during the same timeframe.  Additionally, if we hold a virtual event during 
> the same timeframe, the Board wants to keep the option open of having a 
> smaller, in-person event later in the year, if circumstances permit.

Due to variation in scope/timing of geo and company restrictions on travel, 
could some speakers present remotely for the in-person event?  

Could the Xen Summit CFP be re-opened for those who can present virtually, who 
may not have submitted due to travel restrictions?

Rich




Re: [Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"

2020-03-24 Thread Julien Grall

On 24/03/2020 15:21, Hongyan Xia wrote:

From: Hongyan Xia 

Unfortunately, even though that commit dropped the domctl lock and
allowed other domctl to continue, it created severe lock contention
within domain destructions themselves. Multiple domain destructions in
parallel now spin for the global heap lock when freeing memory and could
spend a long time before the next hypercall continuation. In contrast,
after dropping that commit, parallel domain destructions will just fail
to take the domctl lock, creating a hypercall continuation and backing
off immediately, allowing the thread that holds the lock to destroy a
domain much more quickly and allowing backed-off threads to process
events and irqs.

On a 144-core server with 4TiB of memory, destroying 32 guests (each
with 4 vcpus and 122GiB memory) simultaneously takes:

before the revert: 29 minutes
after the revert: 6 minutes

This is timed from the freeing of the first page until the very last page
of all 32 guests is released back to the heap.

This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.

Signed-off-by: Hongyan Xia 


Reviewed-by: Julien Grall 


---
  xen/common/domain.c | 11 +--
  xen/common/domctl.c |  5 +
  2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index b4eb476a9c..7b02f5ead7 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -698,20 +698,11 @@ int domain_kill(struct domain *d)
  if ( d == current->domain )
  return -EINVAL;
  
-/* Protected by d->domain_lock. */
+/* Protected by domctl_lock. */
  switch ( d->is_dying )
  {
  case DOMDYING_alive:
-domain_unlock(d);
  domain_pause(d);
-domain_lock(d);
-/*
- * With the domain lock dropped, d->is_dying may have changed. Call
- * ourselves recursively if so, which is safe as then we won't come
- * back here.
- */
-if ( d->is_dying != DOMDYING_alive )
-return domain_kill(d);
  d->is_dying = DOMDYING_dying;
  argo_destroy(d);
  evtchn_destroy(d);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index a69b3b59a8..e010079203 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -571,14 +571,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
  break;
  
  case XEN_DOMCTL_destroydomain:
-domctl_lock_release();
-domain_lock(d);
  ret = domain_kill(d);
-domain_unlock(d);
  if ( ret == -ERESTART )
  ret = hypercall_create_continuation(
  __HYPERVISOR_domctl, "h", u_domctl);
-goto domctl_out_unlock_domonly;
+break;
  
  case XEN_DOMCTL_setnodeaffinity:
  {



--
Julien Grall



Re: [Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"

2020-03-24 Thread Julien Grall

On 24/03/2020 16:13, Jan Beulich wrote:

On 24.03.2020 16:21, Hongyan Xia wrote:

From: Hongyan Xia 
In contrast,
after dropping that commit, parallel domain destructions will just fail
to take the domctl lock, creating a hypercall continuation and backing
off immediately, allowing the thread that holds the lock to destroy a
domain much more quickly and allowing backed-off threads to process
events and irqs.

On a 144-core server with 4TiB of memory, destroying 32 guests (each
with 4 vcpus and 122GiB memory) simultaneously takes:

before the revert: 29 minutes
after the revert: 6 minutes


This wants comparing against numbers demonstrating the bad effects of
the global domctl lock. Iirc they were quite a bit higher than 6 min,
perhaps depending on guest properties.


Your original commit message doesn't give any clue as to the cases in which 
the domctl lock was an issue. So please provide information on the setups 
where you think this revert will make things worse.


Cheers,

--
Julien Grall



Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PGC_state_uninitialised

2020-03-24 Thread Julien Grall

Hi David,

On 24/03/2020 17:55, David Woodhouse wrote:

On Tue, 2020-03-24 at 10:08 +, Julien Grall wrote:

Hi David,

On 23/03/2020 10:55, David Woodhouse wrote:

On Mon, 2020-03-23 at 09:34 +, Julien Grall wrote:

For liveupdate, we will need a way to initialize a page but mark it as
already in use (i.e. in the same state as it would be if allocated
normally).


I am unconvinced of the veracity of this claim.

We don't want to turn specific details of the current Xen buddy
allocator into part of the implicit ABI of live update. That goes for
the power-of-two zone boundaries, amongst other things.


Why would you do that? Marking the page as already used is no
different from "PGC_state_uninitialised", except that the "struct page_info"
and the internals of the buddy allocator would be set up properly from the
start rather than at free time.


The internals of the buddy allocator *cannot* be set up properly for a
page which it would not actually give out — like the zero page.

We *could* do some tricks to allocate the zone structures for zones
which *do* exist but contain only "pre-allocated" pages so the buddy
allocator has never seen those zones... yet.



I think using "PGC_state_uninitialised" for preserved pages is an abuse. I
understand this exists in other parts of Xen (particularly on x86),
but I would rather not add more.



I am perfectly happy to try avoiding PGC_state_uninitialised for pages
which are "in use" at boot time due to live update.

All I insist on is that you explicitly describe the ABI constraints
that it imposes, if any.


Agreed.



For example, that hack which stops the buddy allocator from giving out
page zero: Could we have live updated from a Xen without that hack, to
a Xen which has it? With page zero already given out to a domain?


The buddy allocator could never have given out page 0 on x86 because it 
is part of the first MB of RAM (see arch_init_memory() in 
arch/x86/mm.c). AFAIK, the first MB cannot be freed.


The change in the buddy allocator was to address the Arm side and also 
to make clear that this is a weakness of the allocator.



What's yours? How would we cope with a situation like that, over LU?


When do you expect the pages to be carved out from the buddy allocator?

I can see only two situations:
1) Workaround a bug in the allocator.
2) A CPU errata requiring to not use memory.

In the case of 1), it is still safe for a domain to run with that page. 
However, we don't want to give it back to the page allocator. A solution 
is to mark such pages as "offlining/broken". Alternatively, you could 
swap the page (see more below).


In the case of 2), it is not safe for a domain to run with that page. So 
it is probably best to swap the pages with a new one. For HVM domain 
(including the P2M), it should be easy. For PV domain, I am not entirely 
sure if that's feasible.
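
As a rough sketch of the distinction between those two cases (a toy
userspace model; the helper names are illustrative, not actual Xen APIs):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct page { char data[16]; bool broken; };

/* Case 1 (allocator bug): the page stays safe to run on; just make
 * sure the allocator never hands it out again once it is freed. */
static void mark_page_broken(struct page *pg)
{
    pg->broken = true;
}

/* Case 2 (CPU erratum): the page is unsafe; migrate its contents to a
 * fresh page.  The caller would then update the P2M (HVM) or the PV
 * mappings accordingly. */
static struct page *swap_page(struct page *bad, struct page *fresh)
{
    memcpy(fresh->data, bad->data, sizeof(fresh->data));
    bad->broken = true;
    return fresh;
}

int main(void)
{
    struct page buggy = { "guest data A", false };   /* case 1 */
    struct page errata = { "guest data B", false };  /* case 2 */
    struct page spare = { "", false };

    mark_page_broken(&buggy);
    printf("swapped page now holds: %s\n", swap_page(&errata, &spare)->data);
    return 0;
}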


Cheers,

--
Julien Grall



[Xen-devel] Moving Forward on XenSummit

2020-03-24 Thread George Dunlap
I wanted to let everyone know that the XenProject is moving forward with plans 
to hold XenSummit this year, one way or another.

There are two basic approaches the Advisory Board has been considering:  
Postponing the event until later in the year, or holding a virtual event during 
the same timeframe.  Additionally, if we hold a virtual event during the same 
timeframe, the Board wants to keep the option open of having a smaller, 
in-person event later in the year, if circumstances permit.

Because the University of Bucharest has been very flexible, there is no rush to 
make a decision.  As a result, the Advisory Board has recommended that we spend 
time looking into the options in detail, and make a final decision around 
mid-April, 6 weeks before the originally scheduled event.

(As a side effect, the event webpage will have dates and places for the 
schedule as though we were still holding the event in Bucharest.  These will be 
updated when we know what we’re planning to do instead.)

# Physical and Virtual

The XenSummit is an important event for our community.  Some visible things 
that happen include:

* To allow members of the community to communicate to everyone else what 
they've been working on in the previous year, and what they plan to work on in 
the future

* To allow people to hash out technical issues in design sessions

Just as critical, the XenSummit allows innumerable small 
"hallway-track" conversations, as well as plain social interaction -- filling 
out email addresses with faces and personalities, allowing the community to run 
much more smoothly during the rest of the year.

It is very clear that holding a virtual event will not be nearly as effective 
at those things as an in-person event.  However, given the current uncertainty, 
it's not clear that the world will be ready for travel later in the Fall 
either.  And if it were, there's a risk that many such postponed events will 
collide with other postponed events, reducing attendance.  Additionally, we 
would need to either coordinate with the University term time, or find another 
venue, which could be much more expensive.

Having a virtual Summit is much better than having no XenSummit at all.  So the 
decision to be made will be to weigh the lower effectiveness of having a 
virtual Summit against the risk that a postponed event will turn out not to be 
possible.

In the mean time, we are brainstorming ways to try to get as much of the 
benefits of an in-person summit as possible.  If you have any thoughts or 
concrete suggestions along these lines -- in particular, things that have been 
tried and worked well or poorly in other virtual events that you've 
participated in -- then please let us know.

To be clear, there is no thought of continuing to hold virtual events after the 
current pandemic has passed.  We fully expect to have an in-person event in 
2021.

As always, if you have any thoughts or suggestions, please feel free to share 
them with me.

Stay safe everyone, and look forward to seeing you all in person when things 
have returned to normal.

 -George Dunlap

Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PGC_state_uninitialised

2020-03-24 Thread David Woodhouse
On Tue, 2020-03-24 at 10:08 +, Julien Grall wrote:
> Hi David,
> 
> On 23/03/2020 10:55, David Woodhouse wrote:
> > On Mon, 2020-03-23 at 09:34 +, Julien Grall wrote:
> > > For liveupdate, we will need a way to initialize a page but mark it as
> > > already in use (i.e. in the same state as it would be if allocated
> > > normally).
> > 
> > I am unconvinced of the veracity of this claim.
> > 
> > We don't want to turn specific details of the current Xen buddy
> > allocator into part of the implicit ABI of live update. That goes for
> > the power-of-two zone boundaries, amongst other things.
> 
> Why would you do that? Marking the page as already used is no 
> different from "PGC_state_uninitialised", except that the "struct page_info" 
> and the internals of the buddy allocator would be set up properly from the 
> start rather than at free time.

The internals of the buddy allocator *cannot* be set up properly for a
page which it would not actually give out — like the zero page.

We *could* do some tricks to allocate the zone structures for zones
which *do* exist but contain only "pre-allocated" pages so the buddy
allocator has never seen those zones... yet.


> I think using "PGC_state_uninitialised" for preserved pages is an abuse. I 
> understand this exists in other parts of Xen (particularly on x86), 
> but I would rather not add more.


I am perfectly happy to try avoiding PGC_state_uninitialised for pages
which are "in use" at boot time due to live update.

All I insist on is that you explicitly describe the ABI constraints
that it imposes, if any.

For example, that hack which stops the buddy allocator from giving out
page zero: Could we have live updated from a Xen without that hack, to
a Xen which has it? With page zero already given out to a domain?

With PGC_state_uninitialised and passing the page through
init_heap_pages() if/when it ever gets freed, my answer would be 'yes'.

What's yours? How would we cope with a situation like that, over LU?
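
To make the mechanism concrete, here is a toy model of the free path
under discussion (the names mirror the thread; this is not the actual
patch):

#include <stdio.h>

/* Pages preserved "in use" across live update are marked uninitialised
 * and only get their allocator bookkeeping set up if/when freed. */
enum pgc_state { PGC_state_inuse, PGC_state_free, PGC_state_uninitialised };

struct page_info { enum pgc_state state; };

static void init_heap_pages(struct page_info *pg)
{
    printf("init_heap_pages: allocator sees this page for the first time\n");
    pg->state = PGC_state_free; /* boot-style init, then free */
}

static void free_heap_pages(struct page_info *pg)
{
    printf("free_heap_pages: normal free path\n");
    pg->state = PGC_state_free;
}

static void free_page(struct page_info *pg)
{
    if ( pg->state == PGC_state_uninitialised )
        init_heap_pages(pg);    /* lazily initialise on first free */
    else
        free_heap_pages(pg);
}

int main(void)
{
    struct page_info preserved = { PGC_state_uninitialised };
    struct page_info normal = { PGC_state_inuse };

    free_page(&preserved);      /* takes the init path */
    free_page(&normal);         /* takes the regular path */
    return 0;
}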





Re: [Xen-devel] [XEN PATCH v3 15/23] xen/build: have the root Makefile generates the CFLAGS

2020-03-24 Thread Anthony PERARD
On Mon, Mar 23, 2020 at 04:11:53PM +0100, Roger Pau Monné wrote:
> On Thu, Mar 19, 2020 at 04:24:12PM +, Anthony PERARD wrote:
> > So, testing for the -Wa,--strip-local-absolute flags turns out to be
> > more complicated than I thought it would be.
> >  - cc-option-add doesn't work because it doesn't test with the current list
> >of CFLAGS. And if I add the CFLAGS, clang says the option is unused,
> >it doesn't matter if -no-integrated-as is present or not.
> 
> Oh, that seems like completely bogus clang behavior. It's my
> understanding (from reading the manual) that whatever gets appended to
> -Wa, is just passed to the assembler, in which case the compiler
> has no idea whether it's used by it or not.
> 
> Which version of clang did you use to test it?

Probably 9.0.1, I don't think I've upgraded since my tests.

-- 
Anthony PERARD



[Xen-devel] [PATCH 7/7] x86emul: support SYSRET

2020-03-24 Thread Jan Beulich
This is to augment SYSCALL, which has been supported for quite some
time.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -5975,6 +5975,60 @@ x86_emulate(
 goto done;
 break;
 
+case X86EMUL_OPC(0x0f, 0x07): /* sysret */
+vcpu_must_have(syscall);
+/* Inject #UD if syscall/sysret are disabled. */
+fail_if(!ops->read_msr);
+if ( (rc = ops->read_msr(MSR_EFER, &msr_val, ctxt)) != X86EMUL_OKAY )
+goto done;
+generate_exception_if((msr_val & EFER_SCE) == 0, EXC_UD);
+generate_exception_if(!amd_like(ctxt) && !mode_64bit(), EXC_UD);
+generate_exception_if(!mode_ring0(), EXC_GP, 0);
+generate_exception_if(!in_protmode(ctxt, ops), EXC_GP, 0);
+
+if ( (rc = ops->read_msr(MSR_STAR, &msr_val, ctxt)) != X86EMUL_OKAY )
+goto done;
+
+sreg.sel = ((msr_val >> 48) + 8) | 3; /* SELECTOR_RPL_MASK */
+cs.sel = op_bytes == 8 ? sreg.sel + 8 : sreg.sel - 8;
+
+cs.base = sreg.base = 0; /* flat segment */
+cs.limit = sreg.limit = ~0u; /* 4GB limit */
+cs.attr = 0xcfb; /* G+DB+P+DPL3+S+Code */
+sreg.attr = 0xcf3; /* G+DB+P+DPL3+S+Data */
+
+#ifdef __x86_64__
+if ( mode_64bit() )
+{
+if ( op_bytes == 8 )
+{
+cs.attr = 0xafb; /* L+DB+P+DPL3+S+Code */
+generate_exception_if(!is_canonical_address(_regs.rcx) &&
+  !amd_like(ctxt), EXC_GP, 0);
+_regs.rip = _regs.rcx;
+}
+else
+_regs.rip = _regs.ecx;
+
+_regs.eflags = _regs.r11 & ~(X86_EFLAGS_RF | X86_EFLAGS_VM);
+}
+else
+#endif
+{
+_regs.r(ip) = _regs.ecx;
+_regs.eflags |= X86_EFLAGS_IF;
+}
+
+fail_if(!ops->write_segment);
+if ( (rc = ops->write_segment(x86_seg_cs, &cs, ctxt)) != X86EMUL_OKAY ||
+ (!amd_like(ctxt) &&
+  (rc = ops->write_segment(x86_seg_ss, &sreg,
+   ctxt)) != X86EMUL_OKAY) )
+goto done;
+
+singlestep = _regs.eflags & X86_EFLAGS_TF;
+break;
+
 case X86EMUL_OPC(0x0f, 0x08): /* invd */
 case X86EMUL_OPC(0x0f, 0x09): /* wbinvd / wbnoinvd */
 generate_exception_if(!mode_ring0(), EXC_GP, 0);




[Xen-devel] [PATCH 6/7] x86emul: vendor specific SYSCALL behavior

2020-03-24 Thread Jan Beulich
AMD CPUs permit the insn everywhere (even outside of protected mode),
while Intel ones restrict it to 64-bit mode. While at it also add the
so far missing CPUID bit check.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1870,6 +1870,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_f16c()(ctxt->cpuid->basic.f16c)
 #define vcpu_has_rdrand()  (ctxt->cpuid->basic.rdrand)
 
+#define vcpu_has_syscall() (ctxt->cpuid->extd.syscall)
 #define vcpu_has_mmxext()  (ctxt->cpuid->extd.mmxext || vcpu_has_sse())
 #define vcpu_has_3dnow_ext()   (ctxt->cpuid->extd._3dnowext)
 #define vcpu_has_3dnow()   (ctxt->cpuid->extd._3dnow)
@@ -5897,13 +5898,13 @@ x86_emulate(
 break;
 
 case X86EMUL_OPC(0x0f, 0x05): /* syscall */
-generate_exception_if(!in_protmode(ctxt, ops), EXC_UD);
-
+vcpu_must_have(syscall);
 /* Inject #UD if syscall/sysret are disabled. */
 fail_if(ops->read_msr == NULL);
 if ( (rc = ops->read_msr(MSR_EFER, &msr_val, ctxt)) != X86EMUL_OKAY )
 goto done;
 generate_exception_if((msr_val & EFER_SCE) == 0, EXC_UD);
+generate_exception_if(!amd_like(ctxt) && !mode_64bit(), EXC_UD);
 
 if ( (rc = ops->read_msr(MSR_STAR, &msr_val, ctxt)) != X86EMUL_OKAY )
 goto done;




[Xen-devel] [PATCH 5/7] x86emul: vendor specific SYSENTER/SYSEXIT behavior in long mode

2020-03-24 Thread Jan Beulich
Intel CPUs permit both insns there while AMD ones don't.

While at it also
- drop the ring 0 check from SYSENTER handling - neither Intel's nor
  AMD's insn pages have any indication of #GP(0) getting raised when
  executed from ring 0, and trying it out in practice also confirms
  the check shouldn't be there,
- move SYSENTER segment register writing until after the (in principle
  able to fail) MSR reads.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -6588,7 +6588,7 @@ x86_emulate(
 
 case X86EMUL_OPC(0x0f, 0x34): /* sysenter */
 vcpu_must_have(sep);
-generate_exception_if(mode_ring0(), EXC_GP, 0);
+generate_exception_if(amd_like(ctxt) && ctxt->lma, EXC_UD);
 generate_exception_if(!in_protmode(ctxt, ops), EXC_GP, 0);
 
 fail_if(ops->read_msr == NULL);
@@ -6611,11 +6611,6 @@ x86_emulate(
 sreg.limit = ~0u;  /* 4GB limit */
 sreg.attr = 0xc93; /* G+DB+P+S+Data */
 
-fail_if(ops->write_segment == NULL);
-if ( (rc = ops->write_segment(x86_seg_cs, &cs, ctxt)) != 0 ||
- (rc = ops->write_segment(x86_seg_ss, &sreg, ctxt)) != 0 )
-goto done;
-
 if ( (rc = ops->read_msr(MSR_IA32_SYSENTER_EIP,
 &msr_val, ctxt)) != X86EMUL_OKAY )
 goto done;
@@ -6626,11 +6621,19 @@ x86_emulate(
 goto done;
 _regs.r(sp) = ctxt->lma ? msr_val : (uint32_t)msr_val;
 
+fail_if(!ops->write_segment);
+if ( (rc = ops->write_segment(x86_seg_cs, &cs,
+  ctxt)) != X86EMUL_OKAY ||
+ (rc = ops->write_segment(x86_seg_ss, &sreg,
+  ctxt)) != X86EMUL_OKAY )
+goto done;
+
 singlestep = _regs.eflags & X86_EFLAGS_TF;
 break;
 
 case X86EMUL_OPC(0x0f, 0x35): /* sysexit */
 vcpu_must_have(sep);
+generate_exception_if(amd_like(ctxt) && ctxt->lma, EXC_UD);
 generate_exception_if(!mode_ring0(), EXC_GP, 0);
 generate_exception_if(!in_protmode(ctxt, ops), EXC_GP, 0);
 



[Xen-devel] [PATCH 4/7] x86emul: vendor specific near indirect branch behavior in 64-bit mode

2020-03-24 Thread Jan Beulich
Intel CPUs ignore operand size overrides here, while AMD ones don't.

Signed-off-by: Jan Beulich 

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -813,6 +813,17 @@ static const struct {
 .opcode = { 0x66, 0x67, 0xe3, 0x10 },
 .opc_len = { 4, 4 },
 .disp = { 4 + 16 - MMAP_ADDR, 4 + 16 },
+}, {
+.descr = "jmpw *(%rsp)",
+.opcode = { 0x66, 0xff, 0x24, 0x24 },
+.opc_len = { 4, 4 },
+.disp = { STKVAL_DISP - MMAP_ADDR, STKVAL_DISP },
+}, {
+.descr = "callw *(%rsp)",
+.opcode = { 0x66, 0xff, 0x14, 0x24 },
+.opc_len = { 4, 4 },
+.stkoff = { -2, -8 },
+.disp = { STKVAL_DISP - MMAP_ADDR, STKVAL_DISP },
 },
 };
 #endif
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -2524,8 +2524,7 @@ x86_decode_onebyte(
 {
 case 2: /* call (near) */
 case 4: /* jmp (near) */
-case 6: /* push */
-if ( mode_64bit() && op_bytes == 4 )
+if ( mode_64bit() && (op_bytes == 4 || !amd_like(ctxt)) )
 op_bytes = 8;
 state->desc = DstNone | SrcMem | Mov;
 break;
@@ -2537,6 +2536,12 @@ x86_decode_onebyte(
 op_bytes = 4;
 state->desc = DstNone | SrcMem | Mov;
 break;
+
+case 6: /* push */
+if ( mode_64bit() && op_bytes == 4 )
+op_bytes = 8;
+state->desc = DstNone | SrcMem | Mov;
+break;
 }
 break;
 }




[Xen-devel] [PATCH 3/7] x86emul: vendor specific direct branch behavior in 64-bit mode

2020-03-24 Thread Jan Beulich
Intel CPUs ignore operand size overrides here, while AMD ones don't.

Signed-off-by: Jan Beulich 

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -757,6 +757,62 @@ static const struct {
 .opc_len = { 4, 4 },
 .stkoff = { 2 + 16, 8 + 16 },
 .disp = { STKVAL_DISP - MMAP_ADDR, STKVAL_DISP },
+}, {
+.descr = "jmpw .+16",
+.opcode = { 0x66, 0xeb, 0x10 },
+.opc_len = { 3, 3 },
+.disp = { 3 + 16 - MMAP_ADDR, 3 + 16 },
+}, {
+.descr = "jmpw .+128",
+.opcode = { 0x66, 0xe9, 0x80, 0x00, 0x00, 0x00 },
+.opc_len = { 4, 6 },
+.disp = { 4 + 128 - MMAP_ADDR, 6 + 128 },
+}, {
+.descr = "callw .+16",
+.opcode = { 0x66, 0xe8, 0x10, 0x00, 0x00, 0x00 },
+.opc_len = { 4, 6 },
+.stkoff = { -2, -8 },
+.disp = { 4 + 16 - MMAP_ADDR, 6 + 16 },
+}, {
+.descr = "jzw .+16",
+.opcode = { 0x66, 0x74, 0x10 },
+.opc_len = { 3, 3 },
+.disp = { 3, 3 },
+}, {
+.descr = "jzw .+128",
+.opcode = { 0x66, 0x0f, 0x84, 0x80, 0x00, 0x00, 0x00 },
+.opc_len = { 5, 7 },
+.disp = { 5, 7 },
+}, {
+.descr = "jnzw .+16",
+.opcode = { 0x66, 0x75, 0x10 },
+.opc_len = { 3, 3 },
+.disp = { 3 + 16 - MMAP_ADDR, 3 + 16 },
+}, {
+.descr = "jnzw .+128",
+.opcode = { 0x66, 0x0f, 0x85, 0x80, 0x00, 0x00, 0x00 },
+.opc_len = { 5, 7 },
+.disp = { 5 + 128 - MMAP_ADDR, 7 + 128 },
+}, {
+.descr = "loopqw .+16 (RCX>1)",
+.opcode = { 0x66, 0xe0, 0x10 },
+.opc_len = { 3, 3 },
+.disp = { 3 + 16 - MMAP_ADDR, 3 + 16 },
+}, {
+.descr = "looplw .+16 (ECX=1)",
+.opcode = { 0x66, 0x67, 0xe0, 0x10 },
+.opc_len = { 4, 4 },
+.disp = { 4, 4 },
+}, {
+.descr = "jrcxzw .+16 (RCX>0)",
+.opcode = { 0x66, 0xe3, 0x10 },
+.opc_len = { 3, 3 },
+.disp = { 3, 3 },
+}, {
+.descr = "jecxzw .+16 (ECX=0)",
+.opcode = { 0x66, 0x67, 0xe3, 0x10 },
+.opc_len = { 4, 4 },
+.disp = { 4 + 16 - MMAP_ADDR, 4 + 16 },
 },
 };
 #endif
@@ -1361,6 +1417,7 @@ int main(int argc, char **argv)
 const char *vendor = cp.x86_vendor == X86_VENDOR_INTEL ? "Intel" : "AMD";
 uint64_t *stk = (void *)res + MMAP_SZ - 16;
 
+regs.rcx = 2;
 for ( i = 0; i < ARRAY_SIZE(vendor_tests); ++i )
 {
 printf("%-*s",
@@ -1370,6 +1427,7 @@ int main(int argc, char **argv)
 regs.eflags = EFLAGS_ALWAYS_SET;
 regs.rip= (unsigned long)instr;
 regs.rsp= (unsigned long)stk;
+regs.rcx   |= 0x87654321UL;
 stk[0]  = regs.rip + STKVAL_DISP;
 rc = x86_emulate(&ctxt, &emulops);
 if ( (rc != X86EMUL_OKAY) ||
@@ -1379,6 +1437,16 @@ int main(int argc, char **argv)
?: vendor_tests[i].opc_len[v])) ||
  (regs.rsp != (unsigned long)stk + vendor_tests[i].stkoff[v]) )
 goto fail;
+/* For now only call insns push something onto the stack. */
+if ( regs.rsp < (unsigned long)stk )
+{
+unsigned long opc_end = (unsigned long)instr +
+vendor_tests[i].opc_len[v];
+
+if ( memcmp(&opc_end, (void *)regs.rsp,
+min((unsigned long)stk - regs.rsp, 8UL)) )
+goto fail;
+}
 printf("okay\n");
 }
 
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1273,7 +1273,7 @@ do {
 #define jmp_rel(rel)\
 do {\
 unsigned long ip = _regs.r(ip) + (int)(rel);\
-if ( op_bytes == 2 )\
+if ( op_bytes == 2 && (amd_like(ctxt) || !mode_64bit()) )   \
 ip = (uint16_t)ip;  \
 else if ( !mode_64bit() )   \
 ip = (uint32_t)ip;  \
@@ -3392,7 +3392,13 @@ x86_decode(
 
 case SrcImm:
 if ( !(d & ByteOp) )
+{
+if ( mode_64bit() && !amd_like(ctxt) &&
+ ((ext == ext_none && (b | 1) == 0xe9) /* call / jmp */ ||
+  (ext == ext_0f && (b | 0xf) == 0x8f) /* jcc */ ) )
+op_bytes = 4;
 bytes = op_bytes != 8 ? op_bytes : 4;
+}
 else
 {
 case SrcImmByte:



[Xen-devel] [PATCH 1/7] x86emul: add wrappers to check for AMD-like behavior

2020-03-24 Thread Jan Beulich
These are to aid readability at their use sites, in particular because
we're going to gain more of them.

Suggested-by: Andrew Cooper 
Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1836,6 +1836,18 @@ in_protmode(
 return !(in_realmode(ctxt, ops) || (ctxt->regs->eflags & X86_EFLAGS_VM));
 }
 
+static bool
+_amd_like(const struct cpuid_policy *cp)
+{
+return cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON);
+}
+
+static bool
+amd_like(const struct x86_emulate_ctxt *ctxt)
+{
+return _amd_like(ctxt->cpuid);
+}
+
 #define vcpu_has_fpu() (ctxt->cpuid->basic.fpu)
 #define vcpu_has_sep() (ctxt->cpuid->basic.sep)
 #define vcpu_has_cx8() (ctxt->cpuid->basic.cx8)
@@ -1995,7 +2007,7 @@ protmode_load_seg(
 case x86_seg_tr:
 goto raise_exn;
 }
-if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
+if ( !_amd_like(cp) ||
  !ops->read_segment ||
  ops->read_segment(seg, sreg, ctxt) != X86EMUL_OKAY )
 memset(sreg, 0, sizeof(*sreg));
@@ -2122,9 +2134,7 @@ protmode_load_seg(
  *   - all 16 bytes read with the high 8 bytes ignored on AMD.
  */
 bool wide = desc.b & 0x1000
-? false : (desc.b & 0xf00) != 0xc00 &&
-   !(cp->x86_vendor &
- (X86_VENDOR_AMD | X86_VENDOR_HYGON))
+? false : (desc.b & 0xf00) != 0xc00 && !_amd_like(cp)
? mode_64bit() : ctxt->lma;
 
 if ( wide )
@@ -2142,9 +2152,7 @@ protmode_load_seg(
 default:
 return rc;
 }
-if ( !mode_64bit() &&
- (cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) &&
- (desc.b & 0xf00) != 0xc00 )
+if ( !mode_64bit() && _amd_like(cp) && (desc.b & 0xf00) != 0xc00 )
 desc_hi.b = desc_hi.a = 0;
 if ( (desc_hi.b & 0x1f00) ||
  (seg != x86_seg_none &&
@@ -2525,9 +2533,7 @@ x86_decode_onebyte(
 case 3: /* call (far, absolute indirect) */
 case 5: /* jmp (far, absolute indirect) */
 /* REX.W ignored on a vendor-dependent basis. */
-if ( op_bytes == 8 &&
- (ctxt->cpuid->x86_vendor &
-  (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
+if ( op_bytes == 8 && amd_like(ctxt) )
 op_bytes = 4;
 state->desc = DstNone | SrcMem | Mov;
 break;
@@ -2651,8 +2657,7 @@ x86_decode_twobyte(
 case 0xb4: /* lfs */
 case 0xb5: /* lgs */
 /* REX.W ignored on a vendor-dependent basis. */
-if ( op_bytes == 8 &&
- (ctxt->cpuid->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
+if ( op_bytes == 8 && amd_like(ctxt) )
 op_bytes = 4;
 break;
 
@@ -4068,9 +4073,7 @@ x86_emulate(
 if ( ea.type == OP_REG )
 src.val = *ea.reg;
 else if ( (rc = read_ulong(ea.mem.seg, ea.mem.off, &src.val,
-   (op_bytes == 2 &&
-!(ctxt->cpuid->x86_vendor &
-  (X86_VENDOR_AMD | X86_VENDOR_HYGON))
+   (op_bytes == 2 && !amd_like(ctxt)
 ? 2 : 4),
ctxt, ops)) )
 goto done;




[Xen-devel] [PATCH 2/7] x86emul: vendor specific near RET behavior in 64-bit mode

2020-03-24 Thread Jan Beulich
Intel CPUs ignore operand size overrides here, while AMD ones don't.

Signed-off-by: Jan Beulich 

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -733,6 +733,34 @@ static struct x86_emulate_ops emulops =
 #define EFLAGS_ALWAYS_SET (X86_EFLAGS_IF | X86_EFLAGS_MBS)
 #define EFLAGS_MASK (X86_EFLAGS_ARITH_MASK | EFLAGS_ALWAYS_SET)
 
+#define MMAP_ADDR 0x10
+
+#ifdef __x86_64__
+# define STKVAL_DISP 64
+static const struct {
+const char *descr;
+uint8_t opcode[8];
+/* Index 0: AMD, index 1: Intel. */
+uint8_t opc_len[2];
+int8_t stkoff[2];
+int32_t disp[2];
+} vendor_tests[] = {
+{
+.descr = "retw",
+.opcode = { 0x66, 0xc3 },
+.opc_len = { 2, 2 },
+.stkoff = { 2, 8 },
+.disp = { STKVAL_DISP - MMAP_ADDR, STKVAL_DISP },
+}, {
+.descr = "retw $16",
+.opcode = { 0x66, 0xc2, 0x10, 0x00 },
+.opc_len = { 4, 4 },
+.stkoff = { 2 + 16, 8 + 16 },
+.disp = { STKVAL_DISP - MMAP_ADDR, STKVAL_DISP },
+},
+};
+#endif
+
 int main(int argc, char **argv)
 {
 struct x86_emulate_ctxt ctxt;
@@ -741,7 +769,9 @@ int main(int argc, char **argv)
 unsigned int *res, i, j;
 bool stack_exec;
 int rc;
-#ifndef __x86_64__
+#ifdef __x86_64__
+unsigned int vendor_native;
+#else
 unsigned int bcdres_native, bcdres_emul;
 #endif
 
@@ -755,7 +785,7 @@ int main(int argc, char **argv)
 ctxt.addr_size = 8 * sizeof(void *);
 ctxt.sp_size   = 8 * sizeof(void *);
 
-res = mmap((void *)0x10, MMAP_SZ, PROT_READ|PROT_WRITE|PROT_EXEC,
+res = mmap((void *)MMAP_ADDR, MMAP_SZ, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
 if ( res == MAP_FAILED )
 {
@@ -1323,7 +1353,41 @@ int main(int argc, char **argv)
  (regs.eip != (unsigned long)[3]) )
 goto fail;
 printf("okay\n");
-#endif
+
+vendor_native = cp.x86_vendor;
+for ( cp.x86_vendor = X86_VENDOR_AMD; ; )
+{
+unsigned int v = cp.x86_vendor == X86_VENDOR_INTEL;
+const char *vendor = cp.x86_vendor == X86_VENDOR_INTEL ? "Intel" : "AMD";
+uint64_t *stk = (void *)res + MMAP_SZ - 16;
+
+for ( i = 0; i < ARRAY_SIZE(vendor_tests); ++i )
+{
+printf("%-*s",
+   40 - printf("Testing %s [%s]", vendor_tests[i].descr, vendor),
+   "...");
+memcpy(instr, vendor_tests[i].opcode, vendor_tests[i].opc_len[v]);
+regs.eflags = EFLAGS_ALWAYS_SET;
+regs.rip= (unsigned long)instr;
+regs.rsp= (unsigned long)stk;
+stk[0]  = regs.rip + STKVAL_DISP;
+rc = x86_emulate(&ctxt, &emulops);
+if ( (rc != X86EMUL_OKAY) ||
+ (regs.eflags != EFLAGS_ALWAYS_SET) ||
+ (regs.rip != (unsigned long)instr +
+  (vendor_tests[i].disp[v]
+   ?: vendor_tests[i].opc_len[v])) ||
+ (regs.rsp != (unsigned long)stk + vendor_tests[i].stkoff[v]) )
+goto fail;
+printf("okay\n");
+}
+
+if ( cp.x86_vendor == X86_VENDOR_INTEL )
+break;
+cp.x86_vendor = X86_VENDOR_INTEL;
+}
+cp.x86_vendor = vendor_native;
+#endif /* x86-64 */
 
 printf("%-40s", "Testing shld $1,%ecx,(%edx)...");
 res[0]  = 0x12345678;
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -4611,7 +4611,8 @@ x86_emulate(
 
 case 0xc2: /* ret imm16 (near) */
 case 0xc3: /* ret (near) */
-op_bytes = ((op_bytes == 4) && mode_64bit()) ? 8 : op_bytes;
+op_bytes = (op_bytes == 4 || !amd_like(ctxt)) && mode_64bit()
+   ? 8 : op_bytes;
 if ( (rc = read_ulong(x86_seg_ss, sp_post_inc(op_bytes + src.val),
   &dst.val, op_bytes, ctxt, ops)) != 0 ||
  (rc = ops->insn_fetch(x86_seg_cs, dst.val, NULL, 0, ctxt)) )




[Xen-devel] [PATCH 0/7] x86emul: (mainly) vendor specific behavior adjustments

2020-03-24 Thread Jan Beulich
There are quite a few more vendor differences than we currently support,
in particular in 64-bit mode. Now that I've made some progress on the
binutils side I felt more confident in making the changes here as well.

1: add wrappers to check for AMD-like behavior
2: vendor specific near RET behavior in 64-bit mode
3: vendor specific direct branch behavior in 64-bit mode
4: vendor specific near indirect branch behavior in 64-bit mode
5: vendor specific SYSENTER/SYSEXIT behavior in long mode
6: vendor specific SYSCALL behavior
7: support SYSRET

Jan



[Xen-devel] [xen-unstable test] 148925: tolerable trouble: fail/pass/starved

2020-03-24 Thread osstest service owner
flight 148925 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148925/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-dom0pvh-xl-intel 17 guest-saverestore.2   fail pass in 148873
 test-arm64-arm64-xl  12 guest-startfail pass in 148873

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail in 148873 like 148826
 test-arm64-arm64-xl 13 migrate-support-check fail in 148873 never pass
 test-arm64-arm64-xl 14 saverestore-support-check fail in 148873 never pass
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail like 148873
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail  like 148873
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 148873
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 148873
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 148873
 test-armhf-armhf-libvirt 14 saverestore-support-check fail  like 148873
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail like 148873
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 148873
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail like 148873
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 148873
 test-arm64-arm64-xl-seattle  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-amd64-i386-xl-pvshim 12 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-dom0pvh-xl-amd  2 hosts-allocate   starved  n/a

version targeted for testing:
 xen  60d6ba1916dce0622a53b00dbae3c01d0761057e
baseline version:
 xen  60d6ba1916dce0622a53b00dbae3c01d0761057e

Last test of basis   148925  2020-03-23 17:36:41 Z    0 days
Testing same 

Re: [Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"

2020-03-24 Thread Jan Beulich
On 24.03.2020 16:21, Hongyan Xia wrote:
> From: Hongyan Xia 
> 
> Unfortunately, even though that commit dropped the domctl lock and
> allowed other domctl to continue, it created severe lock contention
> within domain destructions themselves. Multiple domain destructions in
> parallel now spin for the global heap lock when freeing memory and could
> spend a long time before the next hypercall continuation.

I'm not at all happy to see this reverted; instead I was hoping that
we could drop the domctl lock in further cases. If a lack of
continuations is the problem, did you try forcing them to occur more
frequently?
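
A toy model of that suggestion (userspace, not Xen code; in Xen the
check would be hypercall_preempt_check() inside the relinquish loops):

#include <stdbool.h>
#include <stdio.h>

#define PREEMPT_BATCH 64 /* pages freed per preemption check; tunable */

/* Stand-in for hypercall_preempt_check(): pretend work is pending. */
static bool preempt_check(void)
{
    return true;
}

/* Free pages in small batches, bailing out early so the hypercall is
 * resumed via a continuation instead of holding locks for minutes. */
static int relinquish_pages(int *pages_left)
{
    int freed = 0;

    while ( *pages_left > 0 )
    {
        --*pages_left; /* free one page */
        if ( ++freed % PREEMPT_BATCH == 0 && preempt_check() )
            return -1; /* model of -ERESTART */
    }
    return 0;
}

int main(void)
{
    int pages = 1000, continuations = 0;

    while ( relinquish_pages(&pages) != 0 )
        continuations++; /* re-entry from the hypercall continuation */
    printf("done after %d continuations\n", continuations);
    return 0;
}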

> In contrast,
> after dropping that commit, parallel domain destructions will just fail
> to take the domctl lock, creating a hypercall continuation and backing
> off immediately, allowing the thread that holds the lock to destroy a
> domain much more quickly and allowing backed-off threads to process
> events and irqs.
> 
> On a 144-core server with 4TiB of memory, destroying 32 guests (each
> with 4 vcpus and 122GiB memory) simultaneously takes:
> 
> before the revert: 29 minutes
> after the revert: 6 minutes

This wants comparing against numbers demonstrating the bad effects of
the global domctl lock. Iirc they were quite a bit higher than 6 min,
perhaps depending on guest properties.

Jan



[Xen-devel] [libvirt test] 148954: regressions - FAIL

2020-03-24 Thread osstest service owner
flight 148954 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148954/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-libvirt   6 libvirt-build fail REGR. vs. 146182
 build-i386-libvirt    6 libvirt-build fail REGR. vs. 146182
 build-arm64-libvirt   6 libvirt-build fail REGR. vs. 146182
 build-armhf-libvirt   6 libvirt-build fail REGR. vs. 146182

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  75c386985e02ef0d430d968a5c7093549507efba
baseline version:
 libvirt  a1cd25b919509be2645dbe6f952d5263e0d4e4e5

Last test of basis   146182  2020-01-17 06:00:23 Z   67 days
Failing since        146211  2020-01-18 04:18:52 Z   66 days   63 attempts
Testing same since   148954  2020-03-24 04:19:05 Z    0 days    1 attempts


People who touched revisions under test:
  Andrea Bolognani 
  Arnaud Patard 
  Boris Fiuczynski 
  Christian Ehrhardt 
  Collin Walling 
  Daniel Henrique Barboza 
  Daniel P. Berrangé 
  Daniel Veillard 
  Dario Faggioli 
  Erik Skultety 
  Gaurav Agrawal 
  Han Han 
  Jim Fehlig 
  Jiri Denemark 
  Jonathon Jongsma 
  Julio Faracco 
  Ján Tomko 
  Laine Stump 
  Lin Ma 
  Marek Marczykowski-Górecki 
  Michal Privoznik 
  Nikolay Shirokovskiy 
  Pavel Hrdina 
  Pavel Mores 
  Peter Krempa 
  Pino Toscano 
  Rafael Fonseca 
  Richard W.M. Jones 
  Rikard Falkeborn 
  Ryan Moeller 
  Sahid Orentino Ferdjaoui 
  Sebastian Mitterle 
  Seeteena Thoufeek 
  Stefan Berger 
  Stefan Berger 
  Stefan Hajnoczi 
  Thomas Huth 
  Wu Qingliang 
  Your Name 
  Zhang Bo 
  zhenwei pi 
  Zhimin Feng 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-arm64-libvirt  fail
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvops                                            pass
 build-arm64-pvops                                            pass
 build-armhf-pvops                                            pass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   blocked 
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmblocked 
 test-amd64-amd64-libvirt-xsm blocked 
 test-arm64-arm64-libvirt-xsm blocked 
 test-amd64-i386-libvirt-xsm  blocked 
 test-amd64-amd64-libvirt blocked 
 test-arm64-arm64-libvirt blocked 
 test-armhf-armhf-libvirt blocked 
 test-amd64-i386-libvirt  blocked 
 test-amd64-amd64-libvirt-pairblocked 
 test-amd64-i386-libvirt-pair blocked 
 test-arm64-arm64-libvirt-qcow2   blocked 
 test-armhf-armhf-libvirt-raw blocked 
 test-amd64-amd64-libvirt-vhd blocked 



[Xen-devel] [ovmf test] 148946: all pass - PUSHED

2020-03-24 Thread osstest service owner
flight 148946 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148946/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 2f524a745e23e1b4c73ea22b058492bfe4af84c2
baseline version:
 ovmf 0c8ea9fe1adbbee230ee0c68f28b68ca2b0534bc

Last test of basis   148761  2020-03-19 17:39:22 Z    4 days
Testing same since   148946  2020-03-24 02:46:56 Z    0 days    1 attempts


People who touched revisions under test:
  Fan, ZhijuX 
  Zhiju.Fan 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops                                            pass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   0c8ea9fe1a..2f524a745e  2f524a745e23e1b4c73ea22b058492bfe4af84c2 -> xen-tested-master



[Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"

2020-03-24 Thread Hongyan Xia
From: Hongyan Xia 

Unfortunately, even though that commit dropped the domctl lock and
allowed other domctls to continue, it created severe lock contention
within domain destructions themselves. Multiple domain destructions in
parallel now spin for the global heap lock when freeing memory and could
spend a long time before the next hypercall continuation. In contrast,
after dropping that commit, parallel domain destructions will just fail
to take the domctl lock, creating a hypercall continuation and backing
off immediately, allowing the thread that holds the lock to destroy a
domain much more quickly and allowing backed-off threads to process
events and irqs.
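
For reference, the backing off this revert restores boils down to the
following pattern in do_domctl() (a simplified sketch, not the literal
source):

    /* Simplified sketch: when the domctl lock is contended, don't spin -
     * queue a continuation so the caller retries the hypercall later. */
    if ( !domctl_lock_acquire() )
        return hypercall_create_continuation(
            __HYPERVISOR_domctl, "h", u_domctl);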

On a 144-core server with 4TiB of memory, destroying 32 guests (each
with 4 vcpus and 122GiB memory) simultaneously takes:

before the revert: 29 minutes
after the revert: 6 minutes

This is timed from the release of the first page until the very last
page of all 32 guests is released back to the heap.

This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.

Signed-off-by: Hongyan Xia 
---
 xen/common/domain.c | 11 +--
 xen/common/domctl.c |  5 +
 2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index b4eb476a9c..7b02f5ead7 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -698,20 +698,11 @@ int domain_kill(struct domain *d)
 if ( d == current->domain )
 return -EINVAL;
 
-/* Protected by d->domain_lock. */
+/* Protected by domctl_lock. */
 switch ( d->is_dying )
 {
 case DOMDYING_alive:
-domain_unlock(d);
 domain_pause(d);
-domain_lock(d);
-/*
- * With the domain lock dropped, d->is_dying may have changed. Call
- * ourselves recursively if so, which is safe as then we won't come
- * back here.
- */
-if ( d->is_dying != DOMDYING_alive )
-return domain_kill(d);
 d->is_dying = DOMDYING_dying;
 argo_destroy(d);
 evtchn_destroy(d);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index a69b3b59a8..e010079203 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -571,14 +571,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 break;
 
 case XEN_DOMCTL_destroydomain:
-domctl_lock_release();
-domain_lock(d);
 ret = domain_kill(d);
-domain_unlock(d);
 if ( ret == -ERESTART )
 ret = hypercall_create_continuation(
 __HYPERVISOR_domctl, "h", u_domctl);
-goto domctl_out_unlock_domonly;
+break;
 
 case XEN_DOMCTL_setnodeaffinity:
 {
-- 
2.17.1




Re: [Xen-devel] [PATCH 1/2] xen: expand BALLOON_MEMORY_HOTPLUG description

2020-03-24 Thread Jürgen Groß

On 24.03.20 16:18, Roger Pau Monné wrote:
> On Tue, Mar 24, 2020 at 04:13:48PM +0100, Jürgen Groß wrote:
> > On 24.03.20 16:00, Roger Pau Monne wrote:
> > > To mention it's also useful for PVH or HVM domains that require
> > > mapping foreign memory or grants.
> > > 
> > > Signed-off-by: Roger Pau Monné 
> > > ---
> > > Cc: Boris Ostrovsky 
> > > Cc: Juergen Gross 
> > > Cc: Stefano Stabellini 
> > > Cc: xen-devel@lists.xenproject.org
> > > ---
> > >   drivers/xen/Kconfig | 4 ++++
> > >   1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> > > index 61212fc7f0c7..57ddd6f4b729 100644
> > > --- a/drivers/xen/Kconfig
> > > +++ b/drivers/xen/Kconfig
> > > @@ -19,6 +19,10 @@ config XEN_BALLOON_MEMORY_HOTPLUG
> > >   It is very useful on critical systems which require long
> > >   run without rebooting.
> > > + It's also very useful for translated domains (PVH or HVM) to obtain
> > 
> > I'd rather say "(non PV)" or "(PVH, HVM or Arm)".
> 
> I'm fine with any of the variants. Would you mind adjusting when
> picking it up or would you like me to resend?

No need to resend. I'll use the "non PV" variant.


Juergen




Re: [Xen-devel] [PATCH 1/2] xen: expand BALLOON_MEMORY_HOTPLUG description

2020-03-24 Thread Roger Pau Monné
On Tue, Mar 24, 2020 at 04:13:48PM +0100, Jürgen Groß wrote:
> On 24.03.20 16:00, Roger Pau Monne wrote:
> > To mention it's also useful for PVH or HVM domains that require
> > mapping foreign memory or grants.
> > 
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Cc: Boris Ostrovsky 
> > Cc: Juergen Gross 
> > Cc: Stefano Stabellini 
> > Cc: xen-devel@lists.xenproject.org
> > ---
> >   drivers/xen/Kconfig | 4 ++++
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> > index 61212fc7f0c7..57ddd6f4b729 100644
> > --- a/drivers/xen/Kconfig
> > +++ b/drivers/xen/Kconfig
> > @@ -19,6 +19,10 @@ config XEN_BALLOON_MEMORY_HOTPLUG
> >   It is very useful on critical systems which require long
> >   run without rebooting.
> > + It's also very useful for translated domains (PVH or HVM) to obtain
> 
> I'd rather say "(non PV)" or "(PVH, HVM or Arm)".

I'm fine with any of the variants. Would you mind adjusting when
picking it up or would you like me to resend?

> > + unpopulated physical memory ranges to use in order to map foreign
> > + memory or grants.
> > +
> >   Memory could be hotplugged in following steps:
> > 1) target domain: ensure that memory auto online policy is in
> > 
> 
> With that:
> 
> Reviewed-by: Juergen Gross 

Thanks!



Re: [Xen-devel] [PATCH 2/2] xen: enable BALLOON_MEMORY_HOTPLUG by default

2020-03-24 Thread Roger Pau Monné
On Tue, Mar 24, 2020 at 04:09:35PM +0100, Jürgen Groß wrote:
> On 24.03.20 16:00, Roger Pau Monne wrote:
> > Without it a PVH dom0 is mostly useless, as it would balloon down huge
> > amounts of RAM in order to get physical address space to map foreign
> > memory and grants, ultimately leading to an out of memory situation.
> > 
> > Such an option is also needed for HVM or PVH driver domains, since they
> > also require mapping grants into physical memory regions.
> > 
> > Suggested-by: Ian Jackson 
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Cc: Boris Ostrovsky 
> > Cc: Juergen Gross 
> > Cc: Stefano Stabellini 
> > Cc: xen-devel@lists.xenproject.org
> > ---
> >   drivers/xen/Kconfig | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> > index 57ddd6f4b729..c344bcffd89d 100644
> > --- a/drivers/xen/Kconfig
> > +++ b/drivers/xen/Kconfig
> > @@ -13,6 +13,7 @@ config XEN_BALLOON
> >   config XEN_BALLOON_MEMORY_HOTPLUG
> > bool "Memory hotplug support for Xen balloon driver"
> > depends on XEN_BALLOON && MEMORY_HOTPLUG
> > +   default y
> > help
> >   Memory hotplug support for Xen balloon driver allows expanding memory
> >   available for the system above limit declared at system startup.
> > 
> 
> Another variant would be to set: default XEN_BACKEND
> 
> This would match the reasoning for switching it on.

I would rather have it always on if possible, as gntdev or privcmd
(when used to map foreign pages from user-space) will also require it,
and they are not gated on XEN_BACKEND AFAICT.

> Either way would be fine with me, so you can add
> 
> Reviewed-by: Juergen Gross 

Thanks!

Roger.



Re: [Xen-devel] [PATCH 1/2] xen: expand BALLOON_MEMORY_HOTPLUG description

2020-03-24 Thread Jürgen Groß

On 24.03.20 16:00, Roger Pau Monne wrote:
> To mention it's also useful for PVH or HVM domains that require
> mapping foreign memory or grants.
> 
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Boris Ostrovsky 
> Cc: Juergen Gross 
> Cc: Stefano Stabellini 
> Cc: xen-devel@lists.xenproject.org
> ---
>   drivers/xen/Kconfig | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 61212fc7f0c7..57ddd6f4b729 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -19,6 +19,10 @@ config XEN_BALLOON_MEMORY_HOTPLUG
>   It is very useful on critical systems which require long
>   run without rebooting.
> 
> + It's also very useful for translated domains (PVH or HVM) to obtain

I'd rather say "(non PV)" or "(PVH, HVM or Arm)".

> + unpopulated physical memory ranges to use in order to map foreign
> + memory or grants.
> +
>   Memory could be hotplugged in following steps:
> 
> 	1) target domain: ensure that memory auto online policy is in

With that:

Reviewed-by: Juergen Gross 


Juergen



Re: [Xen-devel] [PATCH v2] xen/sched: fix onlining cpu with core scheduling active

2020-03-24 Thread Dario Faggioli
On Tue, 2020-03-10 at 10:06 +0100, Juergen Gross wrote:
> When onlining a cpu cpupool_cpu_add() checks whether all siblings of
> the new cpu are free in order to decide whether to add it to
> cpupool0.
> In case the added cpu is not the last sibling to be onlined this test
> is wrong as it only checks for all online siblings to be free. The
> test should include the check for the number of siblings having
> reached the scheduling granularity of cpupool0, too.
> 
> Signed-off-by: Juergen Gross 
>
Reviewed-by: Dario Faggioli 

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)





Re: [Xen-devel] [PATCH 2/2] xen: enable BALLOON_MEMORY_HOTPLUG by default

2020-03-24 Thread Jürgen Groß

On 24.03.20 16:00, Roger Pau Monne wrote:
> Without it a PVH dom0 is mostly useless, as it would balloon down huge
> amounts of RAM in order to get physical address space to map foreign
> memory and grants, ultimately leading to an out of memory situation.
> 
> Such an option is also needed for HVM or PVH driver domains, since they
> also require mapping grants into physical memory regions.
> 
> Suggested-by: Ian Jackson 
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Boris Ostrovsky 
> Cc: Juergen Gross 
> Cc: Stefano Stabellini 
> Cc: xen-devel@lists.xenproject.org
> ---
>   drivers/xen/Kconfig | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 57ddd6f4b729..c344bcffd89d 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -13,6 +13,7 @@ config XEN_BALLOON
>   config XEN_BALLOON_MEMORY_HOTPLUG
> 	bool "Memory hotplug support for Xen balloon driver"
> 	depends on XEN_BALLOON && MEMORY_HOTPLUG
> +	default y
> 	help
> 	  Memory hotplug support for Xen balloon driver allows expanding memory
> 	  available for the system above limit declared at system startup.

Another variant would be to set: default XEN_BACKEND

This would match the reasoning for switching it on.

Either way would be fine with me, so you can add

Reviewed-by: Juergen Gross 


Juergen



[Xen-devel] [PATCH 2/2] xen: enable BALLOON_MEMORY_HOTPLUG by default

2020-03-24 Thread Roger Pau Monne
Without it a PVH dom0 is mostly useless, as it would balloon down huge
amounts of RAM in order to get physical address space to map foreign
memory and grants, ultimately leading to an out of memory situation.

Such an option is also needed for HVM or PVH driver domains, since they
also require mapping grants into physical memory regions.

Suggested-by: Ian Jackson 
Signed-off-by: Roger Pau Monné 
---
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: xen-devel@lists.xenproject.org
---
 drivers/xen/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 57ddd6f4b729..c344bcffd89d 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -13,6 +13,7 @@ config XEN_BALLOON
 config XEN_BALLOON_MEMORY_HOTPLUG
bool "Memory hotplug support for Xen balloon driver"
depends on XEN_BALLOON && MEMORY_HOTPLUG
+   default y
help
  Memory hotplug support for Xen balloon driver allows expanding memory
  available for the system above limit declared at system startup.
-- 
2.25.0




[Xen-devel] [PATCH 1/2] xen: expand BALLOON_MEMORY_HOTPLUG description

2020-03-24 Thread Roger Pau Monne
To mention it's also useful for PVH or HVM domains that require
mapping foreign memory or grants.

Signed-off-by: Roger Pau Monné 
---
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: xen-devel@lists.xenproject.org
---
 drivers/xen/Kconfig | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 61212fc7f0c7..57ddd6f4b729 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -19,6 +19,10 @@ config XEN_BALLOON_MEMORY_HOTPLUG
  It is very useful on critical systems which require long
  run without rebooting.
 
+ It's also very useful for translated domains (PVH or HVM) to obtain
+ unpopulated physical memory ranges to use in order to map foreign
+ memory or grants.
+
  Memory could be hotplugged in following steps:
 
1) target domain: ensure that memory auto online policy is in
-- 
2.25.0




Re: [Xen-devel] xl vcpu-pin peculiarities in core scheduling mode

2020-03-24 Thread Dario Faggioli
On Tue, 2020-03-24 at 15:22 +0100, Jürgen Groß wrote:
> On 24.03.20 14:34, Sergey Dyasli wrote:
> > I did some experiments and noticed
> > the following
> > inconsistencies:
> > 
> >1. xl vcpu-pin 5 0 0
> >   Windows 10 (64-bit) (1)  5 00   
> > -b-1644.0  0 / all
> >   Windows 10 (64-bit) (1)  5 11   
> > -b-1650.1  0 / all
> >   ^
> >   ^
> >   CPU 1 doesn't match reported hard-affinity of 0. Should this
> > command set
> >   hard-affinity of vCPU 1 to 1? Or should it be 0-1 for both
> > vCPUs instead?
> > 
I think this is fine. For improving how this is reported back to users,
I'd go for the solution nr 3 proposed by Juergen (below).

> >2. xl vcpu-pin 5 0 1
> >   libxl: error: libxl_sched.c:62:libxl__set_vcpuaffinity:
> > Domain 5:Setting vcpu affinity: Invalid argument
> >   This is expected but perhaps needs documenting somewhere?
> > 
Not against more clear error reporting. It would mean that libxl must
have a way to tell that pinning failed because pinning was not being
done to a "master CPU".

I guess it's doable, but perhaps it's not the top priority, assuming we
have (and we put in place, if we still don't) good documentation on how
pinning works in this operational mode.

That would make a good article/blog post, I think.

> >3. xl vcpu-pin 5 0 1-2
> >   Windows 10 (64-bit) (1)  5 02   
> > -b-1646.7  1-2 / all
> >   Windows 10 (64-bit) (1)  5 13   
> > -b-1651.6  1-2 / all
> >   ^
> >   ^^^
> >   Here is a CPU / affinity mismatch again, but the more
> > interesting fact
> >   is that setting 1-2 is allowed at all, I'd expect CPU would
> > never be set
> >   to 1 with such settings.
> > 
This is the situation I'm most concerned of. Mostly, because I think a
user might be surprised to see the command (1) not failing and (2)
having the effect that it has.

I think that, in this case, we should either fail, or adjust the
affinity to 2-3. If we do the latter, we should inform the user about
that. There's something similar in libxl already (related to soft and
hard affinity, where we set a mask, then we check what's been actually
setup by Xen and act accordingly).

Thoughts?

I'd go for a mix of 1 and 3, i.e., I'd do:

> 1. As today, documenting the output.
> Not very nice IMO, but the least effort.
> 
This, i.e., we definitely need more documentation and we need to make
sure it's visible enough.

> 2. Just print one line for each virtual cpu/core/socket, like:
> Windows 10 (64-bit) (1)5 0-1   0-1   -b-1646.7  0-1 /
> all
> This has the disadvantage of dropping the per-vcpu time in favor
> of
> per-vcore time, OTOH this is reflecting reality.
> 
> 3. Print the effective pinnings:
> Windows 10 (64-bit) (1)5 0 0 -b-1646.7  0   /
> all
> Windows 10 (64-bit) (1)5 1 1 -b-1646.7  1   /
> all
> Should be rather easy to do.
> 
And this: i.e., I'd always report the effective mapping.

I actually would go as far as changing the mapping we've been given and
store the effective one(s) in `cpu_hard_affinity`, etc, in Xen. Of
course, as said above, we'd need to inform the user that this has
happened.
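
Something like the following could do (a rough sketch with a placeholder
helper name; xc_vcpu_setaffinity()/xc_vcpu_getaffinity() and
XEN_VCPUAFFINITY_HARD are real libxc/Xen interfaces, the rest is
illustrative):

    #include <stdio.h>
    #include <string.h>
    #include <xenctrl.h>

    /* Rough sketch: apply the requested hard affinity, read back what
     * Xen actually installed, and warn if the two differ. */
    static int set_and_check_hard_affinity(xc_interface *xch, uint32_t domid,
                                           int vcpuid, xc_cpumap_t request,
                                           xc_cpumap_t effective,
                                           size_t cpumap_bytes)
    {
        if ( xc_vcpu_setaffinity(xch, domid, vcpuid, request, NULL,
                                 XEN_VCPUAFFINITY_HARD) )
            return -1;
        if ( xc_vcpu_getaffinity(xch, domid, vcpuid, effective, NULL,
                                 XEN_VCPUAFFINITY_HARD) )
            return -1;
        if ( memcmp(request, effective, cpumap_bytes) )
            fprintf(stderr, "vcpu %d: effective hard affinity differs "
                    "from the requested one\n", vcpuid);
        return 0;
    }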

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)





Re: [Xen-devel] [XEN PATCH v3 2/2] xen/arm: Configure early printk via Kconfig

2020-03-24 Thread Anthony PERARD
On Tue, Mar 17, 2020 at 10:51:34AM -0700, Stefano Stabellini wrote:
> On Tue, 17 Mar 2020, Julien Grall wrote:
> > I noticed below you added "depends on ARM_64" on the Xilinx SoC. In 
> > general, platform specific options are tied to either arm32 or arm64, 
> > even if the UART "driver" is arch agnostic.
> > 
> > You could technically boot Xen on Arm 32-bit on Armv8 HW provided they 
> > support 32-bit at the hypervisor level, but we never supported this 
> > case. So I am wondering whether we should add depends on each 
> > earlyprintk. Stefano, any opinions?
> 
> Well spotted.
> 
> Xilinx doesn't support 32-bit Xen on their boards, "support" as in test,
> run or validate. So it would not be a problem from Xilinx point of view
> to add a "depends on ARM_64".
> 
> I take that you are suggesting adding "depends on ARM_64/32" under the
> legacy platform earlyprintk options, from EARLY_PRINTK_BRCM to
> EARLY_PRINTK_ZYNQMP right? If so, I am fine with it, and it seems like a
> good idea.

I don't have useful information on which Xen bitness each platform can
boot or support, so I can't really add those "depends on". But that
could be done in a follow-up.

> The other new generic earlyprintk options, the ones that only depend on
> the uart driver, from EARLY_UART_CHOICE_8250 to EARLY_UART_CHOICE_SCIF,
> it feels more natural to leave them without a specific arch dependency.

That would mean adding drivers for both arm32 and arm64. For example,
debug-cadence.inc is only available in arm64/. So if someone selects
arm32 and the cadence early uart driver, there's going to be a compile
error. That's the only reason on why I've added "depends on" on all
EARLY_UART_CHOICE_*.

Thanks,

-- 
Anthony PERARD



Re: [Xen-devel] [PATCH v6 5/5] domain: use PGC_extra domheap page for shared_info

2020-03-24 Thread Durrant, Paul
> -Original Message-
[snip]
> > Currently shared_info is a shared xenheap page but shared xenheap pages
> > complicate future plans for live-update of Xen so it is desirable to,
> > where possible, not use them [1]. This patch therefore converts shared_info
> > into a PGC_extra domheap page. This does entail freeing shared_info during
> > domain_relinquish_resources() rather than domain_destroy() so care is
> > needed to avoid de-referencing a NULL shared_info pointer hence some
> > extra checks of 'is_dying' are needed.
> >
> > NOTE: For Arm, the call to free_shared_info() in arch_domain_destroy() is
> >left in place since it is idempotent and called in the error path for
> >arch_domain_create().
> 
> The approach looks good to me. I have one comment below.
> 

Thanks.

> [...]
> 
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index 4ef0d3b21e..4f3266454f 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -1651,24 +1651,44 @@ int continue_hypercall_on_cpu(
> >
> >   int alloc_shared_info(struct domain *d, unsigned int memflags)
> >   {
> > -if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) == NULL )
> > +struct page_info *pg;
> > +
> > +pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
> > +if ( !pg )
> >   return -ENOMEM;
> >
> > -d->shared_info.mfn = virt_to_mfn(d->shared_info.virt);
> > +if ( !get_page_and_type(pg, d, PGT_writable_page) )
> > +{
> > +/*
> > + * The domain should not be running at this point so there is
> > + * no way we should reach this error path.
> > + */
> > +ASSERT_UNREACHABLE();
> > +return -ENODATA;
> > +}
> > +
> > +d->shared_info.mfn = page_to_mfn(pg);
> > +d->shared_info.virt = __map_domain_page_global(pg);
> 
> __map_domain_page_global() is not guaranteed to succeed. For instance,
> on Arm32 this will be a call to vmap().
> 
> So you want to check the return.
> 

Ok, I'll add a check.

As Jan discovered, I messed up the version numbering so there is already a v7 
series where I dropped this patch (until I can group it with conversion of 
other shared xenheap pages).

  Paul

> Cheers,
> 
> --
> Julien Grall


Re: [Xen-devel] [PATCH v4 1/3] mm: keep PGC_extra pages on a separate list

2020-03-24 Thread Julien Grall




On 18/03/2020 17:32, Paul Durrant wrote:

This patch adds a new page_list_head into struct domain to hold PGC_extra
pages. This avoids them getting confused with 'normal' domheap pages where
the domain's page_list is walked.

A new dump loop is also added to dump_pageframe_info() to unconditionally
dump the 'extra page list'.

Signed-off-by: Paul Durrant 
Reviewed-by: Jan Beulich 


Acked-by: Julien Grall 


---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Julien Grall 
Cc: Stefano Stabellini 
Cc: Wei Liu 
Cc: "Roger Pau Monné" 

v7:
  - Cosmetic changes

v6:
  - New in v6
---
  xen/arch/x86/domain.c    |  9 +++++++++
  xen/common/domain.c      |  1 +
  xen/common/page_alloc.c  |  2 +-
  xen/include/asm-x86/mm.h |  6 ++----
  xen/include/xen/mm.h     |  5 ++---
  xen/include/xen/sched.h  | 13 +++++++++++++
  6 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index caf2ecad7e..683bc619aa 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -251,12 +251,21 @@ void dump_pageframe_info(struct domain *d)
  p2m_pod_dump_data(d);
  
  spin_lock(&d->page_alloc_lock);

+
  page_list_for_each ( page, &d->xenpage_list )
  {
  printk("XenPage %p: caf=%08lx, taf=%" PRtype_info "\n",
 _p(mfn_x(page_to_mfn(page))),
 page->count_info, page->u.inuse.type_info);
  }
+
+page_list_for_each ( page, &d->extra_page_list )
+{
+printk("ExtraPage %p: caf=%08lx, taf=%" PRtype_info "\n",
+   _p(mfn_x(page_to_mfn(page))),
+   page->count_info, page->u.inuse.type_info);
+}
+
  spin_unlock(&d->page_alloc_lock);
  }
  
diff --git a/xen/common/domain.c b/xen/common/domain.c

index b4eb476a9c..3dcd73f67c 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -403,6 +403,7 @@ struct domain *domain_create(domid_t domid,
  spin_lock_init_prof(d, page_alloc_lock);
  spin_lock_init(&d->hypercall_deadlock_mutex);
  INIT_PAGE_LIST_HEAD(&d->page_list);
+INIT_PAGE_LIST_HEAD(&d->extra_page_list);
  INIT_PAGE_LIST_HEAD(&d->xenpage_list);
  
  spin_lock_init(&d->node_affinity_lock);

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 76d37226df..10b7aeca48 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2314,7 +2314,7 @@ int assign_pages(
  smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
  pg[i].count_info =
  (pg[i].count_info & PGC_extra) | PGC_allocated | 1;
-page_list_add_tail(&pg[i], &d->page_list);
+page_list_add_tail(&pg[i], page_to_list(d, &pg[i]));
  }
  
   out:

diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index a06b2fb81f..1fa334b306 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -629,10 +629,8 @@ typedef struct mm_rwlock {
  const char*locker_function; /* func that took it */
  } mm_rwlock_t;
  
-#define arch_free_heap_page(d, pg)  \

-page_list_del2(pg, is_xen_heap_page(pg) ?   \
-   &(d)->xenpage_list : &(d)->page_list,\
-   &(d)->arch.relmem_list)
+#define arch_free_heap_page(d, pg) \
+page_list_del2(pg, page_to_list(d, pg), &(d)->arch.relmem_list)
  
  extern const char zero_page[];
  
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h

index d0d095d9c7..a163c201e2 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -583,9 +583,8 @@ static inline unsigned int get_order_from_pages(unsigned long nr_pages)
  void scrub_one_page(struct page_info *);
  
  #ifndef arch_free_heap_page

-#define arch_free_heap_page(d, pg)  \
-page_list_del(pg, is_xen_heap_page(pg) ?\
-  &(d)->xenpage_list : &(d)->page_list)
+#define arch_free_heap_page(d, pg) \
+page_list_del(pg, page_to_list(d, pg))
  #endif
  
  int xenmem_add_to_physmap_one(struct domain *d, unsigned int space,

diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e6813288ab..4b78291d51 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -329,6 +329,7 @@ struct domain
  
  spinlock_t   page_alloc_lock; /* protects all the following fields  */

  struct page_list_head page_list;  /* linked list */
+struct page_list_head extra_page_list; /* linked list (size extra_pages) */
  struct page_list_head xenpage_list; /* linked list (size xenheap_pages) */
  
  /*

@@ -512,6 +513,18 @@ struct domain
  #endif
  };
  
+static inline struct page_list_head *page_to_list(

+struct domain *d, const struct page_info *pg)
+{
+if ( is_xen_heap_page(pg) )
+return &d->xenpage_list;
+
+if ( pg->count_info & PGC_extra )
+return &d->extra_page_list;
+
+return &d->page_list;
+}
+
  /* Return number of pages currently possessed by the domain */
  static inline unsigned int 

Re: [Xen-devel] [PATCH v4 3/3] mm: add 'is_special_page' inline function...

2020-03-24 Thread Julien Grall




On 18/03/2020 17:32, Paul Durrant wrote:

From: Paul Durrant 

... to cover xenheap and PGC_extra pages.

PGC_extra pages are intended to hold data structures that are associated
with a domain and may be mapped by that domain. They should not be treated
as 'normal' guest pages (i.e. RAM or page tables). Hence, in many cases
where code currently tests is_xen_heap_page() it should also check for
the PGC_extra bit in 'count_info'.

This patch therefore defines is_special_page() to cover both cases and
converts tests of is_xen_heap_page() (or open coded tests of PGC_xen_heap)
to is_special_page() where the page is assigned to a domain.
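
In effect the new predicate amounts to the following (a sketch of the
definition described above; the defining hunk sits beyond the portion of
the diff quoted below):

    /* Sketch: a page is "special" if it is a xenheap page or carries
     * the PGC_extra flag in its count_info. */
    static inline bool is_special_page(const struct page_info *page)
    {
        return is_xen_heap_page(page) || (page->count_info & PGC_extra);
    }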

Signed-off-by: Paul Durrant 
Acked-by: Tamas K Lengyel 

Acked-by: Julien Grall 


---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: "Roger Pau Monné" 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 

v7:
  - Fixed some uses of is_xen_heap_mfn() that I'd missed
  - Updated commit comment to point out that only tests on assigned xenheap
pages are candidates for conversion

v6:
  - Convert open-coded checks of PGC_xen_heap to use is_special_page()
where appropriate

v4:
  - Use inline function instead of macro
  - Add missing conversions from is_xen_heap_page()

v3:
  - Delete obsolete comment.

v2:
  - New in v2
---
  xen/arch/x86/domctl.c   |  2 +-
  xen/arch/x86/mm.c   | 13 ++---
  xen/arch/x86/mm/altp2m.c|  2 +-
  xen/arch/x86/mm/mem_sharing.c   |  3 +--
  xen/arch/x86/mm/p2m-pod.c   | 12 +++-
  xen/arch/x86/mm/p2m.c   |  4 ++--
  xen/arch/x86/mm/shadow/common.c | 13 -
  xen/arch/x86/mm/shadow/multi.c  |  3 ++-
  xen/arch/x86/tboot.c|  4 ++--
  xen/include/xen/mm.h|  5 +
  10 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index ed86762fa6..add70126b9 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -394,7 +394,7 @@ long arch_do_domctl(
  page = get_page_from_gfn(d, gfn, &t, P2M_ALLOC);
  
  if ( unlikely(!page) ||

- unlikely(is_xen_heap_page(page)) )
+ unlikely(is_special_page(page)) )
  {
  if ( unlikely(p2m_is_broken(t)) )
  type = XEN_DOMCTL_PFINFO_BROKEN;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 62507ca651..2fac67ad57 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1014,7 +1014,7 @@ get_page_from_l1e(
  unsigned long cacheattr = pte_flags_to_cacheattr(l1f);
  int err;
  
-if ( is_xen_heap_page(page) )

+if ( is_special_page(page) )
  {
  if ( write )
  put_page_type(page);
@@ -2447,7 +2447,7 @@ static int cleanup_page_mappings(struct page_info *page)
  {
  page->count_info &= ~PGC_cacheattr_mask;
  
-BUG_ON(is_xen_heap_page(page));

+BUG_ON(is_special_page(page));
  
  rc = update_xen_mappings(mfn, 0);

  }
@@ -2477,7 +2477,7 @@ static int cleanup_page_mappings(struct page_info *page)
  rc = rc2;
  }
  
-if ( likely(!is_xen_heap_page(page)) )

+if ( likely(!is_special_page(page)) )
  {
  ASSERT((page->u.inuse.type_info &
  (PGT_type_mask | PGT_count_mask)) == PGT_writable_page);
@@ -4216,8 +4216,7 @@ int steal_page(
  if ( !(owner = page_get_owner_and_reference(page)) )
  goto fail;
  
-if ( owner != d || is_xen_heap_page(page) ||

- (page->count_info & PGC_extra) )
+if ( owner != d || is_special_page(page) )
  goto fail_put;
  
  /*

@@ -4580,8 +4579,8 @@ int xenmem_add_to_physmap_one(
  prev_mfn = get_gfn(d, gfn_x(gpfn), &p2mt);
  if ( mfn_valid(prev_mfn) )
  {
-if ( is_xen_heap_mfn(prev_mfn) )
-/* Xen heap frames are simply unhooked from this phys slot. */
+if ( is_special_page(mfn_to_page(prev_mfn)) )
+/* Special pages are simply unhooked from this phys slot. */
  rc = guest_physmap_remove_page(d, gpfn, prev_mfn, PAGE_ORDER_4K);
  else
  /* Normal domain memory is freed, to avoid leaking memory. */
diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c
index 50768f2547..c091b03ea3 100644
--- a/xen/arch/x86/mm/altp2m.c
+++ b/xen/arch/x86/mm/altp2m.c
@@ -77,7 +77,7 @@ int altp2m_vcpu_enable_ve(struct vcpu *v, gfn_t gfn)
   * pageable() predicate for this, due to it having the same properties
   * that we want.
   */
-if ( !p2m_is_pageable(p2mt) || is_xen_heap_page(pg) )
+if ( !p2m_is_pageable(p2mt) || is_special_page(pg) )
  {
  rc = -EINVAL;
  goto err;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 3835bc928f..f49f27a3ef 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ 

Re: [Xen-devel] [PATCH v6 5/5] domain: use PGC_extra domheap page for shared_info

2020-03-24 Thread Julien Grall

Hi Paul,

On 10/03/2020 17:49, p...@xen.org wrote:

From: Paul Durrant 

Currently shared_info is a shared xenheap page but shared xenheap pages
complicate future plans for live-update of Xen so it is desirable to,
where possible, not use them [1]. This patch therefore converts shared_info
into a PGC_extra domheap page. This does entail freeing shared_info during
domain_relinquish_resources() rather than domain_destroy() so care is
needed to avoid de-referencing a NULL shared_info pointer hence some
extra checks of 'is_dying' are needed.

NOTE: For Arm, the call to free_shared_info() in arch_domain_destroy() is
   left in place since it is idempotent and called in the error path for
   arch_domain_create().


The approach looks good to me. I have one comment below.

[...]


diff --git a/xen/common/domain.c b/xen/common/domain.c
index 4ef0d3b21e..4f3266454f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1651,24 +1651,44 @@ int continue_hypercall_on_cpu(
  
  int alloc_shared_info(struct domain *d, unsigned int memflags)

  {
-if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) == NULL )
+struct page_info *pg;
+
+pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
+if ( !pg )
  return -ENOMEM;
  
-d->shared_info.mfn = virt_to_mfn(d->shared_info.virt);

+if ( !get_page_and_type(pg, d, PGT_writable_page) )
+{
+/*
+ * The domain should not be running at this point so there is
+ * no way we should reach this error path.
+ */
+ASSERT_UNREACHABLE();
+return -ENODATA;
+}
+
+d->shared_info.mfn = page_to_mfn(pg);
+d->shared_info.virt = __map_domain_page_global(pg);


__map_domain_page_global() is not guaranteed to succeed. For instance, 
on Arm32 this will be a call to vmap().


So you want to check the return.
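
Something along these lines would do (a minimal sketch; that
put_page_and_type() is the right unwind for the reference taken earlier
is an assumption):

    d->shared_info.virt = __map_domain_page_global(pg);
    if ( !d->shared_info.virt )
    {
        /* Assumed unwind for the earlier get_page_and_type(). */
        put_page_and_type(pg);
        return -ENOMEM;
    }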

Cheers,

--
Julien Grall



Re: [Xen-devel] xl vcpu-pin peculiarities in core scheduling mode

2020-03-24 Thread Jürgen Groß

On 24.03.20 14:34, Sergey Dyasli wrote:
> Hi Juergen,
> 
> I've noticed there is no documentation about how vcpu-pin is supposed to work
> with core scheduling enabled. I did some experiments and noticed the following
> inconsistencies:
> 
>   1. xl vcpu-pin 5 0 0
>  Windows 10 (64-bit) (1)  5 00   -b-1644.0  0 / all
>  Windows 10 (64-bit) (1)  5 11   -b-1650.1  0 / all
>  ^  ^
>  CPU 1 doesn't match reported hard-affinity of 0. Should this command set
>  hard-affinity of vCPU 1 to 1? Or should it be 0-1 for both vCPUs instead?
> 
>   2. xl vcpu-pin 5 0 1
>  libxl: error: libxl_sched.c:62:libxl__set_vcpuaffinity: Domain 5:Setting vcpu affinity: Invalid argument
>  This is expected but perhaps needs documenting somewhere?
> 
>   3. xl vcpu-pin 5 0 1-2
>  Windows 10 (64-bit) (1)  5 02   -b-1646.7  1-2 / all
>  Windows 10 (64-bit) (1)  5 13   -b-1651.6  1-2 / all
>  ^  ^^^
>  Here is a CPU / affinity mismatch again, but the more interesting fact
>  is that setting 1-2 is allowed at all, I'd expect CPU would never be set
>  to 1 with such settings.
> 
> Please let me know what you think about the above cases.


I think all of the effects can be explained by the way how pinning with
core scheduling is implemented. This does not mean that the information
presented to the user shouldn't be adapted.

Basically pinning of any vcpu will just affect the "master"-vcpu of a
virtual core (sibling 0). It will happily accept any setting as long as
any "master"-cpu of a core is in the resulting set of cpus.

All vcpus of a virtual core share the same pinnings.

I think this explains all of the above scenarios.
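
To illustrate, the effective placement can be thought of as follows
(hypothetical helper, not actual Xen code; it merely assumes cpu i
belongs to the granularity-sized sibling block containing it):

    /* Hypothetical: vcpu "sibling_idx" of a virtual core runs on the
     * sibling_idx-th cpu of the physical core holding the master cpu. */
    static unsigned int effective_cpu(unsigned int master_cpu,
                                      unsigned int sibling_idx,
                                      unsigned int granularity)
    {
        return master_cpu - (master_cpu % granularity) + sibling_idx;
    }

With a granularity of 2 this reproduces all three observations above:
pinning vcpu 0 to cpu 0 puts vcpu 1 on cpu 1, and a mask of 1-2 resolves
to the core starting at cpu 2, so the two vcpus run on cpus 2 and 3.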

IMO there are the following possibilities for reporting those pinnings
to the user:

1. As today, documenting the output.
   Not very nice IMO, but the least effort.

2. Just print one line for each virtual cpu/core/socket, like:
   Windows 10 (64-bit) (1)5 0-1   0-1   -b-1646.7  0-1 / all
   This has the disadvantage of dropping the per-vcpu time in favor of
   per-vcore time, OTOH this is reflecting reality.

3. Print the effective pinnings:
   Windows 10 (64-bit) (1)5 0 0 -b-1646.7  0   / all
   Windows 10 (64-bit) (1)5 1 1 -b-1646.7  1   / all
   Should be rather easy to do.

Thoughts?


Juergen



[Xen-devel] xl vcpu-pin peculiarities in core scheduling mode

2020-03-24 Thread Sergey Dyasli
Hi Juergen,

I've noticed there is no documentation about how vcpu-pin is supposed to work
with core scheduling enabled. I did some experiments and noticed the following
inconsistencies:

  1. xl vcpu-pin 5 0 0
 Windows 10 (64-bit) (1)  5 00   -b-1644.0  0 / all
 Windows 10 (64-bit) (1)  5 11   -b-1650.1  0 / all
 ^  ^
 CPU 1 doesn't match reported hard-affinity of 0. Should this command set
 hard-affinity of vCPU 1 to 1? Or should it be 0-1 for both vCPUs instead?


  2. xl vcpu-pin 5 0 1
 libxl: error: libxl_sched.c:62:libxl__set_vcpuaffinity: Domain 5:Setting vcpu affinity: Invalid argument
 This is expected but perhaps needs documenting somewhere?


  3. xl vcpu-pin 5 0 1-2
 Windows 10 (64-bit) (1)  5 02   -b-1646.7  1-2 / all
 Windows 10 (64-bit) (1)  5 13   -b-1651.6  1-2 / all
 ^  ^^^
 Here is a CPU / affinity mismatch again, but the more interesting fact
 is that setting 1-2 is allowed at all, I'd expect CPU would never be set
 to 1 with such settings.

Please let me know what you think about the above cases.

--
Thanks,
Sergey



Re: [Xen-devel] [PATCH OSSTEST] kernel-build: enable XEN_BALLOON_MEMORY_HOTPLUG

2020-03-24 Thread Jürgen Groß

On 24.03.20 13:00, Roger Pau Monné wrote:
> Adding Juergen and Boris for feedback.
> 
> On Tue, Mar 24, 2020 at 11:57:48AM +0000, Ian Jackson wrote:
> > Ian Jackson writes ("Re: [PATCH OSSTEST] kernel-build: enable XEN_BALLOON_MEMORY_HOTPLUG"):
> > > Roger Pau Monne writes ("[PATCH OSSTEST] kernel-build: enable XEN_BALLOON_MEMORY_HOTPLUG"):
> > > > This allows a PVH/HVM domain to use unpopulated memory ranges to map
> > > > foreign memory or grants, and is required for a PVH dom0 to function
> > > > properly.
> > > > 
> > > > Signed-off-by: Roger Pau Monné 
> > > 
> > > Acked-by: Ian Jackson 
> > > 
> > > I will push this to pretest immediately.
> > 
> > Now done.  Would you consider whether the default should be changed
> > in Linux and prepare a patch to do so if appropriate ?
> 
> DYK if there's any reason why this is not on by default?

AFAIK the default has been off since introduction in 2011.

I don't know the reason for that.


Juergen



Re: [Xen-devel] [PATCH v5 00/10] x86emul: further work

2020-03-24 Thread Jan Beulich
Paul,

On 24.03.2020 13:26, Jan Beulich wrote:
> Some of the later patches are still at least partly RFC, for
> varying reasons (see there). I'd appreciate though if at least
> some of the earlier ones could go in rather sooner than later.
> 
> Patch 1 functionally (for the test harness) depends on
> "libx86/CPUID: fix (not just) leaf 7 processing", while at
> least patch 2 contextually depends on "x86emul: disable
> FPU/MMX/SIMD insn emulation when !HVM".
> 
>  1: x86emul: support AVX512_BF16 insns

I should note that I also have a VP2INTERSECT patch ready, but the
just released SDE segfaults when trying to test it. I'll be holding
this back for some more time, I guess.

>  2: x86emul: support MOVDIRI insn
>  3: x86: determine HAVE_AS_* just once
>  4: x86: move back clang no integrated assembler tests
>  5: x86emul: support MOVDIR64B insn
>  6: x86emul: support ENQCMD insn
>  7: x86/HVM: scale MPERF values reported to guests (on AMD)
>  8: x86emul: support RDPRU
>  9: x86/HVM: don't needlessly intercept APERF/MPERF/TSC MSR reads
> 10: x86emul: support MCOMMIT

Paul, I should also note that I mistakenly Cc-ed your old Citrix
address. I'd like to avoid re-posting the series - do you perhaps
nevertheless get the xen-devel copies?

Jan



[Xen-devel] [PATCH v5 10/10] x86emul: support MCOMMIT

2020-03-24 Thread Jan Beulich
The dependency on a new EFER bit implies that we need to set that bit
ourselves in order to be able to successfully invoke the insn.

Also once again introduce the SVM related constants on this occasion.

Signed-off-by: Jan Beulich 
---
RFC: The exact meaning of the PM stating "any errors encountered by
 those stores have been signaled to associated error logging
 resources" is unclear. Depending on what this entails, blindly
 enabling EFER.MCOMMIT may not be a good idea. Hence the RFC.
---
v5: Re-base.
v4: New.

--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -261,6 +261,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
 {"clzero",   0x8008, NA, CPUID_REG_EBX,  0,  1},
 {"rstr-fp-err-ptrs", 0x8008, NA, CPUID_REG_EBX, 2, 1},
 {"rdpru",0x8008, NA, CPUID_REG_EBX,  4,  1},
+{"mcommit",  0x8008, NA, CPUID_REG_EBX,  8,  1},
 {"wbnoinvd", 0x8008, NA, CPUID_REG_EBX,  9,  1},
 {"ibpb", 0x8008, NA, CPUID_REG_EBX, 12,  1},
 {"ppin", 0x8008, NA, CPUID_REG_EBX, 23,  1},
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -149,7 +149,7 @@ static const char *const str_e8b[32] =
 
 [ 4] = "rdpru",
 
-/* [ 8] */[ 9] = "wbnoinvd",
+[ 8] = "mcommit",  [ 9] = "wbnoinvd",
 
 [12] = "ibpb",
 
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -798,6 +798,9 @@ static void init_amd(struct cpuinfo_x86
wrmsr(MSR_K7_HWCR, l, h);
}
 
+   if (cpu_has(c, X86_FEATURE_MCOMMIT))
+   write_efer(read_efer() | EFER_MCOMMIT);
+
/* Prevent TSC drift in non single-processor, single-core platforms. */
if ((smp_processor_id() == 1) && !cpu_has(c, X86_FEATURE_ITSC))
disable_c1_ramping();
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1877,6 +1877,7 @@ in_protmode(
 #define vcpu_has_fma4()(ctxt->cpuid->extd.fma4)
 #define vcpu_has_tbm() (ctxt->cpuid->extd.tbm)
 #define vcpu_has_clzero()  (ctxt->cpuid->extd.clzero)
+#define vcpu_has_mcommit() (ctxt->cpuid->extd.mcommit)
 #define vcpu_has_rdpru()   (ctxt->cpuid->extd.rdpru)
 #define vcpu_has_wbnoinvd()(ctxt->cpuid->extd.wbnoinvd)
 
@@ -5676,6 +5677,28 @@ x86_emulate(
 _regs.r(cx) = (uint32_t)msr_val;
 goto rdtsc;
 
+case 0xfa: /* monitorx / mcommit */
+if ( vex.pfx == vex_f3 )
+{
+bool cf;
+
+host_and_vcpu_must_have(mcommit);
+if ( !ops->read_msr ||
+ ops->read_msr(MSR_EFER, &msr_val, ctxt) != X86EMUL_OKAY )
+msr_val = 0;
+generate_exception_if(!(msr_val & EFER_MCOMMIT), EXC_UD);
+memcpy(get_stub(stub),
+   ((uint8_t[]){ 0xf3, 0x0f, 0x01, 0xfa, 0xc3 }), 5);
+_regs.eflags &= ~EFLAGS_MASK;
+invoke_stub("", ASM_FLAG_OUT(, "setc %[cf]"),
+[cf] ASM_FLAG_OUT("=@ccc", "=qm") (cf) : "i" (0));
+if ( cf )
+_regs.eflags |= X86_EFLAGS_CF;
+put_stub(stub);
+goto done;
+}
+goto unrecognized_insn;
+
 case 0xfc: /* clzero */
 {
 unsigned long zero = 0;
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -131,6 +131,9 @@
 #define cpu_has_avx512_4fmaps   boot_cpu_has(X86_FEATURE_AVX512_4FMAPS)
 #define cpu_has_tsx_force_abort boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)
 
+/* CPUID level 0x80000008.ebx */
+#define cpu_has_mcommit boot_cpu_has(X86_FEATURE_MCOMMIT)
+
 /* CPUID level 0x00000007:1.eax */
 #define cpu_has_avx512_bf16 boot_cpu_has(X86_FEATURE_AVX512_BF16)
 
--- a/xen/include/asm-x86/hvm/svm/vmcb.h
+++ b/xen/include/asm-x86/hvm/svm/vmcb.h
@@ -78,6 +78,11 @@ enum GenericIntercept2bits
 GENERAL2_INTERCEPT_RDPRU   = 1 << 14,
 };
 
+/* general 3 intercepts */
+enum GenericIntercept3bits
+{
+GENERAL3_INTERCEPT_MCOMMIT = 1 << 3,
+};
 
 /* control register intercepts */
 enum CRInterceptBits
@@ -300,6 +305,7 @@ enum VMEXIT_EXITCODE
 VMEXIT_MWAIT_CONDITIONAL= 140, /* 0x8c */
 VMEXIT_XSETBV   = 141, /* 0x8d */
 VMEXIT_RDPRU= 142, /* 0x8e */
+VMEXIT_MCOMMIT  = 163, /* 0xa3 */
 VMEXIT_NPF  = 1024, /* 0x400, nested paging fault */
 VMEXIT_INVALID  =  -1
 };
@@ -406,7 +412,8 @@ struct vmcb_struct {
 u32 _exception_intercepts;  /* offset 0x08 - cleanbit 0 */
 u32 _general1_intercepts;   /* offset 0x0C - cleanbit 0 */
 u32 _general2_intercepts;   /* offset 0x10 - cleanbit 0 */
-u32 res01[10];
+u32 _general3_intercepts;   /* offset 0x14 - cleanbit 0 */
+u32 res01[9];
 u16 _pause_filter_thresh;   /* offset 0x3C - cleanbit 0 */
 u16 _pause_filter_count;/* offset 

[Xen-devel] [PATCH v5 09/10] x86/HVM: don't needlessly intercept APERF/MPERF/TSC MSR reads

2020-03-24 Thread Jan Beulich
If the hardware can handle accesses, we should allow it to do so. This
way we can expose EFRO to HVM guests, and "all" that's left for exposing
APERF/MPERF is to figure out how to handle writes to these MSRs. (Note
that the leaf 6 guest CPUID checks will evaluate to false for now, as
recalculate_misc() zaps the entire leaf for now.)

For TSC the intercepts are made to mirror the RDTSC ones.

Signed-off-by: Jan Beulich 
---
v4: Make TSC intercepts mirror RDTSC ones. Re-base.
v3: New.

--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -595,6 +595,7 @@ static void svm_cpuid_policy_changed(str
 struct vmcb_struct *vmcb = svm->vmcb;
 const struct cpuid_policy *cp = v->domain->arch.cpuid;
 u32 bitmap = vmcb_get_exception_intercepts(vmcb);
+unsigned int mode;
 
 if ( opt_hvm_fep ||
  (v->domain->arch.cpuid->x86_vendor != boot_cpu_data.x86_vendor) )
@@ -607,6 +608,17 @@ static void svm_cpuid_policy_changed(str
 /* Give access to MSR_PRED_CMD if the guest has been told about it. */
 svm_intercept_msr(v, MSR_PRED_CMD,
   cp->extd.ibpb ? MSR_INTERCEPT_NONE : MSR_INTERCEPT_RW);
+
+/* Allow direct reads from APERF/MPERF if permitted by the policy. */
+mode = cp->basic.raw[6].c & CPUID6_ECX_APERFMPERF_CAPABILITY
+   ? MSR_INTERCEPT_WRITE : MSR_INTERCEPT_RW;
+svm_intercept_msr(v, MSR_IA32_APERF, mode);
+svm_intercept_msr(v, MSR_IA32_MPERF, mode);
+
+/* Allow direct access to their r/o counterparts if permitted. */
+mode = cp->extd.efro ? MSR_INTERCEPT_NONE : MSR_INTERCEPT_RW;
+svm_intercept_msr(v, MSR_APERF_RD_ONLY, mode);
+svm_intercept_msr(v, MSR_MPERF_RD_ONLY, mode);
 }
 
 void svm_sync_vmcb(struct vcpu *v, enum vmcb_sync_state new_state)
@@ -860,7 +872,10 @@ static void svm_set_rdtsc_exiting(struct
 {
 general1_intercepts |= GENERAL1_INTERCEPT_RDTSC;
 general2_intercepts |= GENERAL2_INTERCEPT_RDTSCP;
+svm_enable_intercept_for_msr(v, MSR_IA32_TSC);
 }
+else
+svm_intercept_msr(v, MSR_IA32_TSC, MSR_INTERCEPT_WRITE);
 
 vmcb_set_general1_intercepts(vmcb, general1_intercepts);
 vmcb_set_general2_intercepts(vmcb, general2_intercepts);
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -108,6 +108,7 @@ static int construct_vmcb(struct vcpu *v
 {
 vmcb->_general1_intercepts |= GENERAL1_INTERCEPT_RDTSC;
 vmcb->_general2_intercepts |= GENERAL2_INTERCEPT_RDTSCP;
+svm_intercept_msr(v, MSR_IA32_TSC, MSR_INTERCEPT_WRITE);
 }
 
 /* Guest segment limits. */
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1141,8 +1141,13 @@ static int construct_vmcs(struct vcpu *v
 vmx_clear_msr_intercept(v, MSR_IA32_SYSENTER_CS, VMX_MSR_RW);
 vmx_clear_msr_intercept(v, MSR_IA32_SYSENTER_ESP, VMX_MSR_RW);
 vmx_clear_msr_intercept(v, MSR_IA32_SYSENTER_EIP, VMX_MSR_RW);
+
+if ( !(v->arch.hvm.vmx.exec_control & CPU_BASED_RDTSC_EXITING) )
+vmx_clear_msr_intercept(v, MSR_IA32_TSC, VMX_MSR_R);
+
 if ( paging_mode_hap(d) && (!is_iommu_enabled(d) || iommu_snoop) )
 vmx_clear_msr_intercept(v, MSR_IA32_CR_PAT, VMX_MSR_RW);
+
 if ( (vmexit_ctl & VM_EXIT_CLEAR_BNDCFGS) &&
  (vmentry_ctl & VM_ENTRY_LOAD_BNDCFGS) )
 vmx_clear_msr_intercept(v, MSR_IA32_BNDCFGS, VMX_MSR_RW);
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -585,6 +585,18 @@ static void vmx_cpuid_policy_changed(str
 vmx_clear_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
 else
 vmx_set_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
+
+/* Allow direct reads from APERF/MPERF if permitted by the policy. */
+if ( cp->basic.raw[6].c & CPUID6_ECX_APERFMPERF_CAPABILITY )
+{
+vmx_clear_msr_intercept(v, MSR_IA32_APERF, VMX_MSR_R);
+vmx_clear_msr_intercept(v, MSR_IA32_MPERF, VMX_MSR_R);
+}
+else
+{
+vmx_set_msr_intercept(v, MSR_IA32_APERF, VMX_MSR_R);
+vmx_set_msr_intercept(v, MSR_IA32_MPERF, VMX_MSR_R);
+}
 }
 
 int vmx_guest_x86_mode(struct vcpu *v)
@@ -1250,7 +1262,12 @@ static void vmx_set_rdtsc_exiting(struct
 vmx_vmcs_enter(v);
 v->arch.hvm.vmx.exec_control &= ~CPU_BASED_RDTSC_EXITING;
 if ( enable )
+{
 v->arch.hvm.vmx.exec_control |= CPU_BASED_RDTSC_EXITING;
+vmx_set_msr_intercept(v, MSR_IA32_TSC, VMX_MSR_R);
+}
+else
+vmx_clear_msr_intercept(v, MSR_IA32_TSC, VMX_MSR_R);
 vmx_update_cpu_exec_control(v);
 vmx_vmcs_exit(v);
 }
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -243,7 +243,7 @@ XEN_CPUFEATURE(ENQCMD,6*32+29) /
 
 /* AMD-defined CPU features, CPUID level 0x8007.edx, word 7 */
 XEN_CPUFEATURE(ITSC,  7*32+ 8) /*   Invariant TSC */
-XEN_CPUFEATURE(EFRO,  7*32+10) /*   APERF/MPERF Read Only interface */
+XEN_CPUFEATURE(EFRO,   

[Xen-devel] [xen-unstable-smoke test] 148966: tolerable all pass - PUSHED

2020-03-24 Thread osstest service owner
flight 148966 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148966/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl  13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  3ec1296ad3a823609eec479cb6c7ee493f6a888b
baseline version:
 xen  60d6ba1916dce0622a53b00dbae3c01d0761057e

Last test of basis   148813  2020-03-21 17:00:59 Z    2 days
Testing same since   148966  2020-03-24 10:00:45 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  David Woodhouse 
  Hongyan Xia 
  Ian Jackson 
  Pu Wen 
  Yan Yankovskyi 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   60d6ba1916..3ec1296ad3  3ec1296ad3a823609eec479cb6c7ee493f6a888b -> smoke



[Xen-devel] [PATCH v5 08/10] x86emul: support RDPRU

2020-03-24 Thread Jan Beulich
While the PM doesn't say so, this assumes that the MPERF value read this
way gets scaled similarly to its reading through RDMSR.

Also introduce the SVM related constants on this occasion.

Signed-off-by: Jan Beulich 
---
v5: The CPUID field used is just 8 bits wide.
v4: Add GENERAL2_INTERCEPT_RDPRU and VMEXIT_RDPRU enumerators. Fold
handling of out of bounds indexes into switch(). Avoid
recalculate_misc() clobbering what recalculate_cpu_policy() has
done. Re-base.
v3: New.
---
RFC: Andrew promised to take care of the CPUID side of this; re-base
 over his work once available.

--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -260,6 +260,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
 
 {"clzero",   0x8008, NA, CPUID_REG_EBX,  0,  1},
 {"rstr-fp-err-ptrs", 0x8008, NA, CPUID_REG_EBX, 2, 1},
+{"rdpru",0x8008, NA, CPUID_REG_EBX,  4,  1},
 {"wbnoinvd", 0x8008, NA, CPUID_REG_EBX,  9,  1},
 {"ibpb", 0x8008, NA, CPUID_REG_EBX, 12,  1},
 {"ppin", 0x8008, NA, CPUID_REG_EBX, 23,  1},
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -147,6 +147,8 @@ static const char *const str_e8b[32] =
 [ 0] = "clzero",
 [ 2] = "rstr-fp-err-ptrs",
 
+[ 4] = "rdpru",
+
 /* [ 8] */[ 9] = "wbnoinvd",
 
 [12] = "ibpb",
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -683,6 +683,13 @@ static int read_msr(
 {
 switch ( reg )
 {
+case 0x000000e8: /* APERF */
+case 0xc00000e8: /* APERF_RD_ONLY */
+#define APERF_LO_VALUE 0xAEAEAEAE
+#define APERF_HI_VALUE 0xEAEAEAEA
+*val = ((uint64_t)APERF_HI_VALUE << 32) | APERF_LO_VALUE;
+return X86EMUL_OKAY;
+
 case 0xc0000080: /* EFER */
 *val = ctxt->addr_size > 32 ? 0x500 /* LME|LMA */ : 0;
 return X86EMUL_OKAY;
@@ -2249,6 +2256,30 @@ int main(int argc, char **argv)
 else
 printf("skipped\n");
 
+printf("%-40s", "Testing rdpru...");
+instr[0] = 0x0f; instr[1] = 0x01; instr[2] = 0xfd;
+regs.eip = (unsigned long)&instr[0];
+regs.ecx = 1;
+regs.eflags = EFLAGS_ALWAYS_SET;
+rc = x86_emulate(&ctxt, &emulops);
+if ( (rc != X86EMUL_OKAY) ||
+ (regs.eax != APERF_LO_VALUE) || (regs.edx != APERF_HI_VALUE) ||
+ !(regs.eflags & X86_EFLAGS_CF) ||
+ (regs.eip != (unsigned long)&instr[3]) )
+goto fail;
+if ( ctxt.cpuid->extd.rdpru_max < 0xffff )
+{
+regs.eip = (unsigned long)&instr[0];
+regs.ecx = ctxt.cpuid->extd.rdpru_max + 1;
+regs.eflags = EFLAGS_ALWAYS_SET | X86_EFLAGS_CF;
+rc = x86_emulate(&ctxt, &emulops);
+if ( (rc != X86EMUL_OKAY) || regs.eax || regs.edx ||
+ (regs.eflags & X86_EFLAGS_CF) ||
+ (regs.eip != (unsigned long)&instr[3]) )
+goto fail;
+}
+printf("okay\n");
+
 printf("%-40s", "Testing movq %mm3,(%ecx)...");
 if ( stack_exec && cpu_has_mmx )
 {
--- a/tools/tests/x86_emulator/x86-emulate.c
+++ b/tools/tests/x86_emulator/x86-emulate.c
@@ -78,6 +78,8 @@ bool emul_test_init(void)
 cp.feat.rdpid = true;
 cp.feat.movdiri = true;
 cp.extd.clzero = true;
+cp.extd.rdpru = true;
+cp.extd.rdpru_max = 1;
 
 if ( cpu_has_xsave )
 {
@@ -150,11 +152,11 @@ int emul_test_cpuid(
 }
 
 /*
- * The emulator doesn't itself use CLZERO, so we can always run the
+ * The emulator doesn't itself use CLZERO/RDPRU, so we can always run the
  * respective test(s).
  */
 if ( leaf == 0x80000008 )
-res->b |= 1U << 0;
+res->b |= (1U << 0) | (1U << 4);
 
 return X86EMUL_OKAY;
 }
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -243,8 +243,6 @@ static void recalculate_misc(struct cpui
 /* Most of Power/RAS hidden from guests. */
 p->extd.raw[0x7].a = p->extd.raw[0x7].b = p->extd.raw[0x7].c = 0;
 
-p->extd.raw[0x8].d = 0;
-
 switch ( p->x86_vendor )
 {
 case X86_VENDOR_INTEL:
@@ -263,6 +261,7 @@ static void recalculate_misc(struct cpui
 
 p->extd.raw[0x8].a &= 0x0000ffff;
 p->extd.raw[0x8].c = 0;
+p->extd.raw[0x8].d = 0;
 break;
 
 case X86_VENDOR_AMD:
@@ -281,6 +280,7 @@ static void recalculate_misc(struct cpui
 
 p->extd.raw[0x8].a &= 0x0000ffff; /* GuestMaxPhysAddr hidden. */
 p->extd.raw[0x8].c &= 0x0003f0ff;
+p->extd.raw[0x8].d &= 0xffff0000;
 
 p->extd.raw[0x9] = EMPTY_LEAF;
 
@@ -643,6 +643,11 @@ void recalculate_cpuid_policy(struct dom
 
 p->extd.maxlinaddr = p->extd.lm ? 48 : 32;
 
+if ( p->extd.rdpru )
+p->extd.rdpru_max = min(p->extd.rdpru_max, max->extd.rdpru_max);
+else
+p->extd.rdpru_max = 0;
+
 recalculate_xstate(p);
 recalculate_misc(p);
 
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1877,6 +1877,7 @@ in_protmode(
 #define vcpu_has_fma4()

[Xen-devel] [PATCH v5 07/10] x86/HVM: scale MPERF values reported to guests (on AMD)

2020-03-24 Thread Jan Beulich
AMD's PM specifies that MPERF (and its r/o counterpart) reads are
affected by the TSC ratio. Hence when processing such reads in software
we too should scale the values. While we don't currently (yet) expose
the underlying feature flags (beyond allowing the MSRs to be read
anyway), RDPRU is going to expose the values even to user space.

Furthermore, since the feature flags are not exposed, this change has
the effect of making the two MSRs properly inaccessible for reads.

Note that writes to MPERF (and APERF) continue to be unsupported.

Signed-off-by: Jan Beulich 
---
v3: New.
---
I did consider whether to put the code in guest_rdmsr() instead, but
decided that it's better to have it next to TSC handling.

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3456,6 +3456,22 @@ int hvm_msr_read_intercept(unsigned int
 *msr_content = v->arch.hvm.msr_tsc_adjust;
 break;
 
+case MSR_MPERF_RD_ONLY:
+if ( !d->arch.cpuid->extd.efro )
+{
+goto gp_fault;
+
+case MSR_IA32_MPERF:
+if ( !(d->arch.cpuid->basic.raw[6].c &
+   CPUID6_ECX_APERFMPERF_CAPABILITY) )
+goto gp_fault;
+}
+if ( rdmsr_safe(msr, *msr_content) )
+goto gp_fault;
+if ( d->arch.cpuid->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
+*msr_content = hvm_get_guest_tsc_fixed(v, *msr_content);
+break;
+
 case MSR_APIC_BASE:
 *msr_content = vcpu_vlapic(v)->hw.apic_base_msr;
 break;
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -397,6 +397,9 @@
 #define MSR_IA32_MPERF 0x000000e7
 #define MSR_IA32_APERF 0x000000e8
 
+#define MSR_MPERF_RD_ONLY  0xc00000e7
+#define MSR_APERF_RD_ONLY  0xc00000e8
+
 #define MSR_IA32_THERM_CONTROL 0x0000019a
 #define MSR_IA32_THERM_INTERRUPT   0x0000019b
 #define MSR_IA32_THERM_STATUS  0x0000019c




[Xen-devel] [PATCH v5 06/10] x86emul: support ENQCMD insn

2020-03-24 Thread Jan Beulich
Introduce a new blk() hook, paralleling the rmw() one in a certain way,
but intended for larger data sizes; hence its HVM intermediate handling
function doesn't fall back to splitting the operation if the requested
virtual address can't be mapped.

Note that the ISA extensions document revision 037 doesn't specify
exception behavior for ModRM.mod == 0b11; assuming #UD here.

No tests are being added to the harness - this would be quite hard, as
we can't just issue the insns against RAM. Their similarity with
MOVDIR64B means the test case there should be good enough to cover any
fundamental flaws.

Signed-off-by: Jan Beulich 
---
TBD: This doesn't (can't) consult PASID translation tables yet, as we
 have no VMX code for this so far. I guess for this we will want to
 replace the direct ->read_msr(MSR_IA32_PASID, ...) with a new
 ->read_pasid() hook.
---
v5: New.

--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -250,13 +250,14 @@ $(BASEDIR)/include/asm-x86/asm-macros.h:
 # sure we pick up changes when the compiler used has changed.)
 ifeq ($(MAKECMDGOALS),asm-offsets.s)
 
-as-ISA-list := CLWB EPT FSGSBASE INVPCID MOVDIR64B RDRAND RDSEED SSE4_2 VMX XSAVEOPT
+as-ISA-list := CLWB ENQCMD EPT FSGSBASE INVPCID MOVDIR64B RDRAND RDSEED SSE4_2 VMX XSAVEOPT
 
 CLWB-insn  := clwb (%rax)
 EPT-insn   := invept (%rax),%rax
 FSGSBASE-insn  := rdfsbase %rax
 INVPCID-insn   := invpcid (%rax),%rax
 MOVDIR64B-insn := movdir64b (%rax),%rax
+ENQCMD-insn:= enqcmd (%rax),%rax
 RDRAND-insn:= rdrand %eax
 RDSEED-insn:= rdseed %eax
 SSE4_2-insn:= crc32 %eax,%eax
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -854,6 +854,7 @@ struct x86_emulate_state {
 rmw_xor,
 } rmw;
 enum {
+blk_enqcmd,
 blk_movdir,
 } blk;
 uint8_t modrm, modrm_mod, modrm_reg, modrm_rm;
@@ -900,6 +901,7 @@ typedef union {
 uint64_t __attribute__ ((aligned(16))) xmm[2];
 uint64_t __attribute__ ((aligned(32))) ymm[4];
 uint64_t __attribute__ ((aligned(64))) zmm[8];
+uint32_t data32[16];
 } mmval_t;
 
 /*
@@ -1909,6 +1911,7 @@ in_protmode(
 #define vcpu_has_rdpid()   (ctxt->cpuid->feat.rdpid)
 #define vcpu_has_movdiri() (ctxt->cpuid->feat.movdiri)
 #define vcpu_has_movdir64b()   (ctxt->cpuid->feat.movdir64b)
+#define vcpu_has_enqcmd()  (ctxt->cpuid->feat.enqcmd)
 #define vcpu_has_avx512_4vnniw() (ctxt->cpuid->feat.avx512_4vnniw)
 #define vcpu_has_avx512_4fmaps() (ctxt->cpuid->feat.avx512_4fmaps)
 #define vcpu_has_avx512_bf16() (ctxt->cpuid->feat.avx512_bf16)
@@ -10101,6 +10104,36 @@ x86_emulate(
 state->simd_size = simd_none;
 break;
 
+case X86EMUL_OPC_F2(0x0f38, 0xf8): /* enqcmd r,m512 */
+case X86EMUL_OPC_F3(0x0f38, 0xf8): /* enqcmds r,m512 */
+host_and_vcpu_must_have(enqcmd);
+generate_exception_if(ea.type != OP_MEM, EXC_UD);
+generate_exception_if(vex.pfx != vex_f2 && !mode_ring0(), EXC_GP, 0);
+src.val = truncate_ea(*dst.reg);
+generate_exception_if(!is_aligned(x86_seg_es, src.val, 64, ctxt, ops),
+  EXC_GP, 0);
+fail_if(!ops->blk);
+BUILD_BUG_ON(sizeof(*mmvalp) < 64);
+if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp, 64,
+ ctxt)) != X86EMUL_OKAY )
+goto done;
+if ( vex.pfx == vex_f2 ) /* enqcmd */
+{
+fail_if(!ops->read_msr);
+if ( (rc = ops->read_msr(MSR_IA32_PASID,
+ &msr_val, ctxt)) != X86EMUL_OKAY )
+goto done;
+generate_exception_if(!(msr_val & PASID_VALID), EXC_GP, 0);
+mmvalp->data32[0] = MASK_EXTR(msr_val, PASID_PASID_MASK);
+}
+mmvalp->data32[0] &= ~0x7ff00000;
+state->blk = blk_enqcmd;
+if ( (rc = ops->blk(x86_seg_es, src.val, mmvalp, 64, &_regs.eflags,
+state, ctxt)) != X86EMUL_OKAY )
+goto done;
+state->simd_size = simd_none;
+break;
+
 case X86EMUL_OPC(0x0f38, 0xf9): /* movdiri mem,r */
 vcpu_must_have(movdiri);
 generate_exception_if(dst.type != OP_MEM, EXC_UD);
@@ -11378,11 +11411,36 @@ int x86_emul_blk(
 {
 switch ( state->blk )
 {
+bool zf;
+
 /*
  * Throughout this switch(), memory clobbers are used to compensate
  * that other operands may not properly express the (full) memory
  * ranges covered.
  */
+case blk_enqcmd:
+ASSERT(bytes == 64);
+if ( ((unsigned long)ptr & 0x3f) )
+{
+ASSERT_UNREACHABLE();
+return X86EMUL_UNHANDLEABLE;
+}
+*eflags &= ~EFLAGS_MASK;
+#ifdef HAVE_AS_ENQCMD
+asm ( "enqcmds (%[src]), %[dst]" ASM_FLAG_OUT(, "; setz %0")
+  : [zf] ASM_FLAG_OUT("=@ccz", "=qm") (zf)
+  : [src] "r" (data), [dst] "r" (ptr) : "memory" );
+#else
+   

[Xen-devel] [PATCH v5 05/10] x86emul: support MOVDIR64B insn

2020-03-24 Thread Jan Beulich
Introduce a new blk() hook, paralleling the rmw() one in a certain way,
but intended for larger data sizes; hence its HVM intermediate handling
function doesn't fall back to splitting the operation if the requested
virtual address can't be mapped.

Note that SDM revision 071 doesn't specify exception behavior for
ModRM.mod == 0b11; assuming #UD here.

Signed-off-by: Jan Beulich 
---
TBD: If we want to avoid depending on correct MTRR settings,
 hvmemul_map_linear_addr() may need to gain a parameter to allow
 controlling cachability of the produced mapping(s). Of course the
 function will also need to be made capable of mapping at least
 p2m_mmio_direct pages for this and the two ENQCMD insns to be
 actually useful.
---
v5: Introduce/use ->blk() hook. Correct asm() operands.
v4: Split MOVDIRI and MOVDIR64B. Switch to using ->rmw(). Re-base.
v3: Update description.
---
(SDE: -tnt)

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -652,6 +652,18 @@ static int cmpxchg(
 return X86EMUL_OKAY;
 }
 
+static int blk(
+enum x86_segment seg,
+unsigned long offset,
+void *p_data,
+unsigned int bytes,
+uint32_t *eflags,
+struct x86_emulate_state *state,
+struct x86_emulate_ctxt *ctxt)
+{
+return x86_emul_blk((void *)offset, p_data, bytes, eflags, state, ctxt);
+}
+
 static int read_segment(
 enum x86_segment seg,
 struct segment_register *reg,
@@ -2208,6 +2220,35 @@ int main(int argc, char **argv)
 goto fail;
 printf("okay\n");
 
+printf("%-40s", "Testing movdir64b 144(%edx),%ecx...");
+if ( stack_exec && cpu_has_movdir64b )
+{
+emulops.blk = blk;
+
+instr[0] = 0x66; instr[1] = 0x0f; instr[2] = 0x38; instr[3] = 0xf8;
+instr[4] = 0x8a; instr[5] = 0x90; instr[8] = instr[7] = instr[6] = 0;
+
+regs.eip = (unsigned long)&instr[0];
+for ( i = 0; i < 64; ++i )
+res[i] = i - 20;
+regs.edx = (unsigned long)res;
+regs.ecx = (unsigned long)(res + 16);
+
+rc = x86_emulate(&regs, &emulops);
+if ( (rc != X86EMUL_OKAY) ||
+ (regs.eip != (unsigned long)&instr[9]) ||
+ res[15] != -5 || res[32] != 12 )
+goto fail;
+for ( i = 16; i < 32; ++i )
+if ( res[i] != i )
+goto fail;
+printf("okay\n");
+
+emulops.blk = NULL;
+}
+else
+printf("skipped\n");
+
 printf("%-40s", "Testing movq %mm3,(%ecx)...");
 if ( stack_exec && cpu_has_mmx )
 {
--- a/tools/tests/x86_emulator/x86-emulate.h
+++ b/tools/tests/x86_emulator/x86-emulate.h
@@ -154,6 +154,7 @@ static inline bool xcr0_mask(uint64_t ma
 #define cpu_has_avx512_vnni (cp.feat.avx512_vnni && xcr0_mask(0xe6))
 #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6))
 #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6))
+#define cpu_has_movdir64b  cp.feat.movdir64b
 #define cpu_has_avx512_4vnniw (cp.feat.avx512_4vnniw && xcr0_mask(0xe6))
 #define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6))
 #define cpu_has_avx512_bf16 (cp.feat.avx512_bf16 && xcr0_mask(0xe6))
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -250,12 +250,13 @@ $(BASEDIR)/include/asm-x86/asm-macros.h:
 # sure we pick up changes when the compiler used has changed.)
 ifeq ($(MAKECMDGOALS),asm-offsets.s)
 
-as-ISA-list := CLWB EPT FSGSBASE INVPCID RDRAND RDSEED SSE4_2 VMX XSAVEOPT
+as-ISA-list := CLWB EPT FSGSBASE INVPCID MOVDIR64B RDRAND RDSEED SSE4_2 VMX XSAVEOPT
 
 CLWB-insn  := clwb (%rax)
 EPT-insn   := invept (%rax),%rax
 FSGSBASE-insn  := rdfsbase %rax
 INVPCID-insn   := invpcid (%rax),%rax
+MOVDIR64B-insn := movdir64b (%rax),%rax
 RDRAND-insn:= rdrand %eax
 RDSEED-insn:= rdseed %eax
 SSE4_2-insn:= crc32 %eax,%eax
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1409,6 +1409,44 @@ static int hvmemul_rmw(
 return rc;
 }
 
+static int hvmemul_blk(
+enum x86_segment seg,
+unsigned long offset,
+void *p_data,
+unsigned int bytes,
+uint32_t *eflags,
+struct x86_emulate_state *state,
+struct x86_emulate_ctxt *ctxt)
+{
+struct hvm_emulate_ctxt *hvmemul_ctxt =
+container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+unsigned long addr;
+uint32_t pfec = PFEC_page_present | PFEC_write_access;
+int rc;
+void *mapping = NULL;
+
+rc = hvmemul_virtual_to_linear(
+seg, offset, bytes, NULL, hvm_access_write, hvmemul_ctxt, &addr);
+if ( rc != X86EMUL_OKAY || !bytes )
+return rc;
+
+if ( is_x86_system_segment(seg) )
+pfec |= PFEC_implicit;
+else if ( hvmemul_ctxt->seg_reg[x86_seg_ss].dpl == 3 )
+pfec |= PFEC_user_mode;
+
+mapping = hvmemul_map_linear_addr(addr, bytes, pfec, hvmemul_ctxt);
+if ( IS_ERR(mapping) )
+return ~PTR_ERR(mapping);
+if ( !mapping )
+return X86EMUL_UNHANDLEABLE;
+
+

[Xen-devel] [PATCH v5 04/10] x86: move back clang no integrated assembler tests

2020-03-24 Thread Jan Beulich
This largely reverts f19af2f1138e ("x86: re-order clang no integrated
assembler tests"): Other CFLAGS setup would better happen first, in case
any of it affects the behavior of the integrated assembler. The comment
addition of course doesn't get undone. The only remaining as-option-add
invocation gets moved down in addition.

Signed-off-by: Jan Beulich 
Reviewed-by: Roger Pau Monné 
---
v5: Re-base.
v4: New.

--- a/xen/arch/x86/Rules.mk
+++ b/xen/arch/x86/Rules.mk
@@ -12,35 +12,8 @@ CFLAGS += '-D__OBJECT_LABEL__=$(subst /,
 # Prevent floating-point variables from creeping into Xen.
 CFLAGS += -msoft-float
 
-ifeq ($(CONFIG_CC_IS_CLANG),y)
-# Note: Any test which adds -no-integrated-as will cause subsequent tests to
-# succeed, and not trigger further additions.
-#
-# The tests to select whether the integrated assembler is usable need to happen
-# before testing any assembler features, or else the result of the tests would
-# be stale if the integrated assembler is not used.
-
-# Older clang's built-in assembler doesn't understand .skip with labels:
-# https://bugs.llvm.org/show_bug.cgi?id=27369
-$(call as-option-add,CFLAGS,CC,".L0: .L1: .skip (.L1 - .L0)",,\
- -no-integrated-as)
-
-# Check whether clang asm()-s support .include.
-$(call as-option-add,CFLAGS,CC,".include \"asm-x86/indirect_thunk_asm.h\"",,\
- -no-integrated-as)
-
-# Check whether clang keeps .macro-s between asm()-s:
-# https://bugs.llvm.org/show_bug.cgi?id=36110
-$(call as-option-add,CFLAGS,CC,\
- ".macro FOO;.endm"$$(close); asm volatile $$(open)".macro 
FOO;.endm",\
- -no-integrated-as)
-endif
-
 $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
 $(call cc-option-add,CFLAGS,CC,-Wnested-externs)
-$(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
- -U__OBJECT_LABEL__ -DHAVE_AS_QUOTED_SYM \
- '-D__OBJECT_LABEL__=$(subst $(BASEDIR)/,,$(CURDIR))/$$@')
 
 CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables
 
@@ -70,3 +43,30 @@ endif
 # Set up the assembler include path properly for older toolchains.
 CFLAGS += -Wa,-I$(BASEDIR)/include
 
+ifeq ($(CONFIG_CC_IS_CLANG),y)
+# Note: Any test which adds -no-integrated-as will cause subsequent tests to
+# succeed, and not trigger further additions.
+#
+# The tests to select whether the integrated assembler is usable need to happen
+# before testing any assembler features, or else the result of the tests would
+# be stale if the integrated assembler is not used.
+
+# Older clang's built-in assembler doesn't understand .skip with labels:
+# https://bugs.llvm.org/show_bug.cgi?id=27369
+$(call as-option-add,CFLAGS,CC,".L0: .L1: .skip (.L1 - .L0)",,\
+ -no-integrated-as)
+
+# Check whether clang asm()-s support .include.
+$(call as-option-add,CFLAGS,CC,".include \"asm-x86/indirect_thunk_asm.h\"",,\
+ -no-integrated-as)
+
+# Check whether clang keeps .macro-s between asm()-s:
+# https://bugs.llvm.org/show_bug.cgi?id=36110
+$(call as-option-add,CFLAGS,CC,\
+ ".macro FOO;.endm"$$(close); asm volatile $$(open)".macro 
FOO;.endm",\
+ -no-integrated-as)
+endif
+
+$(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
+ -U__OBJECT_LABEL__ -DHAVE_AS_QUOTED_SYM \
+ '-D__OBJECT_LABEL__=$(subst $(BASEDIR)/,,$(CURDIR))/$$@')




[Xen-devel] [PATCH v5 03/10] x86: determine HAVE_AS_* just once

2020-03-24 Thread Jan Beulich
With the exception of HAVE_AS_QUOTED_SYM, populate the results into a
generated header instead of (at least once per [sub]directory) into
CFLAGS. This results in proper rebuilds (via make dependencies) in case
the compiler used changes between builds. It additionally eases
inspection of which assembler features were actually found usable.

Some trickery is needed to avoid the header generation itself trying to
include the not-yet-generated header.

Since the definitions in generated/config.h, previously having been
command line options, might even affect xen/config.h or its descendants,
move adding of the -include option for the latter after inclusion of the
per-arch Rules.mk. Use the occasion to also move the most general -I
option to the common Rules.mk.

Signed-off-by: Jan Beulich 
---
v5: Re-base.
v4: New.
---
An alternative to the $(MAKECMDGOALS) trickery would be to make
generation of generated/config.h part of the asm-offsets.s rule, instead
of adding it as a dependency there. Not sure whether either is truly
better than the other.
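
For illustration only (not actual output - which entries end up defined
depends on the toolchain used), the generated header then looks like:

    /* Generated header, do not edit. */
    #define HAVE_AS_CLWB /* clwb (%rax) */
    #define HAVE_AS_EPT /* invept (%rax),%rax */
    #undef HAVE_AS_NOPS_DIRECTIVE /* .L1: .L2: .nops (.L2 - .L1),9 */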

--- a/xen/Rules.mk
+++ b/xen/Rules.mk
@@ -55,7 +55,7 @@ endif
 CFLAGS += -nostdinc -fno-builtin -fno-common
 CFLAGS += -Werror -Wredundant-decls -Wno-pointer-arith
 $(call cc-option-add,CFLAGS,CC,-Wvla)
-CFLAGS += -pipe -D__XEN__ -include $(BASEDIR)/include/xen/config.h
+CFLAGS += -pipe -D__XEN__ -I$(BASEDIR)/include
 CFLAGS-$(CONFIG_DEBUG_INFO) += -g
 CFLAGS += '-D__OBJECT_FILE__="$@"'
 
@@ -95,6 +95,9 @@ SPECIAL_DATA_SECTIONS := rodata $(foreac
 
 include $(BASEDIR)/arch/$(TARGET_ARCH)/Rules.mk
 
+# Allow the arch to use -include ahead of this one.
+CFLAGS += -include xen/config.h
+
 include Makefile
 
 define gendep
--- a/xen/arch/arm/Rules.mk
+++ b/xen/arch/arm/Rules.mk
@@ -6,8 +6,6 @@
 # 'make clean' before rebuilding.
 #
 
-CFLAGS += -I$(BASEDIR)/include
-
 $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
 $(call cc-option-add,CFLAGS,CC,-Wnested-externs)
 
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -225,7 +225,8 @@ endif
efi/boot.init.o efi/runtime.o efi/compat.o efi/buildid.o efi/relocs-dummy.o: $(BASEDIR)/arch/x86/efi/built_in.o
efi/boot.init.o efi/runtime.o efi/compat.o efi/buildid.o efi/relocs-dummy.o: ;

-asm-offsets.s: $(TARGET_SUBARCH)/asm-offsets.c $(BASEDIR)/include/asm-x86/asm-macros.h
+asm-offsets.s: $(TARGET_SUBARCH)/asm-offsets.c $(BASEDIR)/include/asm-x86/asm-macros.h \
+	$(BASEDIR)/include/generated/config.h
	$(CC) $(filter-out -Wa$(comma)% -flto,$(CFLAGS)) -S -o $@ $<
 
 asm-macros.i: CFLAGS += -D__ASSEMBLY__ -P
@@ -241,6 +242,45 @@ $(BASEDIR)/include/asm-x86/asm-macros.h:
echo '#endif' >>$@.new
$(call move-if-changed,$@.new,$@)
 
+# There are multiple invocations of this Makefile, one each for asm-offset.s,
+# $(TARGET), built_in.o, and several more from the rules building $(TARGET)
+# and $(TARGET).efi. The 2nd and 3rd may race with one another, and we don't
+# want to re-generate config.h in that case anyway, so guard the logic
+# accordingly. (We do want to have the FORCE dependency on the rule, to be
+# sure we pick up changes when the compiler used has changed.)
+ifeq ($(MAKECMDGOALS),asm-offsets.s)
+
+as-ISA-list := CLWB EPT FSGSBASE INVPCID RDRAND RDSEED SSE4_2 VMX XSAVEOPT
+
+CLWB-insn  := clwb (%rax)
+EPT-insn   := invept (%rax),%rax
+FSGSBASE-insn  := rdfsbase %rax
+INVPCID-insn   := invpcid (%rax),%rax
+RDRAND-insn:= rdrand %eax
+RDSEED-insn:= rdseed %eax
+SSE4_2-insn:= crc32 %eax,%eax
+VMX-insn   := vmcall
+XSAVEOPT-insn  := xsaveopt (%rax)
+
+# GAS's idea of true is -1.  Clang's idea is 1.
+NEGATIVE_TRUE-insn := .if ((1 > 0) > 0); .error \"\"; .endif
+
+# Check to see whether the assembler supports the .nops directive.
+NOPS_DIRECTIVE-insn := .L1: .L2: .nops (.L2 - .L1),9
+
+as-features-list := $(as-ISA-list) NEGATIVE_TRUE NOPS_DIRECTIVE
+
+$(BASEDIR)/include/generated/config.h: FORCE
+   echo '/* Generated header, do not edit. */' >$@.new
+   $(foreach f,$(as-features-list), \
+ $(if $($(f)-insn),,$(error $(f)-insn is empty)) \
+ echo '#$(call as-insn,$(CC) $(CFLAGS),"$($(f)-insn)", \
+  define, \
+  undef) HAVE_AS_$(f) /* $($(f)-insn) */' >>$@.new;)
+   $(call move-if-changed,$@.new,$@)
+
+endif
+
 efi.lds: AFLAGS += -DEFI
 xen.lds efi.lds: xen.lds.S
$(CC) -P -E -Ui386 $(filter-out -Wa$(comma)%,$(AFLAGS)) -o $@ $<
--- a/xen/arch/x86/Rules.mk
+++ b/xen/arch/x86/Rules.mk
@@ -3,7 +3,7 @@
 
 XEN_IMG_OFFSET := 0x20
 
-CFLAGS += -I$(BASEDIR)/include
+CFLAGS += $(if $(filter asm-macros.% %/generated/config.h,$@),,-include generated/config.h)
 CFLAGS += -I$(BASEDIR)/include/asm-x86/mach-generic
 CFLAGS += -I$(BASEDIR)/include/asm-x86/mach-default
 CFLAGS += -DXEN_IMG_OFFSET=$(XEN_IMG_OFFSET)
@@ -38,26 +38,9 @@ endif
 
 $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
 $(call cc-option-add,CFLAGS,CC,-Wnested-externs)
-$(call as-option-add,CFLAGS,CC,"vmcall",-DHAVE_AS_VMX)

[Xen-devel] [PATCH v5 01/10] x86emul: support AVX512_BF16 insns

2020-03-24 Thread Jan Beulich
Signed-off-by: Jan Beulich 
---
v5: New.
---
(SDE: -cpx)

--- a/tools/tests/x86_emulator/evex-disp8.c
+++ b/tools/tests/x86_emulator/evex-disp8.c
@@ -550,6 +550,12 @@ static const struct test avx512_4vnniw_5
 INSN(p4dpwssds, f2, 0f38, 53, el_4, d, vl),
 };
 
+static const struct test avx512_bf16_all[] = {
+INSN(vcvtne2ps2bf16, f2, 0f38, 72, vl, d, vl),
+INSN(vcvtneps2bf16,  f3, 0f38, 72, vl, d, vl),
+INSN(vdpbf16ps,  f3, 0f38, 52, vl, d, vl),
+};
+
 static const struct test avx512_bitalg_all[] = {
 INSN(popcnt,  66, 0f38, 54, vl, bw, vl),
 INSN(pshufbitqmb, 66, 0f38, 8f, vl,  b, vl),
@@ -984,6 +990,7 @@ void evex_disp8_test(void *instr, struct
 RUN(avx512pf, 512);
 RUN(avx512_4fmaps, 512);
 RUN(avx512_4vnniw, 512);
+RUN(avx512_bf16, all);
 RUN(avx512_bitalg, all);
 RUN(avx512_ifma, all);
 RUN(avx512_vbmi, all);
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -4516,6 +4516,80 @@ int main(int argc, char **argv)
 else
 printf("skipped\n");
 
+if ( stack_exec && cpu_has_avx512_bf16 )
+{
+decl_insn(vcvtne2ps2bf16);
+decl_insn(vcvtneps2bf16);
+decl_insn(vdpbf16ps);
+static const struct {
+float f[16];
+} in1 = {{
+1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
+}}, in2 = {{
+1, -2, 3, -4, 5, -6, 7, -8, 9, -10, 11, -12, 13, -14, 15, -16
+}}, out = {{
+1 * 1 + 2 * 2, 3 * 3 + 4 * 4,
+5 * 5 + 6 * 6, 7 * 7 + 8 * 8,
+9 * 9 + 10 * 10, 11 * 11 + 12 * 12,
+13 * 13 + 14 * 14, 15 * 15 + 16 * 16,
+1 * 1 - 2 * 2, 3 * 3 - 4 * 4,
+5 * 5 - 6 * 6, 7 * 7 - 8 * 8,
+9 * 9 - 10 * 10, 11 * 11 - 12 * 12,
+13 * 13 - 14 * 14, 15 * 15 - 16 * 16
+}};
+
+printf("%-40s", "Testing vcvtne2ps2bf16 64(%ecx),%zmm1,%zmm2...");
+asm volatile ( "vmovups %1, %%zmm1\n"
+   put_insn(vcvtne2ps2bf16,
+/* vcvtne2ps2bf16 64(%0), %%zmm1, %%zmm2 */
+".byte 0x62, 0xf2, 0x77, 0x48, 0x72, 0x51, 
0x01")
+   :: "c" (NULL), "m" (in2) );
+set_insn(vcvtne2ps2bf16);
+regs.ecx = (unsigned long) - 64;
+rc = x86_emulate(, );
+if ( rc != X86EMUL_OKAY || !check_eip(vcvtne2ps2bf16) )
+goto fail;
+printf("pending\n");
+
+printf("%-40s", "Testing vcvtneps2bf16 64(%ecx),%ymm3...");
+asm volatile ( put_insn(vcvtneps2bf16,
+/* vcvtneps2bf16 64(%0), %%ymm3 */
+".byte 0x62, 0xf2, 0x7e, 0x48, 0x72, 0x59, 
0x01")
+   :: "c" (NULL) );
+set_insn(vcvtneps2bf16);
+rc = x86_emulate(&regs, &emulops);
+if ( rc != X86EMUL_OKAY || !check_eip(vcvtneps2bf16) )
+goto fail;
+asm ( "vmovdqa %%ymm2, %%ymm5\n\t"
+  "vpcmpeqd %%zmm3, %%zmm5, %%k0\n\t"
+  "kmovw %%k0, %0"
+  : "=g" (rc) : "m" (out) );
+if ( rc != 0xffff )
+goto fail;
+printf("pending\n");
+
+printf("%-40s", "Testing vdpbf16ps 128(%ecx),%zmm2,%zmm4...");
+asm volatile ( "vmovdqa %%ymm3, %0\n\t"
+   "vmovdqa %%ymm3, %1\n"
+   put_insn(vdpbf16ps,
+/* vdpbf16ps 128(%2), %%zmm2, %%zmm4 */
+".byte 0x62, 0xf2, 0x6e, 0x48, 0x52, 0x61, 
0x02")
+   : "=" (res[0]), "=" (res[8])
+   : "c" (NULL)
+   : "memory" );
+set_insn(vdpbf16ps);
+regs.ecx = (unsigned long)res - 128;
+rc = x86_emulate(&regs, &emulops);
+if ( rc != X86EMUL_OKAY || !check_eip(vdpbf16ps) )
+goto fail;
+asm ( "vcmpeqps %1, %%zmm4, %%k0\n\t"
+  "kmovw %%k0, %0"
+  : "=g" (rc) : "m" (out) );
+if ( rc != 0xffff )
+goto fail;
+printf("okay\n");
+}
+
 printf("%-40s", "Testing invpcid 16(%ecx),%%edx...");
 if ( stack_exec )
 {
--- a/tools/tests/x86_emulator/x86-emulate.h
+++ b/tools/tests/x86_emulator/x86-emulate.h
@@ -156,6 +156,7 @@ static inline bool xcr0_mask(uint64_t ma
 #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6))
 #define cpu_has_avx512_4vnniw (cp.feat.avx512_4vnniw && xcr0_mask(0xe6))
 #define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6))
+#define cpu_has_avx512_bf16 (cp.feat.avx512_bf16 && xcr0_mask(0xe6))
 
 #define cpu_has_xgetbv1   (cpu_has_xsave && cp.xstate.xgetbv1)
 
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1904,6 +1904,7 @@ in_protmode(
 #define vcpu_has_rdpid()   (ctxt->cpuid->feat.rdpid)
 #define vcpu_has_avx512_4vnniw() (ctxt->cpuid->feat.avx512_4vnniw)
 #define 

[Xen-devel] [PATCH v5 02/10] x86emul: support MOVDIRI insn

2020-03-24 Thread Jan Beulich
Note that SDM revision 070 doesn't specify exception behavior for
ModRM.mod == 0b11; assuming #UD here.

Signed-off-by: Jan Beulich 
---
v4: Split MOVDIRI and MOVDIR64B and move this one ahead. Re-base.
v3: Update description.

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -2196,6 +2196,18 @@ int main(int argc, char **argv)
 goto fail;
 printf("okay\n");
 
+printf("%-40s", "Testing movdiri %edx,(%ecx)...");
+instr[0] = 0x0f; instr[1] = 0x38; instr[2] = 0xf9; instr[3] = 0x11;
+regs.eip = (unsigned long)&instr[0];
+regs.ecx = (unsigned long)memset(res, -1, 16);
+regs.edx = 0x44332211;
+rc = x86_emulate(&regs, &emulops);
+if ( (rc != X86EMUL_OKAY) ||
+ (regs.eip != (unsigned long)&instr[4]) ||
+ res[0] != 0x44332211 || ~res[1] )
+goto fail;
+printf("okay\n");
+
 printf("%-40s", "Testing movq %mm3,(%ecx)...");
 if ( stack_exec && cpu_has_mmx )
 {
--- a/tools/tests/x86_emulator/x86-emulate.c
+++ b/tools/tests/x86_emulator/x86-emulate.c
@@ -76,6 +76,7 @@ bool emul_test_init(void)
 cp.feat.adx = true;
 cp.feat.avx512pf = cp.feat.avx512f;
 cp.feat.rdpid = true;
+cp.feat.movdiri = true;
 cp.extd.clzero = true;
 
 if ( cpu_has_xsave )
@@ -137,15 +138,15 @@ int emul_test_cpuid(
 res->c |= 1U << 22;
 
 /*
- * The emulator doesn't itself use ADCX/ADOX/RDPID nor the S/G prefetch
- * insns, so we can always run the respective tests.
+ * The emulator doesn't itself use ADCX/ADOX/RDPID/MOVDIRI nor the S/G
+ * prefetch insns, so we can always run the respective tests.
  */
 if ( leaf == 7 && subleaf == 0 )
 {
 res->b |= (1U << 10) | (1U << 19);
 if ( res->b & (1U << 16) )
 res->b |= 1U << 26;
-res->c |= 1U << 22;
+res->c |= (1U << 22) | (1U << 27);
 }
 
 /*
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -548,6 +548,7 @@ static const struct ext0f38_table {
 [0xf1] = { .to_mem = 1, .two_op = 1 },
 [0xf2 ... 0xf3] = {},
 [0xf5 ... 0xf7] = {},
+[0xf9] = { .to_mem = 1 },
 };
 
 /* Shift values between src and dst sizes of pmov{s,z}x{b,w,d}{w,d,q}. */
@@ -1902,6 +1903,7 @@ in_protmode(
 #define vcpu_has_avx512_bitalg() (ctxt->cpuid->feat.avx512_bitalg)
 #define vcpu_has_avx512_vpopcntdq() (ctxt->cpuid->feat.avx512_vpopcntdq)
 #define vcpu_has_rdpid()   (ctxt->cpuid->feat.rdpid)
+#define vcpu_has_movdiri() (ctxt->cpuid->feat.movdiri)
 #define vcpu_has_avx512_4vnniw() (ctxt->cpuid->feat.avx512_4vnniw)
 #define vcpu_has_avx512_4fmaps() (ctxt->cpuid->feat.avx512_4fmaps)
 #define vcpu_has_avx512_bf16() (ctxt->cpuid->feat.avx512_bf16)
@@ -2713,10 +2715,12 @@ x86_decode_0f38(
 {
 case 0x00 ... 0xef:
 case 0xf2 ... 0xf5:
-case 0xf7 ... 0xff:
+case 0xf7 ... 0xf8:
+case 0xfa ... 0xff:
 op_bytes = 0;
 /* fall through */
 case 0xf6: /* adcx / adox */
+case 0xf9: /* movdiri */
 ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK);
 break;
 
@@ -10075,6 +10079,14 @@ x86_emulate(
 : "0" ((uint32_t)src.val), "rm" (_regs.edx) );
 break;
 
+case X86EMUL_OPC(0x0f38, 0xf9): /* movdiri mem,r */
+vcpu_must_have(movdiri);
+generate_exception_if(dst.type != OP_MEM, EXC_UD);
+/* Ignore the non-temporal behavior for now. */
+dst.val = src.val;
+sfence = true;
+break;
+
 #ifndef X86EMUL_NO_SIMD
 
 case X86EMUL_OPC_VEX_66(0x0f3a, 0x00): /* vpermq $imm8,ymm/m256,ymm */
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -237,6 +237,7 @@ XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /
 XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A  POPCNT for vectors of DW/QW */
 XEN_CPUFEATURE(RDPID, 6*32+22) /*A  RDPID instruction */
 XEN_CPUFEATURE(CLDEMOTE,  6*32+25) /*A  CLDEMOTE instruction */
+XEN_CPUFEATURE(MOVDIRI,   6*32+27) /*A  MOVDIRI instruction */
 
 /* AMD-defined CPU features, CPUID level 0x8007.edx, word 7 */
 XEN_CPUFEATURE(ITSC,  7*32+ 8) /*   Invariant TSC */




Re: [Xen-devel] [PATCH v2] SVM: Add union intstat_t for offset 68h in vmcb struct

2020-03-24 Thread Andrew Cooper
On 24/03/2020 10:37, Pu Wen wrote:
> According to chapter "Appendix B Layout of VMCB" in the new version
> (v3.32) AMD64 APM[1], bit 1 of the VMCB offset 68h is defined as
> GUEST_INTERRUPT_MASK.
>
> In the current Xen code, the whole u64 interrupt_shadow is used to set
> up the interrupt shadow, which misuses other bits in VMCB offset 68h
> as part of interrupt_shadow.
>
> Add a union intstat_t for VMCB offset 68h and fix the code to only use
> bit 0 as intr_shadow, according to the new APM description.
>
> Reference:
> [1] https://www.amd.com/system/files/TechDocs/24593.pdf
>
> Signed-off-by: Pu Wen 

Hmm - this field doesn't appear to be part of AVIC, which makes me
wonder what we're doing without it.

It appears to be a shadow copy of EFLAGS.IF which is only written on
vmexit, and never consumed, but this is based on Appendix B which is the
only reference I can find to the field at all.  Neither the VMRUN nor
the #VMEXIT description discusses it.

Given its position next to the (ambiguous) INTERRUPT_SHADOW, it just
might actually distinguish the STI shadow from the MovSS shadow, but it
could only do that by not behaving as described, and being asymmetric
with EFLAGS.  I don't have time to investigate this right now.

We need the field described in Xen to set it appropriately for virtual
vmexit, but I think that is the extent of what we need to do.

~Andrew



[Xen-devel] [PATCH v5 00/10] x86emul: further work

2020-03-24 Thread Jan Beulich
Some of the later patches are still at least partly RFC, for
varying reasons (see there). I'd appreciate it, though, if at least
some of the earlier ones could go in sooner rather than later.

Patch 1 functionally (for the test harness) depends on
"libx86/CPUID: fix (not just) leaf 7 processing", while at
least patch 2 contextually depends on "x86emul: disable
FPU/MMX/SIMD insn emulation when !HVM".

 1: x86emul: support AVX512_BF16 insns
 2: x86emul: support MOVDIRI insn
 3: x86: determine HAVE_AS_* just once
 4: x86: move back clang no integrated assembler tests
 5: x86emul: support MOVDIR64B insn
 6: x86emul: support ENQCMD insn
 7: x86/HVM: scale MPERF values reported to guests (on AMD)
 8: x86emul: support RDPRU
 9: x86/HVM: don't needlessly intercept APERF/MPERF/TSC MSR reads
10: x86emul: support MCOMMIT

See individual patches for changes from v4 (which was mistakenly
sent out with a v3 tag).

Jan



Re: [Xen-devel] [PATCH 2/3] x86/nvmx: clarify and fix usage of nvmx_update_apicv

2020-03-24 Thread Roger Pau Monné
On Tue, Mar 24, 2020 at 11:33:00AM +, Tian, Kevin wrote:
> > From: Roger Pau Monné 
> > Sent: Tuesday, March 24, 2020 7:23 PM
> > 
> > On Tue, Mar 24, 2020 at 10:11:15AM +, Tian, Kevin wrote:
> > > > From: Roger Pau Monné 
> > > > Sent: Tuesday, March 24, 2020 5:51 PM
> > > >
> > > > On Tue, Mar 24, 2020 at 06:03:28AM +, Tian, Kevin wrote:
> > > > > > From: Roger Pau Monne 
> > > > > > Sent: Saturday, March 21, 2020 3:08 AM
> > > > > >
> > > > > > The current usage of nvmx_update_apicv is not clear: it is deeply
> > > > > > intertwined with the Ack interrupt on exit VMEXIT control.
> > > > > >
> > > > > > The code in nvmx_update_apicv should update the SVI (in service
> > > > > > interrupt)
> > > > > > field of the guest interrupt status only when the Ack interrupt on
> > > > > > exit is set, as it is used to record that the interrupt being
> > > > > > serviced is signaled in a vmcs field, and hence hasn't been injected
> > > > > > as on native. It's important to record the current in service
> > > > > > interrupt on the guest interrupt status field, or else further
> > > > > > interrupts won't respect the priority of the in service one.
> > > > > >
> > > > > > While clarifying the usage make sure that the SVI is only updated
> > > > > > when Ack on exit is set and the nested vmcs interrupt info field
> > > > > > is valid. Or else a guest not using the Ack on exit feature would
> > > > > > lose interrupts as they would be signaled as being in service on
> > > > > > the guest interrupt status field but won't actually be recorded on
> > > > > > the interrupt info vmcs field, neither injected in any other way.
> > > > >
> > > > > It is insufficient. You also need to update RVI to enable virtual 
> > > > > injection
> > > > > when Ack on exit is cleared.
> > > >
> > > > But RVI should be updated in vmx_intr_assist in that case, since
> > > > nvmx_intr_intercept shouldn't intercept the interrupt, as it should be
> > > > handled normally.
> > >
> > > As we discussed before, vmx_intr_assist is invoked before
> > > nvmx_switch_guest.
> > >
> > > It is incorrect to update RVI at that time since vmcs02 might still
> > > be active (if no pending softirq to make it invoked again).
> > >
> > > Also nvmx_intr_intercept does intercept Ack-on-exit=0 case:
> > >
> > > if ( intack.source == hvm_intsrc_pic ||
> > >  intack.source == hvm_intsrc_lapic )

I've realized that nvmx_intr_intercept will return 1 for interrupts
originating from the lapic or the pic, while nvmx_update_apicv will
only update GUEST_INTR_STATUS for interrupts originating from the
lapic. Is this correct?

Shouldn't both be in sync and handle the same interrupt sources?

Thanks, Roger.



[Xen-devel] Ping: [PATCH 5/5] x86emul: disable FPU/MMX/SIMD insn emulation when !HVM

2020-03-24 Thread Jan Beulich
On 20.12.2019 14:41, Jan Beulich wrote:
> In a pure PV environment (the PV shim in particular) we don't really
> need emulation of all these. To limit #ifdef-ary utilize some of the
> CASE_*() macros we have, by providing variants expanding to
> (effectively) nothing (really a label, which in turn requires passing
> -Wno-unused-label to the compiler when building such configurations).
> 
> Due to the mixture of macro and #ifdef use, the placement of some of
> the #ifdef-s is a little arbitrary.
> 
> The resulting object file's .text is less than half the size of the
> original, and also looks to compile a little more quickly.
> 
> This is meant as a first step; more parts can likely be disabled down
> the road.
> 
> Suggested-by: Andrew Cooper 
> Signed-off-by: Jan Beulich 

Ping?

> ---
> I'll be happy to take suggestions allowing to avoid -Wno-unused-label.
> 
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -79,6 +79,9 @@ obj-y += hpet.o
>  obj-y += vm_event.o
>  obj-y += xstate.o
>  
> +ifneq ($(CONFIG_HVM),y)
> +x86_emulate.o: CFLAGS += -Wno-unused-label
> +endif
>  x86_emulate.o: x86_emulate/x86_emulate.c x86_emulate/x86_emulate.h
>  
>  efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
> --- a/xen/arch/x86/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate.c
> @@ -42,6 +42,12 @@
>  }  \
>  })
>  
> +#ifndef CONFIG_HVM
> +# define X86EMUL_NO_FPU
> +# define X86EMUL_NO_MMX
> +# define X86EMUL_NO_SIMD
> +#endif
> +
>  #include "x86_emulate/x86_emulate.c"
>  
>  int x86emul_read_xcr(unsigned int reg, uint64_t *val,
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -3476,6 +3476,7 @@ x86_decode(
>  op_bytes = 4;
>  break;
>  
> +#ifndef X86EMUL_NO_SIMD
>  case simd_packed_int:
>  switch ( vex.pfx )
>  {
> @@ -3541,6 +3542,7 @@ x86_decode(
>  case simd_256:
>  op_bytes = 32;
>  break;
> +#endif /* !X86EMUL_NO_SIMD */
>  
>  default:
>  op_bytes = 0;
> @@ -3695,6 +3697,7 @@ x86_emulate(
>  break;
>  }
>  
> +#ifndef X86EMUL_NO_SIMD
>  /* With a memory operand, fetch the mask register in use (if any). */
>  if ( ea.type == OP_MEM && evex.opmsk &&
>   _get_fpu(fpu_type = X86EMUL_FPU_opmask, ctxt, ops) == X86EMUL_OKAY )
> @@ -3725,6 +3728,7 @@ x86_emulate(
>  put_fpu(X86EMUL_FPU_opmask, false, state, ctxt, ops);
>  fpu_type = X86EMUL_FPU_none;
>  }
> +#endif /* !X86EMUL_NO_SIMD */
>  
>  /* Decode (but don't fetch) the destination operand: register or memory. 
> */
>  switch ( d & DstMask )
> @@ -4372,11 +4376,13 @@ x86_emulate(
>  singlestep = _regs.eflags & X86_EFLAGS_TF;
>  break;
>  
> +#ifndef X86EMUL_NO_FPU
>  case 0x9b:  /* wait/fwait */
>  host_and_vcpu_must_have(fpu);
>  get_fpu(X86EMUL_FPU_wait);
>  emulate_fpu_insn_stub(b);
>  break;
> +#endif
>  
>  case 0x9c: /* pushf */
>  if ( (_regs.eflags & X86_EFLAGS_VM) &&
> @@ -4785,6 +4791,7 @@ x86_emulate(
>  break;
>  }
>  
> +#ifndef X86EMUL_NO_FPU
>  case 0xd8: /* FPU 0xd8 */
>  host_and_vcpu_must_have(fpu);
>  get_fpu(X86EMUL_FPU_fpu);
> @@ -5119,6 +5126,7 @@ x86_emulate(
>  }
>  }
>  break;
> +#endif /* !X86EMUL_NO_FPU */
>  
>  case 0xe0 ... 0xe2: /* loop{,z,nz} */ {
>  unsigned long count = get_loop_count(&_regs, ad_bytes);
> @@ -5983,6 +5991,8 @@ x86_emulate(
>  case X86EMUL_OPC(0x0f, 0x19) ... X86EMUL_OPC(0x0f, 0x1f): /* nop */
>  break;
>  
> +#ifndef X86EMUL_NO_MMX
> +
>  case X86EMUL_OPC(0x0f, 0x0e): /* femms */
>  host_and_vcpu_must_have(3dnow);
>  asm volatile ( "femms" );
> @@ -6003,39 +6013,71 @@ x86_emulate(
>  state->simd_size = simd_other;
>  goto simd_0f_imm8;
>  
> -#define CASE_SIMD_PACKED_INT(pfx, opc)   \
> +#endif /* !X86EMUL_NO_MMX */
> +
> +#if !defined(X86EMUL_NO_SIMD) && !defined(X86EMUL_NO_MMX)
> +# define CASE_SIMD_PACKED_INT(pfx, opc)  \
>  case X86EMUL_OPC(pfx, opc):  \
>  case X86EMUL_OPC_66(pfx, opc)
> -#define CASE_SIMD_PACKED_INT_VEX(pfx, opc)   \
> +#elif !defined(X86EMUL_NO_SIMD)
> +# define CASE_SIMD_PACKED_INT(pfx, opc)  \
> +case X86EMUL_OPC_66(pfx, opc)
> +#elif !defined(X86EMUL_NO_MMX)
> +# define CASE_SIMD_PACKED_INT(pfx, opc)  \
> +case X86EMUL_OPC(pfx, opc)
> +#else
> +# define CASE_SIMD_PACKED_INT(pfx, opc) C##pfx##_##opc
> +#endif
> +
> +#ifndef X86EMUL_NO_SIMD
> +
> +# define CASE_SIMD_PACKED_INT_VEX(pfx, opc)  \
>  CASE_SIMD_PACKED_INT(pfx, opc):  \
>  case X86EMUL_OPC_VEX_66(pfx, opc)
>  
> -#define CASE_SIMD_ALL_FP(kind, pfx, opc) \
> +# define CASE_SIMD_ALL_FP(kind, pfx, opc)\
>  CASE_SIMD_PACKED_FP(kind, pfx, opc): \
>  CASE_SIMD_SCALAR_FP(kind, pfx, opc)
> -#define 

Re: [Xen-devel] [PATCH OSSTEST] kernel-build: enable XEN_BALLOON_MEMORY_HOTPLUG

2020-03-24 Thread Roger Pau Monné
Adding Juergen and Boris for feedback.

On Tue, Mar 24, 2020 at 11:57:48AM +, Ian Jackson wrote:
> Ian Jackson writes ("Re: [PATCH OSSTEST] kernel-build: enable 
> XEN_BALLOON_MEMORY_HOTPLUG"):
> > Roger Pau Monne writes ("[PATCH OSSTEST] kernel-build: enable 
> > XEN_BALLOON_MEMORY_HOTPLUG"):
> > > This allows a PVH/HVM domain to use unpopulated memory ranges to map
> > > foreign memory or grants, and is required for a PVH dom0 to function
> > > properly.
> > > 
> > > Signed-off-by: Roger Pau Monné 
> > 
> > Acked-by: Ian Jackson 
> > 
> > I will push this to pretest immediately.
> 
> Now done.  Would you consider whether the default should be changed
> in Linux and prepare a patch to do so if appropriate ?

DYK if there's any reason why this is not on by default?

Thanks, Roger.
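
For reference, the knobs involved (assuming the usual Kconfig symbol
names in current Linux):

    CONFIG_XEN_BALLOON=y
    # XEN_BALLOON_MEMORY_HOTPLUG depends on XEN_BALLOON && MEMORY_HOTPLUG
    CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y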



Re: [Xen-devel] [PATCH v2] SVM: Add union intstat_t for offset 68h in vmcb struct

2020-03-24 Thread Jan Beulich
On 24.03.2020 11:37, Pu Wen wrote:
> According to chapter "Appendix B Layout of VMCB" in the new version
> (v3.32) AMD64 APM[1], bit 1 of the VMCB offset 68h is defined as
> GUEST_INTERRUPT_MASK.
> 
> In the current Xen code, the whole u64 interrupt_shadow is used to set
> up the interrupt shadow, which misuses other bits in VMCB offset 68h
> as part of interrupt_shadow.
> 
> Add a union intstat_t for VMCB offset 68h and fix the code to only use
> bit 0 as intr_shadow, according to the new APM description.
> 
> Reference:
> [1] https://www.amd.com/system/files/TechDocs/24593.pdf
> 
> Signed-off-by: Pu Wen 

Looks okay now to me (with one further nit, see below), but ...

> v1->v2:
>   - Copy the whole int_stat in nsvm_vmcb_prepare4vmrun() and
> nsvm_vmcb_prepare4vmexit().

... in particular this part I'd prefer to wait a little on, to see
whether Andrew or anyone else has a specific opinion one way or
the other.

> --- a/xen/include/asm-x86/hvm/svm/vmcb.h
> +++ b/xen/include/asm-x86/hvm/svm/vmcb.h
> @@ -316,6 +316,17 @@ typedef union
>  uint64_t raw;
>  } intinfo_t;
>  
> +typedef union
> +{
> +struct
> +{

Nit: The brace should be on the same line as "struct"; can be
taken care of while committing.

Jan
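
For illustration, a sketch of the union with the brace moved as
suggested (field names assumed from the APM bits discussed above):

    typedef union
    {
        struct {
            bool intr_shadow:     1; /* offset 68h, bit 0 */
            bool guest_intr_mask: 1; /* offset 68h, bit 1 */
        };
        uint64_t raw;
    } intstat_t;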



Re: [Xen-devel] [PATCH OSSTEST] kernel-build: enable XEN_BALLOON_MEMORY_HOTPLUG

2020-03-24 Thread Ian Jackson
Roger Pau Monne writes ("[PATCH OSSTEST] kernel-build: enable 
XEN_BALLOON_MEMORY_HOTPLUG"):
> This allows a PVH/HVM domain to use unpopulated memory ranges to map
> foreign memory or grants, and is required for a PVH dom0 to function
> properly.
> 
> Signed-off-by: Roger Pau Monné 

Acked-by: Ian Jackson 

I will push this to pretest immediately.

Thanks,
Ian.



[Xen-devel] [qemu-mainline bisection] complete test-amd64-amd64-xl-qemuu-ovmf-amd64

2020-03-24 Thread osstest service owner
branch xen-unstable
xenbranch xen-unstable
job test-amd64-amd64-xl-qemuu-ovmf-amd64
testid debian-hvm-install

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://git.qemu.org/qemu.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  qemuu git://git.qemu.org/qemu.git
  Bug introduced:  ca6155c0f2bd39b4b4162533be401c98bd960820
  Bug not present: c220cdec4845f305034330f80ce297f1f997f2d3
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/148968/


  (Revision log too long, omitted.)


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/qemu-mainline/test-amd64-amd64-xl-qemuu-ovmf-amd64.debian-hvm-install.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/qemu-mainline/test-amd64-amd64-xl-qemuu-ovmf-amd64.debian-hvm-install
 --summary-out=tmp/148968.bisection-summary --basis-template=144861 
--blessings=real,real-bisect qemu-mainline test-amd64-amd64-xl-qemuu-ovmf-amd64 
debian-hvm-install
Searching for failure / basis pass:
 148879 fail [host=albana1] / 147546 ok.
Failure / basis pass flights: 148879 / 147546
(tree with no url: minios)
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://git.qemu.org/qemu.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git
Latest c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
0c8ea9fe1adbbee230ee0c68f28b68ca2b0534bc 
d0d8ad39ecb51cd7497cd524484fe09f50876798 
9b26a610936deaf436af9b7e39e4b7f0a35e4409 
066a9956097b54530888b88ab9aa1ea02e42af5a 
d094e95fb7c61c5f46d8e446b4bdc028438dea1c
Basis pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
70911f1f4aee0366b6122f2b90d367ec0f066beb 
d0d8ad39ecb51cd7497cd524484fe09f50876798 
c1e667d2598b9b3ce62b8e89ed22dd38dfe9f57f 
76551856b28d227cb0386a1ab0e774329b941f7d 
c47984aabead53918e5ba6d43cdb3f1467452739
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/linux-pvops.git#c3038e718a19fc596f7b1baba0f83d5146dc7784-c3038e718a19fc596f7b1baba0f83d5146dc7784
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 
git://xenbits.xen.org/osstest/ovmf.git#70911f1f4aee0366b6122f2b90d367ec0f066beb-0c8ea9fe1adbbee230ee0c68f28b68ca2b0534bc
 git://xenbits.xen.org/qemu-xen-traditional.git#d0d8ad39ecb51cd7497cd524484\
 fe09f50876798-d0d8ad39ecb51cd7497cd524484fe09f50876798 
git://git.qemu.org/qemu.git#c1e667d2598b9b3ce62b8e89ed22dd38dfe9f57f-9b26a610936deaf436af9b7e39e4b7f0a35e4409
 
git://xenbits.xen.org/osstest/seabios.git#76551856b28d227cb0386a1ab0e774329b941f7d-066a9956097b54530888b88ab9aa1ea02e42af5a
 
git://xenbits.xen.org/xen.git#c47984aabead53918e5ba6d43cdb3f1467452739-d094e95fb7c61c5f46d8e446b4bdc028438dea1c
From git://cache:9419/git://git.qemu.org/qemu
   f1e748d279..09a98dd988  master -> origin/master
Use of uninitialized value $parents in array dereference at 
./adhoc-revtuple-generator line 465.
Use of uninitialized value in concatenation (.) or string at 
./adhoc-revtuple-generator line 465.
Use of uninitialized value $parents in array dereference at 
./adhoc-revtuple-generator line 465.
Use of uninitialized value in concatenation (.) or string at 
./adhoc-revtuple-generator line 465.
Use of uninitialized value $parents in array dereference at 
./adhoc-revtuple-generator line 465.
Use of uninitialized value in concatenation (.) or string at 
./adhoc-revtuple-generator line 465.
Loaded 30209 nodes in revision graph
Searching for test results:
 147546 pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
70911f1f4aee0366b6122f2b90d367ec0f066beb 
d0d8ad39ecb51cd7497cd524484fe09f50876798 
c1e667d2598b9b3ce62b8e89ed22dd38dfe9f57f 
76551856b28d227cb0386a1ab0e774329b941f7d 
c47984aabead53918e5ba6d43cdb3f1467452739
 147641 fail irrelevant
 147710 fail irrelevant
 147758 fail irrelevant
 147821 fail irrelevant
 148010 fail irrelevant
 148184 fail irrelevant
 148120 fail irrelevant
 148261 fail irrelevant
 148421 fail irrelevant
 148340 fail irrelevant
 148483 fail irrelevant
 148545 fail irrelevant
 148578 fail c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
799d88c1bae7978da23727df94b16f37bd1521f4 
d0d8ad39ecb51cd7497cd524484fe09f50876798 

Re: [Xen-devel] [PATCH v2] xen/sched: fix onlining cpu with core scheduling active

2020-03-24 Thread Jürgen Groß

Ping?

On 10.03.20 10:06, Juergen Gross wrote:

When onlining a cpu, cpupool_cpu_add() checks whether all siblings of
the new cpu are free in order to decide whether to add it to cpupool0.
In case the added cpu is not the last sibling to be onlined, this test
is wrong, as it only checks that all currently online siblings are
free. The test should also check that the number of siblings has
reached the scheduling granularity of cpupool0.

Signed-off-by: Juergen Gross 
---
V2:
- modify condition form >= to == (Jan Beulich)
---
  xen/common/sched/cpupool.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
index 9f70c7ec17..d40345b585 100644
--- a/xen/common/sched/cpupool.c
+++ b/xen/common/sched/cpupool.c
@@ -616,7 +616,8 @@ static int cpupool_cpu_add(unsigned int cpu)
  get_sched_res(cpu)->cpupool = NULL;
  
  cpus = sched_get_opt_cpumask(cpupool0->gran, cpu);

-if ( cpumask_subset(cpus, &cpupool_free_cpus) )
+if ( cpumask_subset(cpus, &cpupool_free_cpus) &&
+ cpumask_weight(cpus) == cpupool_get_granularity(cpupool0) )
  ret = cpupool_assign_cpu_locked(cpupool0, cpu);
  
  rcu_read_unlock(&sched_res_rculock);







Re: [Xen-devel] [PATCH v3] xen/sched: fix cpu offlining with core scheduling

2020-03-24 Thread Jürgen Groß

Ping?

On 10.03.20 09:09, Juergen Gross wrote:

Offlining a cpu with core scheduling active can result in a hanging
system. The reason is that the scheduling resource and unit of the cpu
to be removed need to be split in order to remove the cpu from its
cpupool and move it to the idle scheduler. In case one of the involved
cpus happens to have received a sched slave event, because a vcpu
formerly running on that cpu has been woken up again, this cpu can
enter sched_wait_rendezvous_in() while its scheduling resource is just
about to be split. It might then wait forever for the other sibling to
join, which will never happen as the resources have already been
modified.

This can easily be avoided by:
- resetting the rendezvous counters of the idle unit which is kept
- checking for a new scheduling resource in sched_wait_rendezvous_in()
   after reacquiring the scheduling lock and resetting the counters in
   that case without scheduling another vcpu
- moving schedule resource modifications (in schedule_cpu_rm()) and
   retrieving (schedule(), sched_slave() is fine already, others are not
   critical) into locked regions

Reported-by: Igor Druzhinin 
Signed-off-by: Juergen Gross 
---
V2:
- fix unlocking, add some related comments

V3:
- small comment corrections (Jan Beulich)
---
  xen/common/sched/core.c | 39 ---
  1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 7e8e7d2c39..626861a3fe 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2299,6 +2299,10 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
  rcu_read_unlock(&sched_res_rculock);
  }
  
+/*

+ * Switch to a new context or keep the current one running.
+ * On x86 it won't return, so it needs to drop the still held sched_res_rculock.
+ */
  static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
   bool reset_idle_unit, s_time_t now)
  {
@@ -2408,6 +2412,9 @@ static struct vcpu *sched_force_context_switch(struct vcpu *vprev,
   * zero do_schedule() is called and the rendezvous counter for leaving
   * context_switch() is set. All other members will wait until the counter is
   * becoming zero, dropping the schedule lock in between.
+ * Either returns the new unit to run, or NULL if no context switch is
+ * required or (on Arm) has already been performed. If NULL is returned
+ * sched_res_rculock has been dropped.
   */
  static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
 spinlock_t **lock, int cpu,
@@ -2415,7 +2422,8 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
  {
  struct sched_unit *next;
  struct vcpu *v;
-unsigned int gran = get_sched_res(cpu)->granularity;
+struct sched_resource *sr = get_sched_res(cpu);
+unsigned int gran = sr->granularity;
  
  if ( !--prev->rendezvous_in_cnt )

  {
@@ -2482,6 +2490,21 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
  atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
  prev->rendezvous_in_cnt = 0;
  }
+
+/*
+ * Check for scheduling resource switched. This happens when we are
+ * moved away from our cpupool and cpus are subject of the idle
+ * scheduler now.
+ */
+if ( unlikely(sr != get_sched_res(cpu)) )
+{
+ASSERT(is_idle_unit(prev));
+atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
+prev->rendezvous_in_cnt = 0;
+pcpu_schedule_unlock_irq(*lock, cpu);
+rcu_read_unlock(&sched_res_rculock);
+return NULL;
+}
  }
  
  return prev->next_task;

@@ -2567,11 +2590,11 @@ static void schedule(void)
  
  rcu_read_lock(&sched_res_rculock);
  
+lock = pcpu_schedule_lock_irq(cpu);

+
  sr = get_sched_res(cpu);
  gran = sr->granularity;
  
-lock = pcpu_schedule_lock_irq(cpu);

-
  if ( prev->rendezvous_in_cnt )
  {
  /*
@@ -3151,7 +3174,10 @@ int schedule_cpu_rm(unsigned int cpu)
  per_cpu(sched_res_idx, cpu_iter) = 0;
  if ( cpu_iter == cpu )
  {
-idle_vcpu[cpu_iter]->sched_unit->priv = NULL;
+unit = idle_vcpu[cpu_iter]->sched_unit;
+unit->priv = NULL;
+atomic_set(&unit->next_task->rendezvous_out_cnt, 0);
+unit->rendezvous_in_cnt = 0;
  }
  else
  {
@@ -3182,6 +3208,8 @@ int schedule_cpu_rm(unsigned int cpu)
  }
  sr->scheduler = _idle_ops;
  sr->sched_priv = NULL;
+sr->granularity = 1;
+sr->cpupool = NULL;
  
  smp_mb();

  sr->schedule_lock = &sched_free_cpu_lock;
@@ -3194,9 +3222,6 @@ int schedule_cpu_rm(unsigned int cpu)
  sched_free_udata(old_ops, vpriv_old);
  sched_free_pdata(old_ops, ppriv_old, cpu);
  
-

Re: [Xen-devel] [PATCH 2/3] x86/nvmx: clarify and fix usage of nvmx_update_apicv

2020-03-24 Thread Tian, Kevin
> From: Roger Pau Monné 
> Sent: Tuesday, March 24, 2020 7:23 PM
> 
> On Tue, Mar 24, 2020 at 10:11:15AM +, Tian, Kevin wrote:
> > > From: Roger Pau Monné 
> > > Sent: Tuesday, March 24, 2020 5:51 PM
> > >
> > > On Tue, Mar 24, 2020 at 06:03:28AM +, Tian, Kevin wrote:
> > > > > From: Roger Pau Monne 
> > > > > Sent: Saturday, March 21, 2020 3:08 AM
> > > > >
> > > > > The current usage of nvmx_update_apicv is not clear: it is deeply
> > > > > intertwined with the Ack interrupt on exit VMEXIT control.
> > > > >
> > > > > The code in nvmx_update_apicv should update the SVI (in service
> > > > > interrupt)
> > > > > field of the guest interrupt status only when the Ack interrupt on
> > > > > exit is set, as it is used to record that the interrupt being
> > > > > serviced is signaled in a vmcs field, and hence hasn't been injected
> > > > > as on native. It's important to record the current in service
> > > > > interrupt on the guest interrupt status field, or else further
> > > > > interrupts won't respect the priority of the in service one.
> > > > >
> > > > > While clarifying the usage make sure that the SVI is only updated
> > > > > when Ack on exit is set and the nested vmcs interrupt info field is
> > > > > valid. Or else a guest not using the Ack on exit feature would lose
> > > > > interrupts as they would be signaled as being in service on the
> > > > > guest interrupt status field but won't actually be recorded on the
> > > > > interrupt info vmcs field, neither injected in any other way.
> > > >
> > > > It is insufficient. You also need to update RVI to enable virtual 
> > > > injection
> > > > when Ack on exit is cleared.
> > >
> > > But RVI should be updated in vmx_intr_assist in that case, since
> > > nvmx_intr_intercept shouldn't intercept the interrupt, as it should be
> > > handled normally.
> >
> > As we discussed before, vmx_intr_assist is invoked before
> > nvmx_switch_guest.
> >
> > It is incorrect to update RVI at that time since vmcs02 might still
> > be active (if no pending softirq to make it invoked again).
> >
> > Also nvmx_intr_intercept does intercept Ack-on-exit=0 case:
> >
> > if ( intack.source == hvm_intsrc_pic ||
> >  intack.source == hvm_intsrc_lapic )
> > {
> > vmx_inject_extint(intack.vector, intack.source);
> >
> > ctrl = get_vvmcs(v, VM_EXIT_CONTROLS);
> > if ( ctrl & VM_EXIT_ACK_INTR_ON_EXIT )
> > {
> > /* for now, duplicate the ack path in vmx_intr_assist */
> > hvm_vcpu_ack_pending_irq(v, intack);
> > pt_intr_post(v, intack);
> >
> > intack = hvm_vcpu_has_pending_irq(v);
> > if ( unlikely(intack.source != hvm_intsrc_none) )
> > vmx_enable_intr_window(v, intack);
> > }
> > else if ( !cpu_has_vmx_virtual_intr_delivery )
> > vmx_enable_intr_window(v, intack);
> >
> > return 1; 
> 
> Right, I always get confused by the switch happening in the vmentry
> path.
> 
> That only happens when the vcpu is in nested mode
> (nestedhvm_vcpu_in_guestmode == true). That would be the case before a
> vmexit, at least for the first call to vmx_intr_assist if there are
> pending softirqs.
> 
> Note that if there are pending softirqs the second time
> nvmx_intr_intercept will return 0.

Exactly.

> 
> > }
> >
> > >
> > > > >
> > > > > Signed-off-by: Roger Pau Monné 
> > > > > ---
> > > > >  xen/arch/x86/hvm/vmx/vvmx.c | 11 ++-
> > > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c
> > > b/xen/arch/x86/hvm/vmx/vvmx.c
> > > > > index 1b8461ba30..180d01e385 100644
> > > > > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > > > > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > > > > @@ -1383,7 +1383,7 @@ static void nvmx_update_apicv(struct vcpu *v)
> > > > >  {
> > > > >  struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> > > > >  unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
> > > > > -uint32_t intr_info = nvmx->intr.intr_info;
> > > > > +unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);
> > > >
> > > > well, above reminds me an interesting question. Combining last and this
> > > > patch, we'd see essentially that it goes back to the state before
> f96e1469
> > > > (at least when Ack on exit is true).
> > >
> > > Well, patch 1/3 is a revert of f96e1469, so just reverting f96e1469
> > > gets us to that state.
> >
> > you are right. I just wanted to point out that this patch alone doesn't
> > do any real fix for ack-on-exit=1 case. It just makes sure that 
> > ack-on-exit=0
> > is skipped from that function. So it won't be the one fixing your previous
> > problem. 
> 
> Yes, that's correct.
> 
> > >
> > > This patch is an attempt to clarify that nvmx_update_apicv is closely
> > > related to the Ack on exit feature, as it modifies SVI in order to
> > > signal the 

Re: [Xen-devel] [PATCH 3/3] x86/nvmx: update exit bitmap on vmexit

2020-03-24 Thread Roger Pau Monné
On Tue, Mar 24, 2020 at 10:16:52AM +, Tian, Kevin wrote:
> > From: Roger Pau Monné 
> > Sent: Tuesday, March 24, 2020 5:59 PM
> > 
> > On Tue, Mar 24, 2020 at 06:22:43AM +, Tian, Kevin wrote:
> > > > From: Roger Pau Monne 
> > > > Sent: Saturday, March 21, 2020 3:08 AM
> > > >
> > > > Current code in nvmx_update_apicv set the guest interrupt status field
> > > > but doesn't update the exit bitmap, which can cause issues of lost
> > > > interrupts on the L1 hypervisor if vmx_intr_assist gets
> > > > short-circuited by nvmx_intr_intercept returning true.
> > >
> > > Above is not accurate. Currently Xen didn't choose to update the EOI
> > > exit bitmap every time when there is a change. Instead, it chose to
> > > batch the update before resuming to the guest. sort of optimization.
> > > So it is not related to whether SVI is changed. We should always do the
> > > bitmap update in nvmx_update_apicv, regardless of the setting of
> > > Ack-on-exit ...
> > 
> > But if Ack on exit is not set the GUEST_INTR_STATUS won't be changed
> > by nvmx_intr_intercept, and hence there's no need to update the EOI
> > exit map?
> > 
> > If OTOH the GUEST_INTR_STATUS field is changed by vmx_intr_assist the
> > bitmap will already be updated there.
> > 
> 
> If you agree with my comment in patch 2/3 about setting RVI for 
> ack-on-exit=0, then EOI bitmap update should be done there too.

Yes, I agree, see my reply to your comment. I (again) misremembered the
sequence and assumed the switch would happen prior to calling
vmx_intr_assist, which is not the case.

Thanks, Roger.



Re: [Xen-devel] [PATCH 2/3] x86/nvmx: clarify and fix usage of nvmx_update_apicv

2020-03-24 Thread Roger Pau Monné
On Tue, Mar 24, 2020 at 10:11:15AM +, Tian, Kevin wrote:
> > From: Roger Pau Monné 
> > Sent: Tuesday, March 24, 2020 5:51 PM
> > 
> > On Tue, Mar 24, 2020 at 06:03:28AM +, Tian, Kevin wrote:
> > > > From: Roger Pau Monne 
> > > > Sent: Saturday, March 21, 2020 3:08 AM
> > > >
> > > > The current usage of nvmx_update_apicv is not clear: it is deeply
> > > > intertwined with the Ack interrupt on exit VMEXIT control.
> > > >
> > > > The code in nvmx_update_apicv should update the SVI (in service
> > interrupt)
> > > > field of the guest interrupt status only when the Ack interrupt on
> > > > exit is set, as it is used to record that the interrupt being
> > > > serviced is signaled in a vmcs field, and hence hasn't been injected
> > > > as on native. It's important to record the current in service
> > > > interrupt on the guest interrupt status field, or else further
> > > > interrupts won't respect the priority of the in service one.
> > > >
> > > > While clarifying the usage make sure that the SVI is only updated when
> > > > Ack on exit is set and the nested vmcs interrupt info field is valid.
> > > > Otherwise a guest not using the Ack on exit feature would lose interrupts,
> > > > as they would be signaled as being in service on the guest interrupt
> > > > status field but wouldn't actually be recorded on the interrupt info vmcs
> > > > field, nor injected in any other way.
> > >
> > > It is insufficient. You also need to update RVI to enable virtual 
> > > injection
> > > when Ack on exit is cleared.
> > 
> > But RVI should be updated in vmx_intr_assist in that case, since
> > nvmx_intr_intercept shouldn't intercept the interrupt, as it should be
> > handled normally.
> 
> As we discussed before, vmx_intr_assist is invoked before nvmx_switch_guest.
> 
> It is incorrect to update RVI at that time since it might still be vmcs02
> being active (if there is no pending softirq to make it be invoked again).
> 
> Also nvmx_intr_intercept does intercept the Ack-on-exit=0 case:
> 
> if ( intack.source == hvm_intsrc_pic ||
>  intack.source == hvm_intsrc_lapic )
> {
> vmx_inject_extint(intack.vector, intack.source);
> 
> ctrl = get_vvmcs(v, VM_EXIT_CONTROLS);
> if ( ctrl & VM_EXIT_ACK_INTR_ON_EXIT )
> {
> /* for now, duplicate the ack path in vmx_intr_assist */
> hvm_vcpu_ack_pending_irq(v, intack);
> pt_intr_post(v, intack);
> 
> intack = hvm_vcpu_has_pending_irq(v);
> if ( unlikely(intack.source != hvm_intsrc_none) )
> vmx_enable_intr_window(v, intack);
> }
> else if ( !cpu_has_vmx_virtual_intr_delivery )
> vmx_enable_intr_window(v, intack);
> 
> return 1; 

Right, I always get confused by the switch happening in the vmentry
path.

That only happens when the vcpu is in nested mode
(nestedhvm_vcpu_in_guestmode == true). That would be the case before a
vmexit, at least for the first call to vmx_intr_assist if there are
pending softirqs.

Note that if there are pending softirqs, on the second pass
nvmx_intr_intercept will return 0.
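As a minimal sketch of the gate being described (not the literal Xen
source, just its shape, using the nestedhvm_vcpu_in_guestmode() helper
mentioned above):

static int nvmx_intr_intercept(struct vcpu *v, struct hvm_intack intack)
{
    /* Only take over while the vcpu is still executing in L2 guest mode;
     * once the switch to L1 has happened, return 0 so that
     * vmx_intr_assist() handles the interrupt normally. */
    if ( !nestedhvm_vcpu_in_guestmode(v) )
        return 0;

    /* ... interception logic quoted elsewhere in this thread ... */
    return 1;
}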

> }
> 
> > 
> > > >
> > > > Signed-off-by: Roger Pau Monné 
> > > > ---
> > > >  xen/arch/x86/hvm/vmx/vvmx.c | 11 ++-
> > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> > > > index 1b8461ba30..180d01e385 100644
> > > > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > > > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > > > @@ -1383,7 +1383,7 @@ static void nvmx_update_apicv(struct vcpu *v)
> > > >  {
> > > >  struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> > > >  unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
> > > > -uint32_t intr_info = nvmx->intr.intr_info;
> > > > +unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);
> > >
> > well, the above reminds me of an interesting question. Combining the last
> > and this patch, we'd see essentially that it goes back to the state before
> > f96e1469 (at least when Ack on exit is true).
> > 
> > Well, patch 1/3 is a revert of f96e1469, so just reverting f96e1469
> > gets us to that state.
> 
> you are right. I just wanted to point out that this patch alone doesn't
> provide any real fix for the ack-on-exit=1 case. It just makes sure that
> the ack-on-exit=0 case is skipped in that function. So it won't be the one
> fixing your previous problem.

Yes, that's correct.

> > 
> > This patch is an attempt to clarify that nvmx_update_apicv is closely
> > related to the Ack on exit feature, as it modifies SVI in order to
> > signal the vector currently being serviced by the EXIT_INTR_INFO vmcs
> > field. This was not obvious to me, as at first sight I assumed
> > nvmx_update_apicv was actually injecting that vector into the guest.
> > 
> > > iirc, that commit was introduced to enable
> > > nested x2apic with apicv, and your very first version even just removed
> > > the whole nvmx_update_apicv.

Re: [Xen-devel] [PATCH 1/3] Revert "x86/vvmx: fix virtual interrupt injection when Ack on exit control is used"

2020-03-24 Thread Tian, Kevin
> From: Roger Pau Monné 
> Sent: Tuesday, March 24, 2020 6:47 PM
> 
> On Tue, Mar 24, 2020 at 08:49:27AM +, Tian, Kevin wrote:
> > > From: Jan Beulich 
> > > Sent: Tuesday, March 24, 2020 4:10 PM
> > >
> > > On 24.03.2020 06:41, Tian, Kevin wrote:
> > > >> From: Roger Pau Monné 
> > > >> Sent: Monday, March 23, 2020 10:49 PM
> > > >>
> > > >> On Mon, Mar 23, 2020 at 09:09:59AM +0100, Jan Beulich wrote:
> > > >>> On 20.03.2020 20:07, Roger Pau Monne wrote:
> > >  This reverts commit f96e1469ad06b61796c60193daaeb9f8a96d7458.
> > > 
> > >  The commit is wrong, as the whole point of nvmx_update_apicv is
> to
> > >  update the guest interrupt status field when the Ack on exit VMEXIT
> > >  control feature is enabled.
> > > 
> > >  Signed-off-by: Roger Pau Monné 
> > > >>>
> > > >>> Before anyone gets to look at the other two patches, should this
> > > >>> be thrown in right away?
> > > >>
> > > >> I would like, if possible, to get a confirmation from Kevin (or anyone
> > > >> else) that my understanding is correct. I find the nested code very
> > > >> confusing, and I've already made a mistake while trying to fix it.
> > > >> That being said, this was spotted by osstest as introducing a
> > > >> regression, so I guess it's safe to just toss it in now.
> > > >>
> > > >> FWIW patch 2/3 attempts to provide a description of my
> understanding
> > > >> of how nvmx_update_apicv works.
> > > >>
> > > >
> > > > I feel it is not good to take this patch alone, as it was introduced to 
> > > > fix
> > > > another problem. W/o understanding whether the whole series can
> > > > fix both old and new problems, we may risk putting nested interrupt
> > > > logic in an even worse state...
> > >
> > > Well, okay, I'll wait then, but it would seem to me that reverting
> > > wouldn't put us in a worse state than we were in before that change
> > > was put in.
> >
> > Roger needs to make the call, i.e. which problem is more severe, old or
> > new one.
> 
> AFAICT those problems affect different kinds of hardware, so it doesn't
> make much difference. IMO f96e1469ad06b6 should be reverted because
> it's plain wrong, and although it seemed to fix the issue on one of my
> boxes it was just luck.
> 
> I don't think it's going to make matters worse, as osstest is already
> blocked, but reverting it alone without the rest of the series is not
> going to unblock osstest either. If you agree it's wrong I think it
> could go in now, even if it's just to avoid having to repost it with
> newer versions of the series.
> 

yes, I agree it's wrong.

Reviewed-by: Kevin Tian 


Re: [Xen-devel] [PATCH 1/3] Revert "x86/vvmx: fix virtual interrupt injection when Ack on exit control is used"

2020-03-24 Thread Roger Pau Monné
On Tue, Mar 24, 2020 at 08:49:27AM +, Tian, Kevin wrote:
> > From: Jan Beulich 
> > Sent: Tuesday, March 24, 2020 4:10 PM
> > 
> > On 24.03.2020 06:41, Tian, Kevin wrote:
> > >> From: Roger Pau Monné 
> > >> Sent: Monday, March 23, 2020 10:49 PM
> > >>
> > >> On Mon, Mar 23, 2020 at 09:09:59AM +0100, Jan Beulich wrote:
> > >>> On 20.03.2020 20:07, Roger Pau Monne wrote:
> >  This reverts commit f96e1469ad06b61796c60193daaeb9f8a96d7458.
> > 
> >  The commit is wrong, as the whole point of nvmx_update_apicv is to
> >  update the guest interrupt status field when the Ack on exit VMEXIT
> >  control feature is enabled.
> > 
> >  Signed-off-by: Roger Pau Monné 
> > >>>
> > >>> Before anyone gets to look at the other two patches, should this
> > >>> be thrown in right away?
> > >>
> > >> I would like, if possible, to get a confirmation from Kevin (or anyone
> > >> else) that my understanding is correct. I find the nested code very
> > >> confusing, and I've already made a mistake while trying to fix it.
> > >> That being said, this was spotted by osstest as introducing a
> > >> regression, so I guess it's safe to just toss it in now.
> > >>
> > >> FWIW patch 2/3 attempts to provide a description of my understanding
> > >> of how nvmx_update_apicv works.
> > >>
> > >
> > > I feel it is not good to take this patch alone, as it was introduced to 
> > > fix
> > > another problem. W/o understanding whether the whole series can
> > > fix both old and new problems, we may risk putting nested interrupt
> > > logic in an even worse state...
> > 
> > Well, okay, I'll wait then, but it would seem to me that reverting
> > wouldn't put us in a worse state than we were in before that change
> > was put in.
> 
> Roger needs to make the call, i.e. which problem is more severe, old or
> new one.

AFAICT those problems affect different kinds of hardware, so it doesn't
make much difference. IMO f96e1469ad06b6 should be reverted because
it's plain wrong, and although it seemed to fix the issue on one of my
boxes it was just luck.

I don't think it's going to make matters worse, as osstest is already
blocked, but reverting it alone without the rest of the series is not
going to unblock osstest either. If you agree it's wrong I think it
could go in now, even if it's just to avoid having to repost it with
newer versions of the series.

Thanks, Roger.



Re: [Xen-devel] [PATCH V6] x86/altp2m: Hypercall to set altp2m view visibility

2020-03-24 Thread Isaila Alexandru



Hi Kevin and sorry for the long reply time,

On 10.03.2020 04:04, Tian, Kevin wrote:

From: Alexandru Stefan ISAILA 
Sent: Tuesday, March 3, 2020 8:23 PM

At this moment a guest can call vmfunc to change the altp2m view. This
should be limited in order to avoid any unwanted view switch.


I look forward to more elaboration of the motivation, especially for one
who doesn't track altp2m closely like me. For example, do_altp2m_op
mentions three modes: external, internal, coordinated. Then is this patch
trying to limit the view switch in all three modes or just one of them?
From the definition, external clearly disallows the guest to change any view
(then why do we want per-view visibility control), while the latter two
both allow the guest to switch the view. Later you noted some exception
with the mixed (internal) mode. So is this restriction pushed just for the
limited (coordinated) mode?



As you stated, there are some exceptions with the mixed (internal) mode.
This restriction is clearly aimed at the coordinated mode, but it also
restricts view switching in the external mode. I had a good example to
start with: let's say we have one external agent in dom0 that uses view1
and view2, and the logic requires switching between the views. At this
point VMFUNC is available to the guest, so with a simple piece of asm code
(sketched below) it can switch to view 0. At that point the external agent
is not aware that the view has switched, and furthermore view0 was not
supposed to be in the main logic, so it crashes. This example can be
extended to any number of views. I hope it paints a clearer picture of
what this patch is trying to achieve.
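A hedged illustration of that "simple piece of asm": VMFUNC function 0 is
the EPTP-switching leaf, with the target view index in ECX, per the Intel
SDM. The helper name here is made up for illustration:

static inline void altp2m_vmfunc_switch(unsigned int view)
{
    asm volatile ( ".byte 0x0f, 0x01, 0xd4"   /* VMFUNC */
                   :: "a" (0),                /* EAX = 0: EPTP switching */
                      "c" (view)              /* ECX = altp2m view index */
                   : "memory" );
}

E.g. altp2m_vmfunc_switch(0) flips back to view 0 behind the external
agent's back, which is exactly the failure mode described above.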



btw I'm not sure why altp2m invents two names per mode, and their
mapping looks a bit weird. e.g. doesn't 'coordinated' mode sound more
like 'mixed' mode?


Yes, that is true, it is a bit weird.





The new xc_altp2m_set_visibility() solves this by making views invisible
to vmfunc.


if one doesn't want to make a view visible to vmfunc, why can't he just
avoid registering the view in the first place? Are you aiming for a
scenario where dom0 may register 10 views, with 5 views visible to
vmfunc and the other 5 views switched by dom0 itself?


That is one scenario; another can be that dom0 has a number of views
created and at some point wants to be sure that only some of the views
can be switched, saving the rest and making them visible when the time
is right. Sure, the example given above is another one.


Regards,
Alex



Re: [Xen-devel] [PATCH] SVM: Add union intstat_t for offset 68h in vmcb struct

2020-03-24 Thread Wen Pu
On 2020/3/24 17:08, Jan Beulich wrote:
> On 24.03.2020 05:52, Pu Wen wrote:
>> --- a/xen/arch/x86/hvm/svm/nestedsvm.c
>> +++ b/xen/arch/x86/hvm/svm/nestedsvm.c
>> @@ -508,7 +508,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, 
>> struct cpu_user_regs *regs)
>>   }
>>   
>>   /* Shadow Mode */
>> -n2vmcb->interrupt_shadow = ns_vmcb->interrupt_shadow;
>> +n2vmcb->int_stat.intr_shadow = ns_vmcb->int_stat.intr_shadow;
> 
> While bit 1 is irrelevant to VMRUN, I still wonder whether you
> shouldn't copy "raw" here.

Ok, will copy the whole "raw" as suggested, thanks.

>> @@ -1058,7 +1058,7 @@ nsvm_vmcb_prepare4vmexit(struct vcpu *v, struct 
>> cpu_user_regs *regs)
>>   ns_vmcb->_vintr.fields.intr_masking = 0;
>>   
>>   /* Shadow mode */
>> -ns_vmcb->interrupt_shadow = n2vmcb->interrupt_shadow;
>> +ns_vmcb->int_stat.intr_shadow = n2vmcb->int_stat.intr_shadow;
> 
> Same here, or at the very least you want to also copy bit 1 here.

Ok, will change to:
 /* Interrupt state */
 ns_vmcb->int_stat = n2vmcb->int_stat;

>> --- a/xen/arch/x86/hvm/svm/svmdebug.c
>> +++ b/xen/arch/x86/hvm/svm/svmdebug.c
>> @@ -51,9 +51,9 @@ void svm_vmcb_dump(const char *from, const struct 
>> vmcb_struct *vmcb)
>>   printk("iopm_base_pa = %#"PRIx64" msrpm_base_pa = %#"PRIx64" 
>> tsc_offset = %#"PRIx64"\n",
>>  vmcb_get_iopm_base_pa(vmcb), vmcb_get_msrpm_base_pa(vmcb),
>>  vmcb_get_tsc_offset(vmcb));
>> -printk("tlb_control = %#x vintr = %#"PRIx64" interrupt_shadow = 
>> %#"PRIx64"\n",
>> +printk("tlb_control = %#x vintr = %#"PRIx64" interrupt_shadow = %#x\n",
>>  vmcb->tlb_control, vmcb_get_vintr(vmcb).bytes,
>> -   vmcb->interrupt_shadow);
>> +   vmcb->int_stat.intr_shadow);

OK, will dump like this:
 printk("tlb_control = %#x vintr = %#"PRIx64" int_stat = %#"PRIx64"\n",
vmcb->tlb_control, vmcb_get_vintr(vmcb).bytes,
vmcb->int_stat.raw);

Thx.

-- 
Regards,
Pu Wen

[Xen-devel] [PATCH v2] SVM: Add union intstat_t for offset 68h in vmcb struct

2020-03-24 Thread Pu Wen
According to chapter "Appendix B Layout of VMCB" in the new version
(v3.32) AMD64 APM[1], bit 1 of the VMCB offset 68h is defined as
GUEST_INTERRUPT_MASK.

In the current Xen code, the whole u64 interrupt_shadow is used to set up
the interrupt shadow, which misuses the other bits in VMCB offset 68h as
part of interrupt_shadow.

Add a union intstat_t for VMCB offset 68h and fix the code to only use
bit 0 as intr_shadow according to the new APM description.

Reference:
[1] https://www.amd.com/system/files/TechDocs/24593.pdf

Signed-off-by: Pu Wen 
---
v1->v2:
  - Copy the whole int_stat in nsvm_vmcb_prepare4vmrun() and
nsvm_vmcb_prepare4vmexit().
  - Dump all 64 bits of int_stat in svm_vmcb_dump().

 xen/arch/x86/hvm/svm/nestedsvm.c   |  8 
 xen/arch/x86/hvm/svm/svm.c |  8 
 xen/arch/x86/hvm/svm/svmdebug.c|  4 ++--
 xen/include/asm-x86/hvm/svm/vmcb.h | 13 -
 4 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 3bd2a119d3..bbd06e342e 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -507,8 +507,8 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
 n2vmcb->_vintr.fields.intr_masking = 1;
 }
 
-/* Shadow Mode */
-n2vmcb->interrupt_shadow = ns_vmcb->interrupt_shadow;
+/* Interrupt state */
+n2vmcb->int_stat = ns_vmcb->int_stat;
 
 /* Exit codes */
 n2vmcb->exitcode = ns_vmcb->exitcode;
@@ -1057,8 +1057,8 @@ nsvm_vmcb_prepare4vmexit(struct vcpu *v, struct cpu_user_regs *regs)
 if (!(svm->ns_hostflags.fields.vintrmask))
 ns_vmcb->_vintr.fields.intr_masking = 0;
 
-/* Shadow mode */
-ns_vmcb->interrupt_shadow = n2vmcb->interrupt_shadow;
+/* Interrupt state */
+ns_vmcb->int_stat = n2vmcb->int_stat;
 
 /* Exit codes */
 ns_vmcb->exitcode = n2vmcb->exitcode;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 32d8d847f2..888f504a94 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -116,7 +116,7 @@ void __update_guest_eip(struct cpu_user_regs *regs, unsigned int inst_len)
 regs->rip += inst_len;
 regs->eflags &= ~X86_EFLAGS_RF;
 
-curr->arch.hvm.svm.vmcb->interrupt_shadow = 0;
+curr->arch.hvm.svm.vmcb->int_stat.intr_shadow = 0;
 
 if ( regs->eflags & X86_EFLAGS_TF )
 hvm_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
@@ -432,7 +432,7 @@ static unsigned int svm_get_interrupt_shadow(struct vcpu *v)
 struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
 unsigned int intr_shadow = 0;
 
-if ( vmcb->interrupt_shadow )
+if ( vmcb->int_stat.intr_shadow )
 intr_shadow |= HVM_INTR_SHADOW_MOV_SS | HVM_INTR_SHADOW_STI;
 
 if ( vmcb_get_general1_intercepts(vmcb) & GENERAL1_INTERCEPT_IRET )
@@ -446,7 +446,7 @@ static void svm_set_interrupt_shadow(struct vcpu *v, unsigned int intr_shadow)
 struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
 u32 general1_intercepts = vmcb_get_general1_intercepts(vmcb);
 
-vmcb->interrupt_shadow =
+vmcb->int_stat.intr_shadow =
 !!(intr_shadow & (HVM_INTR_SHADOW_MOV_SS|HVM_INTR_SHADOW_STI));
 
 general1_intercepts &= ~GENERAL1_INTERCEPT_IRET;
@@ -2945,7 +2945,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
  * retired.
  */
 general1_intercepts &= ~GENERAL1_INTERCEPT_IRET;
-vmcb->interrupt_shadow = 1;
+vmcb->int_stat.intr_shadow = 1;
 
 vmcb_set_general1_intercepts(vmcb, general1_intercepts);
 break;
diff --git a/xen/arch/x86/hvm/svm/svmdebug.c b/xen/arch/x86/hvm/svm/svmdebug.c
index 366a003f21..5aa9d410ba 100644
--- a/xen/arch/x86/hvm/svm/svmdebug.c
+++ b/xen/arch/x86/hvm/svm/svmdebug.c
@@ -51,9 +51,9 @@ void svm_vmcb_dump(const char *from, const struct vmcb_struct *vmcb)
 printk("iopm_base_pa = %#"PRIx64" msrpm_base_pa = %#"PRIx64" tsc_offset = 
%#"PRIx64"\n",
vmcb_get_iopm_base_pa(vmcb), vmcb_get_msrpm_base_pa(vmcb),
vmcb_get_tsc_offset(vmcb));
-printk("tlb_control = %#x vintr = %#"PRIx64" interrupt_shadow = 
%#"PRIx64"\n",
+printk("tlb_control = %#x vintr = %#"PRIx64" int_stat = %#"PRIx64"\n",
vmcb->tlb_control, vmcb_get_vintr(vmcb).bytes,
-   vmcb->interrupt_shadow);
+   vmcb->int_stat.raw);
 printk("event_inj %016"PRIx64", valid? %d, ec? %d, type %u, vector %#x\n",
vmcb->event_inj.raw, vmcb->event_inj.v,
vmcb->event_inj.ev, vmcb->event_inj.type,
diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h b/xen/include/asm-x86/hvm/svm/vmcb.h
index b9e389d481..d8a3285752 100644
--- a/xen/include/asm-x86/hvm/svm/vmcb.h
+++ b/xen/include/asm-x86/hvm/svm/vmcb.h
@@ -316,6 +316,17 @@ typedef union
 uint64_t raw;
 } intinfo_t;
 
+typedef union
+{
+struct
+{
+u64 intr_shadow:1;
+u64 guest_intr_mask:1;
+u64 resvd:  62;
+};
+
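The hunk above is truncated by the archive; given the svm_vmcb_dump() use
of int_stat.raw earlier in the patch, the union presumably closes with a
raw member. As a standalone sketch (not part of the patch), a compile-time
check can pin down the layout of such a bitfield union:

#include <stdint.h>

typedef union
{
    struct
    {
        uint64_t intr_shadow:1;
        uint64_t guest_intr_mask:1;
        uint64_t resvd:62;
    };
    uint64_t raw;
} intstat_t;

/* The union must stay exactly 8 bytes to overlay VMCB offset 68h. */
_Static_assert(sizeof(intstat_t) == sizeof(uint64_t),
               "intstat_t must remain 8 bytes at VMCB offset 68h");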

Re: [Xen-devel] [PATCH v4 0/3] make sure PGC_extra pages are dealt with properly

2020-03-24 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich 
> Sent: 24 March 2020 09:39
> To: Paul Durrant 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper 
> ; George Dunlap
> ; Ian Jackson ; Julien 
> Grall ;
> Konrad Rzeszutek Wilk ; Roger Pau Monné 
> ; Stefano
> Stabellini ; Tim Deegan ; Wei Liu 
> 
> Subject: Re: [PATCH v4 0/3] make sure PGC_extra pages are dealt with properly
> 
> On 18.03.2020 18:32, Paul Durrant wrote:
> > This series was formerly called "remove one more shared xenheap page:
> > shared_info" but I have dropped the patches actually changing shared_info
> > and just left the PGC_extra clean-up that was previously intertwined.
> >
> > Paul Durrant (3):
> >   mm: keep PGC_extra pages on a separate list
> >   x86 / ioreq: use a MEMF_no_refcount allocation for server pages...
> >   mm: add 'is_special_page' inline function...
> 
> So I'm confused - I had just replied twice to v6 patch 5/5. This
> series calls itself v4 and consists of the middle three patches
> of what v6 was. What's the deal? Is this really v7, and the two
> patches have been dropped / postponed?

Sorry, I clearly got confused with numbering against one of my other series. 
Yes, this should be v7.

I wanted to send the patches that clear up the use of PGC_extra, separating
them from the change to shared_info, since I'm pressed for time to complete
all the other conversions from xenheap pages so that I can send them as a
single series.

  Paul

> 
> Jan




Re: [Xen-devel] [PATCH v4 1/2] xen: Use evtchn_type_t as a type for event channels

2020-03-24 Thread Yan Yankovskyi
On Mon, Mar 23, 2020 at 05:55:39PM -0400, Boris Ostrovsky wrote:
> 
> On 3/23/20 12:15 PM, Yan Yankovskyi wrote:
> > Make event channel functions pass event channel port using
> > evtchn_port_t type. It eliminates signed <-> unsigned conversion.
> >
> > Signed-off-by: Yan Yankovskyi 
> 
> 
> If the difference is only the whitespace fix

There were two more fixes: missing include in gntdev-common.h and
a variable initialization in bind_virq_to_irq (eliminates warning).



Re: [Xen-devel] [PATCH 3/3] x86/nvmx: update exit bitmap on vmexit

2020-03-24 Thread Tian, Kevin
> From: Roger Pau Monné 
> Sent: Tuesday, March 24, 2020 5:59 PM
> 
> On Tue, Mar 24, 2020 at 06:22:43AM +, Tian, Kevin wrote:
> > > From: Roger Pau Monne 
> > > Sent: Saturday, March 21, 2020 3:08 AM
> > >
> > > Current code in nvmx_update_apicv set the guest interrupt status field
> > > but doesn't update the exit bitmap, which can cause issues of lost
> > > interrupts on the L1 hypervisor if vmx_intr_assist gets
> > > short-circuited by nvmx_intr_intercept returning true.
> >
> > Above is not accurate. Currently Xen didn't choose to update the EOI
> > exit bitmap every time when there is a change. Instead, it chose to
> > batch the update before resuming to the guest. sort of optimization.
> > So it is not related to whether SVI is changed. We should always do the
> > bitmap update in nvmx_update_apicv, regardless of the setting of
> > Ack-on-exit ...
> 
> But if Ack on exit is not set the GUEST_INTR_STATUS won't be changed
> by nvmx_intr_intercept, and hence there's no need to update the EOI
> exit map?
> 
> If OTOH the GUEST_INTR_STATUS field is changed by vmx_intr_assist the
> bitmap will already be updated there.
> 

If you agree with my comment in patch 2/3 about setting RVI for 
ack-on-exit=0, then EOI bitmap update should be done there too.

Thanks
Kevin


[Xen-devel] [PATCH] tools/xenstore: fix a use after free problem in xenstored

2020-03-24 Thread Juergen Gross
Commit 562a1c0f7ef3fb ("tools/xenstore: dont unlink connection object
twice") introduced a potential use after free problem in
domain_cleanup(): after calling talloc_unlink() for domain->conn,
domain->conn is set to NULL. The problem is that domain is registered
as a talloc child of domain->conn, so it might be freed by the
talloc_unlink() call.

Fixes: 562a1c0f7ef3fb ("tools/xenstore: dont unlink connection object twice")
Signed-off-by: Juergen Gross 
---
 tools/xenstore/xenstored_domain.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index baddaba5df..5858185211 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -214,6 +214,7 @@ static void domain_cleanup(void)
 {
xc_dominfo_t dominfo;
struct domain *domain;
+   struct connection *conn;
int notify = 0;
 
  again:
@@ -230,8 +231,10 @@ static void domain_cleanup(void)
continue;
}
if (domain->conn) {
-   talloc_unlink(talloc_autofree_context(), domain->conn);
+   /* domain is a talloc child of domain->conn. */
+   conn = domain->conn;
domain->conn = NULL;
+   talloc_unlink(talloc_autofree_context(), conn);
notify = 0; /* destroy_domain() fires the watch */
goto again;
}
-- 
2.16.4
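For readers unfamiliar with talloc ownership, a minimal standalone sketch
of the hazard being fixed, assuming only libtalloc's documented
parent/child semantics; the struct names here are illustrative, not the
xenstored ones:

#include <talloc.h>

struct conn { int fd; };
struct dom  { struct conn *conn; };

static void cleanup(struct dom *d)
{
    /* 'd' is assumed to be a talloc child of d->conn, so dropping the
     * last reference to the connection frees 'd' as well.  Save the
     * pointer and clear the field first; never touch 'd' after the
     * unlink. */
    struct conn *conn = d->conn;

    d->conn = NULL;
    talloc_unlink(NULL, conn);
}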




Re: [Xen-devel] [PATCH 2/3] x86/nvmx: clarify and fix usage of nvmx_update_apicv

2020-03-24 Thread Tian, Kevin
> From: Roger Pau Monné 
> Sent: Tuesday, March 24, 2020 5:51 PM
> 
> On Tue, Mar 24, 2020 at 06:03:28AM +, Tian, Kevin wrote:
> > > From: Roger Pau Monne 
> > > Sent: Saturday, March 21, 2020 3:08 AM
> > >
> > > The current usage of nvmx_update_apicv is not clear: it is deeply
> > > intertwined with the Ack interrupt on exit VMEXIT control.
> > >
> > > The code in nvmx_update_apicv should update the SVI (in service
> interrupt)
> > > field of the guest interrupt status only when the Ack interrupt on
> > > exit is set, as it is used to record that the interrupt being
> > > serviced is signaled in a vmcs field, and hence hasn't been injected
> > > as on native. It's important to record the current in service
> > > interrupt on the guest interrupt status field, or else further
> > > interrupts won't respect the priority of the in service one.
> > >
> > > While clarifying the usage make sure that the SVI is only updated when
> > > Ack on exit is set and the nested vmcs interrupt info field is valid.
> > > Otherwise a guest not using the Ack on exit feature would lose interrupts,
> > > as they would be signaled as being in service on the guest interrupt
> > > status field but wouldn't actually be recorded on the interrupt info vmcs
> > > field, nor injected in any other way.
> >
> > It is insufficient. You also need to update RVI to enable virtual injection
> > when Ack on exit is cleared.
> 
> But RVI should be updated in vmx_intr_assist in that case, since
> nvmx_intr_intercept shouldn't intercept the interrupt, as it should be
> handled normally.

As we discussed before, vmx_intr_assist is invoked before nvmx_switch_guest.
It is incorrect to update RVI at that time since it might still be vmcs02
being active (if there is no pending softirq to make it be invoked again).
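A hedged sketch of that ordering (not the literal exit-to-guest code, just
its shape, using function names from this thread):

for ( ;; )
{
    vmx_intr_assist();      /* may run while vmcs02 is still the active VMCS */
    nvmx_switch_guest();    /* performs the nested world switch */

    if ( !softirq_pending(smp_processor_id()) )
        break;              /* resume the guest */

    do_softirq();           /* then loop: vmx_intr_assist() runs a second
                             * time, now after the switch */
}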

Also nvmx_intr_intercept does intercept the Ack-on-exit=0 case:

if ( intack.source == hvm_intsrc_pic ||
 intack.source == hvm_intsrc_lapic )
{
vmx_inject_extint(intack.vector, intack.source);

ctrl = get_vvmcs(v, VM_EXIT_CONTROLS);
if ( ctrl & VM_EXIT_ACK_INTR_ON_EXIT )
{
/* for now, duplicate the ack path in vmx_intr_assist */
hvm_vcpu_ack_pending_irq(v, intack);
pt_intr_post(v, intack);

intack = hvm_vcpu_has_pending_irq(v);
if ( unlikely(intack.source != hvm_intsrc_none) )
vmx_enable_intr_window(v, intack);
}
else if ( !cpu_has_vmx_virtual_intr_delivery )
vmx_enable_intr_window(v, intack);

return 1; 
}

> 
> > >
> > > Signed-off-by: Roger Pau Monné 
> > > ---
> > >  xen/arch/x86/hvm/vmx/vvmx.c | 11 ++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> > > index 1b8461ba30..180d01e385 100644
> > > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > > @@ -1383,7 +1383,7 @@ static void nvmx_update_apicv(struct vcpu *v)
> > >  {
> > >  struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> > >  unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
> > > -uint32_t intr_info = nvmx->intr.intr_info;
> > > +unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);
> >
> > well, the above reminds me of an interesting question. Combining the last
> > and this patch, we'd see essentially that it goes back to the state before
> > f96e1469 (at least when Ack on exit is true).
> 
> Well, patch 1/3 is a revert of f96e1469, so just reverting f96e1469
> gets us to that state.

you are right. I just wanted to point out that this patch alone doesn't
provide any real fix for the ack-on-exit=1 case. It just makes sure that
the ack-on-exit=0 case is skipped in that function. So it won't be the one
fixing your previous problem.

> 
> This patch is an attempt to clarify that nvmx_update_apicv is closely
> related to the Ack on exit feature, as it modifies SVI in order to
> signal the vector currently being serviced by the EXIT_INTR_INFO vmcs
> field. This was not obvious to me, as at first sight I assumed
> nvmx_update_apicv was actually injecting that vector into the guest.
> 
> > iirc, that commit was introduced to enable
> > nested x2apic with apicv, and your very first version even just removed
> > the whole nvmx_update_apicv. Now, with the newly reverted logic, are you
> > still suffering the x2apic problem? If not, does it imply the real fix is
> > actually coming from patch 3/3 for the eoi bitmap update?
> 
> Yes, patch 3/3 is the one that fixed the issue. Note however that
> strangely enough removing the call to nvmx_update_apicv (as my first
> attempt to solve the issue) did fix it on one of my boxes. It depends
> a lot on the available vmx features AFAICT.

Did you confirm that 3/3 alone can fix that issue? Just want to make
sure of the real gain of each patch, so we can reflect it in the commit
msg in the updated version.

> 
> > >
> > >   

Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PGC_state_uninitialised

2020-03-24 Thread Julien Grall

Hi David,

On 23/03/2020 10:55, David Woodhouse wrote:

On Mon, 2020-03-23 at 09:34 +, Julien Grall wrote:

For liveupdate, we will need a way to initialize a page but mark it as
already inuse (i.e in the same state as they would be if allocated
normally).


I am unconvinced of the veracity of this claim.

We don't want to turn specific details of the current Xen buddy
allocator part into of the implicit ABI of live update. That goes for
the power-of-two zone boundaries, amongst other things.


Why would you do that? Marking the page as already used is no
different from "PGC_state_uninitialised", except that the "struct page_info"
and the internals of the buddy allocator would be properly set up from
the start rather than at free.




What if Xen receives LU state in which *all* pages in a given zone are
marked as already in use? That's one of the cases in which we *really*
want to pass through init_heap_pages() instead of just
free_heap_pages(), in order to allocate the zone data structures for
the first pages that get freed into that zone.

What if Xen starts to exclude more pages, like the exclusion at zero?

What if new Xen wants to exclude an additional page due to a hardware
erratum? It can't take it away from existing domains (especially if
there are assigned PCI devices) but it could be part of the vetting in
init_heap_pages(), for example.


I don't think it would be safe to continue to run a guest using pages 
that were excluded for an HW erratum. It would be safer to not restart 
the domain (or replace the page) in the target Xen if that's happening.




My intent for PGC_state_uninitialised was to mark pages that haven't
been through init_heap_pages(), whatever init_heap_pages() does in the
current version of Xen.

The pages which are "already in use" because they're inherited through
LU state should be in PGC_state_uninitialised, shouldn't they?


I think using "PGC_state_unitialised" for preserved page is an abuse. I 
understand this is existing in other part of Xen (particularly on x86), 
but I would rather not try to add more.


PGC_state_uninitialised may work for the current allocator because
most of the fields are not going to be used after allocation. But that may
not hold for any new allocator (I know the embedded folks are working on
a new one).




Perhaps if there's a need for a helper, it could be a companion
function to init_heap_pages() which would return a boolean saying,
"nah, I didn't want to do anything to this page anyway", which could
short-circuit it into the PGC_state_inuse state. But I'm not sure I see
the need for such an optimisation.


I don't view it as an optimisation but as a way to avoid spreading the 
current misbehavior.


Cheers,

--
Julien Grall



Re: [Xen-devel] [PATCH 3/3] x86/nvmx: update exit bitmap on vmexit

2020-03-24 Thread Roger Pau Monné
On Tue, Mar 24, 2020 at 06:22:43AM +, Tian, Kevin wrote:
> > From: Roger Pau Monne 
> > Sent: Saturday, March 21, 2020 3:08 AM
> > 
> > Current code in nvmx_update_apicv set the guest interrupt status field
> > but doesn't update the exit bitmap, which can cause issues of lost
> > interrupts on the L1 hypervisor if vmx_intr_assist gets
> > short-circuited by nvmx_intr_intercept returning true.
> 
> Above is not accurate. Currently Xen doesn't update the EOI exit bitmap
> every time there is a change. Instead, it batches the update before
> resuming to the guest, as a sort of optimization. So it is not related to
> whether SVI is changed. We should always do the bitmap update in
> nvmx_update_apicv, regardless of the setting of Ack-on-exit ...

But if Ack on exit is not set the GUEST_INTR_STATUS won't be changed
by nvmx_intr_intercept, and hence there's no need to update the EOI
exit map?

If OTOH the GUEST_INTR_STATUS field is changed by vmx_intr_assist the
bitmap will already be updated there.

Thanks, Roger.



Re: [Xen-devel] [PATCH 2/3] x86/nvmx: clarify and fix usage of nvmx_update_apicv

2020-03-24 Thread Roger Pau Monné
On Tue, Mar 24, 2020 at 06:03:28AM +, Tian, Kevin wrote:
> > From: Roger Pau Monne 
> > Sent: Saturday, March 21, 2020 3:08 AM
> > 
> > The current usage of nvmx_update_apicv is not clear: it is deeply
> > intertwined with the Ack interrupt on exit VMEXIT control.
> > 
> > The code in nvmx_update_apicv should update the SVI (in service interrupt)
> > field of the guest interrupt status only when the Ack interrupt on
> > exit is set, as it is used to record that the interrupt being
> > serviced is signaled in a vmcs field, and hence hasn't been injected
> > as on native. It's important to record the current in service
> > interrupt on the guest interrupt status field, or else further
> > interrupts won't respect the priority of the in service one.
> > 
> > While clarifying the usage make sure that the SVI is only updated when
> > Ack on exit is set and the nested vmcs interrupt info field is valid.
> > Otherwise a guest not using the Ack on exit feature would lose interrupts,
> > as they would be signaled as being in service on the guest interrupt
> > status field but wouldn't actually be recorded on the interrupt info vmcs
> > field, nor injected in any other way.
> 
> It is insufficient. You also need to update RVI to enable virtual injection
> when Ack on exit is cleared.

But RVI should be updated in vmx_intr_assist in that case, since
nvmx_intr_intercept shouldn't intercept the interrupt, as it should be
handled normally.

> > 
> > Signed-off-by: Roger Pau Monné 
> > ---
> >  xen/arch/x86/hvm/vmx/vvmx.c | 11 ++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> > index 1b8461ba30..180d01e385 100644
> > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > @@ -1383,7 +1383,7 @@ static void nvmx_update_apicv(struct vcpu *v)
> >  {
> >  struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> >  unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
> > -uint32_t intr_info = nvmx->intr.intr_info;
> > +unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);
> 
> well, the above reminds me of an interesting question. Combining the last
> and this patch, we'd see essentially that it goes back to the state before
> f96e1469 (at least when Ack on exit is true).

Well, patch 1/3 is a revert of f96e1469, so just reverting f96e1469
gets us to that state.

This patch is an attempt to clarify that nvmx_update_apicv is closely
related to the Ack on exit feature, as it modifies SVI in order to
signal the vector currently being serviced by the EXIT_INTR_INFO vmcs
field. This was not obvious to me, as at first sight I assumed
nvmx_update_apicv was actually injecting that vector into the guest.

> iirc, that commit was introduced to enable
> nested x2apic with apicv, and your very first version even just removed
> the whole nvmx_update_apicv. Now, with the newly reverted logic, are you
> still suffering the x2apic problem? If not, does it imply the real fix is
> actually coming from patch 3/3 for the eoi bitmap update?

Yes, patch 3/3 is the one that fixed the issue. Note however that
strangely enough removing the call to nvmx_update_apicv (as my first
attempt to solve the issue) did fix it on one of my boxes. It depends
a lot on the available vmx features AFAICT.

> > 
> >  if ( reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
> >   nvmx->intr.source == hvm_intsrc_lapic &&
> > @@ -1399,6 +1399,15 @@ static void nvmx_update_apicv(struct vcpu *v)
> >  ppr = vlapic_set_ppr(vlapic);
> >  WARN_ON((ppr & 0xf0) != (vector & 0xf0));
> > 
> > +/*
> > + * SVI must be updated when the interrupt has been signaled using 
> > the
> > + * Ack on exit feature, or else the currently in-service interrupt
> > + * won't be respected.
> > + *
> > + * Note that this is specific to the fact that when doing a VMEXIT 
> > an
> > + * interrupt might get delivered using the interrupt info vmcs 
> > field
> > + * instead of being injected normally.
> > + */
> 
> I'm not sure mentioning SVI specifically here is necessary, since all steps
> here are required - updating irr, isr, rvi, svi, ppr, etc. It is just an
> emulation of updating the virtual APIC state as if a virtual interrupt
> delivery happens.

Hm, it was hard for me to figure out that SVI is modified here in
order to signal that the Ack on exit vector is currently in service,
as opposed to being injected using the virtual interrupt delivery
mechanism.

I just wanted to clarify that the purpose of this function is not to
inject the vector in intr_info, but rather to signal that such vector
has already been injected using a different mechanism.

I'm happy to reword this, but IMO it should be clear that the purpose
of the function is not so much to inject an interrupt, but to update
the virtual interrupt delivery field in order to signal that an
interrupt has been injected using a different mechanism.
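For reference, a small sketch of the 16-bit GUEST_INTR_STATUS field being
discussed, composed with the mask/offset names used in the patch; the
RVI-low-byte / SVI-high-byte layout is from the Intel SDM, and the values
below are assumptions for a self-contained example:

#define VMX_GUEST_INTR_STATUS_SVI_OFFSET        8
#define VMX_GUEST_INTR_STATUS_SUBFIELD_BITMASK  0xff

static uint16_t make_guest_intr_status(uint8_t rvi, uint8_t svi)
{
    /* RVI: highest-priority vector requesting virtual injection (bits 7:0);
     * SVI: vector currently in service (bits 15:8). */
    return ((uint16_t)svi << VMX_GUEST_INTR_STATUS_SVI_OFFSET) |
           (rvi & VMX_GUEST_INTR_STATUS_SUBFIELD_BITMASK);
}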

Thanks, Roger.


[Xen-devel] Ping: [PATCH] x86/HVM: fix AMD ECS handling for Fam 10

2020-03-24 Thread Jan Beulich
On 16.03.2020 14:41, Andrew Cooper wrote:
> On 16/03/2020 11:00, Jan Beulich wrote:
>> The involved comparison was, very likely inadvertently, converted from
>> >= to > when making changes unrelated to the actual family range.
>>
>> Fixes: 9841eb71ea87 ("x86/cpuid: Drop a guests cached x86 family and model 
>> information")
>> Signed-off-by: Jan Beulich 
> 
> Reviewed-by: Andrew Cooper 

Paul?

>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -1284,7 +1284,7 @@ struct hvm_ioreq_server *hvm_select_iore
>>  if ( CF8_ADDR_HI(cf8) &&
>>   d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&
>>   (x86_fam = get_cpu_family(
>> - d->arch.cpuid->basic.raw_fms, NULL, NULL)) > 0x10 &&
>> + d->arch.cpuid->basic.raw_fms, NULL, NULL)) >= 0x10 &&
>>   x86_fam < 0x17 )
>>  {
>>  uint64_t msr_val;
>>
> 
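For context, a hedged sketch of deriving the family from the raw FMS value
(CPUID.1.EAX), as the get_cpu_family() call above is assumed to do; per the
x86 convention the extended family is added in when the base family is 0xF,
and the real helper also fills the model/stepping out-parameters passed as
NULL above:

static unsigned int family_from_fms(uint32_t raw_fms)
{
    unsigned int fam = (raw_fms >> 8) & 0xf;

    if ( fam == 0xf )
        fam += (raw_fms >> 20) & 0xff;

    return fam;   /* the corrected check wants 0x10 <= fam < 0x17 */
}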




Re: [Xen-devel] [PATCH v4 0/3] make sure PGC_extra pages are dealt with properly

2020-03-24 Thread Jan Beulich
On 18.03.2020 18:32, Paul Durrant wrote:
> This series was formerly called "remove one more shared xenheap page:
> shared_info" but I have dropped the patches actually changing shared_info
> and just left the PGC_extra clean-up that was previously intertwined.
> 
> Paul Durrant (3):
>   mm: keep PGC_extra pages on a separate list
>   x86 / ioreq: use a MEMF_no_refcount allocation for server pages...
>   mm: add 'is_special_page' inline function...

So I'm confused - I had just replied twice to v6 patch 5/5. This
series calls itself v4 and consists of the middle three patches
of what v6 was. What's the deal? Is this really v7, and the two
patches have been dropped / postponed?

Jan



Re: [Xen-devel] [PATCH v6 5/5] domain: use PGC_extra domheap page for shared_info

2020-03-24 Thread Jan Beulich
On 24.03.2020 10:26, Jan Beulich wrote:
> On 10.03.2020 18:49, p...@xen.org wrote:
>> From: Paul Durrant 
>>
>> Currently shared_info is a shared xenheap page but shared xenheap pages
>> complicate future plans for live-update of Xen so it is desirable to,
>> where possible, not use them [1]. This patch therefore converts shared_info
>> into a PGC_extra domheap page. This does entail freeing shared_info during
>> domain_relinquish_resources() rather than domain_destroy() so care is
>> needed to avoid de-referencing a NULL shared_info pointer hence some
>> extra checks of 'is_dying' are needed.
>>
>> NOTE: For Arm, the call to free_shared_info() in arch_domain_destroy() is
>>   left in place since it is idempotent and called in the error path for
>>   arch_domain_create().
>>
>> [1] See 
>> https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg02018.html
>>
>> Signed-off-by: Paul Durrant 
>> ---
>> Cc: Stefano Stabellini 
>> Cc: Julien Grall 
>> Cc: Volodymyr Babchuk 
>> Cc: Andrew Cooper 
>> Cc: George Dunlap 
>> Cc: Ian Jackson 
>> Cc: Jan Beulich 
>> Cc: Konrad Rzeszutek Wilk 
>> Cc: Wei Liu 
>>
>> v6:
>>  - Drop dump_shared_info() but tag the shared info in the 'ExtraPage'
>>dump
>>
>> v5:
>>  - Incorporate new dump_shared_info() function
>>
>> v2:
>>  - Addressed comments from Julien
>>  - Expanded the commit comment to explain why this patch is wanted
>> ---
>>  xen/arch/arm/domain.c  |  2 ++
> 
> Julien, Stefano? (I'd prefer to commit the entire series in one go,
> rather than leaving out just this last patch.)

Actually - never mind, I've just realized that there are still some
pending items on the last two patches of this series.

Jan



Re: [Xen-devel] [PATCH v6 5/5] domain: use PGC_extra domheap page for shared_info

2020-03-24 Thread Jan Beulich
On 10.03.2020 18:49, p...@xen.org wrote:
> From: Paul Durrant 
> 
> Currently shared_info is a shared xenheap page but shared xenheap pages
> complicate future plans for live-update of Xen so it is desirable to,
> where possible, not use them [1]. This patch therefore converts shared_info
> into a PGC_extra domheap page. This does entail freeing shared_info during
> domain_relinquish_resources() rather than domain_destroy() so care is
> needed to avoid de-referencing a NULL shared_info pointer hence some
> extra checks of 'is_dying' are needed.
> 
> NOTE: For Arm, the call to free_shared_info() in arch_domain_destroy() is
>   left in place since it is idempotent and called in the error path for
>   arch_domain_create().
> 
> [1] See 
> https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg02018.html
> 
> Signed-off-by: Paul Durrant 
> ---
> Cc: Stefano Stabellini 
> Cc: Julien Grall 
> Cc: Volodymyr Babchuk 
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Wei Liu 
> 
> v6:
>  - Drop dump_shared_info() but tag the shared info in the 'ExtraPage'
>dump
> 
> v5:
>  - Incorporate new dump_shared_info() function
> 
> v2:
>  - Addressed comments from Julien
>  - Expanded the commit comment to explain why this patch is wanted
> ---
>  xen/arch/arm/domain.c  |  2 ++

Julien, Stefano? (I'd prefer to commit the entire series in one go,
rather than leaving out just this last patch.)

Jan



Re: [Xen-devel] [PATCH] SVM: Add union intstat_t for offset 68h in vmcb struct

2020-03-24 Thread Jan Beulich
On 24.03.2020 05:52, Pu Wen wrote:
> --- a/xen/arch/x86/hvm/svm/nestedsvm.c
> +++ b/xen/arch/x86/hvm/svm/nestedsvm.c
> @@ -508,7 +508,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct 
> cpu_user_regs *regs)
>  }
>  
>  /* Shadow Mode */
> -n2vmcb->interrupt_shadow = ns_vmcb->interrupt_shadow;
> +n2vmcb->int_stat.intr_shadow = ns_vmcb->int_stat.intr_shadow;

While bit 1 is irrelevant to VMRUN, I still wonder whether you
shouldn't copy "raw" here.

> @@ -1058,7 +1058,7 @@ nsvm_vmcb_prepare4vmexit(struct vcpu *v, struct 
> cpu_user_regs *regs)
>  ns_vmcb->_vintr.fields.intr_masking = 0;
>  
>  /* Shadow mode */
> -ns_vmcb->interrupt_shadow = n2vmcb->interrupt_shadow;
> +ns_vmcb->int_stat.intr_shadow = n2vmcb->int_stat.intr_shadow;

Same here, or at the very least you want to also copy bit 1 here.

> --- a/xen/arch/x86/hvm/svm/svmdebug.c
> +++ b/xen/arch/x86/hvm/svm/svmdebug.c
> @@ -51,9 +51,9 @@ void svm_vmcb_dump(const char *from, const struct 
> vmcb_struct *vmcb)
>  printk("iopm_base_pa = %#"PRIx64" msrpm_base_pa = %#"PRIx64" tsc_offset 
> = %#"PRIx64"\n",
> vmcb_get_iopm_base_pa(vmcb), vmcb_get_msrpm_base_pa(vmcb),
> vmcb_get_tsc_offset(vmcb));
> -printk("tlb_control = %#x vintr = %#"PRIx64" interrupt_shadow = 
> %#"PRIx64"\n",
> +printk("tlb_control = %#x vintr = %#"PRIx64" interrupt_shadow = %#x\n",
> vmcb->tlb_control, vmcb_get_vintr(vmcb).bytes,
> -   vmcb->interrupt_shadow);
> +   vmcb->int_stat.intr_shadow);

Please dump all 64 bits here.

Jan



Re: [Xen-devel] [PATCH] x86/mce: Correct the machine check vendor for Hygon

2020-03-24 Thread Jan Beulich
On 24.03.2020 05:51, Pu Wen wrote:
> Currently the xl dmesg output on Hygon platforms will be
> "(XEN) CPU0: AMD Fam18h machine check reporting enabled",
> which is misleading as AMD does not have family 18h (Hygon
> negotiated with AMD to confirm that only Hygon has family 18h).
> 
> To correct this, add Hygon machine check type and vendor string.
> 
> Signed-off-by: Pu Wen 

Reviewed-by: Jan Beulich 



Re: [Xen-devel] [PATCH 1/3] Revert "x86/vvmx: fix virtual interrupt injection when Ack on exit control is used"

2020-03-24 Thread Tian, Kevin
> From: Jan Beulich 
> Sent: Tuesday, March 24, 2020 4:10 PM
> 
> On 24.03.2020 06:41, Tian, Kevin wrote:
> >> From: Roger Pau Monné 
> >> Sent: Monday, March 23, 2020 10:49 PM
> >>
> >> On Mon, Mar 23, 2020 at 09:09:59AM +0100, Jan Beulich wrote:
> >>> On 20.03.2020 20:07, Roger Pau Monne wrote:
>  This reverts commit f96e1469ad06b61796c60193daaeb9f8a96d7458.
> 
>  The commit is wrong, as the whole point of nvmx_update_apicv is to
>  update the guest interrupt status field when the Ack on exit VMEXIT
>  control feature is enabled.
> 
>  Signed-off-by: Roger Pau Monné 
> >>>
> >>> Before anyone gets to look at the other two patches, should this
> >>> be thrown in right away?
> >>
> >> I would like, if possible, to get a confirmation from Kevin (or anyone
> >> else) that my understanding is correct. I find the nested code very
> >> confusing, and I've already made a mistake while trying to fix it.
> >> That being said, this was spotted by osstest as introducing a
> >> regression, so I guess it's safe to just toss it in now.
> >>
> >> FWIW patch 2/3 attempts to provide a description of my understanding
> >> of how nvmx_update_apicv works.
> >>
> >
> > I feel it is not good to take this patch alone, as it was introduced to fix
> > another problem. W/o understanding whether the whole series can
> > fix both old and new problems, we may risk putting nested interrupt
> > logic in an even worse state...
> 
> Well, okay, I'll wait then, but it would seem to me that reverting
> wouldn't put us in a worse state than we were in before that change
> was put in.

Roger needs to make the call, i.e. which problem is more severe, old or
new one.

Thanks
Kevin


[Xen-devel] [linux-5.4 test] 148903: regressions - trouble: fail/pass/starved

2020-03-24 Thread osstest service owner
flight 148903 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148903/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 
146121

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10 fail in 148814 pass in 
148903
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat  fail pass in 148814

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-dom0pvh-xl-intel 15 guest-saverestore  fail baseline untested
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass
 test-amd64-amd64-dom0pvh-xl-amd  2 hosts-allocate   starved  n/a

version targeted for testing:
 linux  585e0cc080690239f0689973c119459ff69db473
baseline version:
 linux  122179cb7d648a6f36b20dd6bf34f953cb384c30


Re: [Xen-devel] [PATCH 1/3] Revert "x86/vvmx: fix virtual interrupt injection when Ack on exit control is used"

2020-03-24 Thread Jan Beulich
On 24.03.2020 06:41, Tian, Kevin wrote:
>> From: Roger Pau Monné 
>> Sent: Monday, March 23, 2020 10:49 PM
>>
>> On Mon, Mar 23, 2020 at 09:09:59AM +0100, Jan Beulich wrote:
>>> On 20.03.2020 20:07, Roger Pau Monne wrote:
 This reverts commit f96e1469ad06b61796c60193daaeb9f8a96d7458.

 The commit is wrong, as the whole point of nvmx_update_apicv is to
 update the guest interrupt status field when the Ack on exit VMEXIT
 control feature is enabled.

 Signed-off-by: Roger Pau Monné 
>>>
>>> Before anyone gets to look at the other two patches, should this
>>> be thrown in right away?
>>
>> I would like, if possible, to get a confirmation from Kevin (or anyone
>> else) that my understanding is correct. I find the nested code very
>> confusing, and I've already made a mistake while trying to fix it.
>> That being said, this was spotted by osstest as introducing a
>> regression, so I guess it's safe to just toss it in now.
>>
>> FWIW patch 2/3 attempts to provide a description of my understanding
>> of how nvmx_update_apicv works.
>>
> 
> I feel it is not good to take this patch alone, as it was introduced to fix
> another problem. W/o understanding whether the whole series can
> fix both old and new problems, we may risk putting nested interrupt
> logic in an even worse state...

Well, okay, I'll wait then, but it would seem to me that reverting
wouldn't put us in a worse state than we were in before that change
was put in.

Jan



Re: [Xen-devel] [PATCH 3/3] x86/nvmx: update exit bitmap on vmexit

2020-03-24 Thread Tian, Kevin
> From: Roger Pau Monne 
> Sent: Saturday, March 21, 2020 3:08 AM
> 
> Current code in nvmx_update_apicv set the guest interrupt status field
> but doesn't update the exit bitmap, which can cause issues of lost
> interrupts on the L1 hypervisor if vmx_intr_assist gets
> short-circuited by nvmx_intr_intercept returning true.

Above is not accurate. Currently Xen doesn't update the EOI exit bitmap
every time there is a change. Instead, it batches the update before
resuming to the guest, as a sort of optimization. So it is not related to
whether SVI is changed. We should always do the bitmap update in
nvmx_update_apicv, regardless of the setting of Ack-on-exit ...
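A sketch of the dirty-tracking side of that batching, paraphrased around
the fields visible in the patch below (this is a self-contained
illustration, not a quote from the Xen source):

static void set_eoi_exit_bit(uint64_t bitmap[4], uint64_t *changed_mask,
                             uint8_t vector)
{
    /* Record the vector and mark its 64-bit chunk as changed; the
     * changed chunks are then flushed to the VMCS in one pass before
     * resuming the guest, see vmx_sync_exit_bitmap() in the patch. */
    bitmap[vector / 64] |= 1ULL << (vector % 64);
    *changed_mask |= 1ULL << (vector / 64);
}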

Thanks
Kevin

> 
> Extract the code to update the exit bitmap from vmx_intr_assist into a
> helper and use it in nvmx_update_apicv when updating the guest
> interrupt status field.
> 
> Signed-off-by: Roger Pau Monné 
> ---
>  xen/arch/x86/hvm/vmx/intr.c   | 21 +
>  xen/arch/x86/hvm/vmx/vvmx.c   |  1 +
>  xen/include/asm-x86/hvm/vmx/vmx.h |  2 ++
>  3 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> index 49a1295f09..000e14af49 100644
> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -224,6 +224,18 @@ static int nvmx_intr_intercept(struct vcpu *v, struct hvm_intack intack)
>  return 0;
>  }
> 
> +void vmx_sync_exit_bitmap(struct vcpu *v)
> +{
> +const unsigned int n = ARRAY_SIZE(v->arch.hvm.vmx.eoi_exit_bitmap);
> +unsigned int i;
> +
> +while ( (i = find_first_bit(&v->arch.hvm.vmx.eoi_exitmap_changed, n)) < n )
> +{
> +clear_bit(i, &v->arch.hvm.vmx.eoi_exitmap_changed);
> +__vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm.vmx.eoi_exit_bitmap[i]);
> +}
> +}
> +
>  void vmx_intr_assist(void)
>  {
>  struct hvm_intack intack;
> @@ -318,7 +330,6 @@ void vmx_intr_assist(void)
>intack.source != hvm_intsrc_vector )
>  {
>  unsigned long status;
> -unsigned int i, n;
> 
> /*
>  * intack.vector is the highest priority vector. So we set eoi_exit_bitmap
> @@ -379,13 +390,7 @@ void vmx_intr_assist(void)
>  intack.vector;
>  __vmwrite(GUEST_INTR_STATUS, status);
> 
> -n = ARRAY_SIZE(v->arch.hvm.vmx.eoi_exit_bitmap);
> -while ( (i = find_first_bit(&v->arch.hvm.vmx.eoi_exitmap_changed, n)) < n )
> -{
> -clear_bit(i, &v->arch.hvm.vmx.eoi_exitmap_changed);
> -__vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm.vmx.eoi_exit_bitmap[i]);
> -}
> +vmx_sync_exit_bitmap(v);
> 
>  pt_intr_post(v, intack);
>  }
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 180d01e385..e041ecc115 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1414,6 +1414,7 @@ static void nvmx_update_apicv(struct vcpu *v)
>  status |= rvi & VMX_GUEST_INTR_STATUS_SUBFIELD_BITMASK;
> 
>  __vmwrite(GUEST_INTR_STATUS, status);
> +vmx_sync_exit_bitmap(v);
>  }
>  }
> 
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index b334e1ec94..111ccd7e61 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -610,6 +610,8 @@ void update_guest_eip(void);
>  void vmx_pi_per_cpu_init(unsigned int cpu);
>  void vmx_pi_desc_fixup(unsigned int cpu);
> 
> +void vmx_sync_exit_bitmap(struct vcpu *v);
> +
>  #ifdef CONFIG_HVM
>  void vmx_pi_hooks_assign(struct domain *d);
>  void vmx_pi_hooks_deassign(struct domain *d);
> --
> 2.25.0



Re: [Xen-devel] [PATCH 2/3] x86/nvmx: clarify and fix usage of nvmx_update_apicv

2020-03-24 Thread Tian, Kevin
> From: Roger Pau Monne 
> Sent: Saturday, March 21, 2020 3:08 AM
> 
> The current usage of nvmx_update_apicv is not clear: it is deeply
> intertwined with the Ack interrupt on exit VMEXIT control.
> 
> The code in nvmx_update_apicv should update the SVI (in-service interrupt)
> field of the guest interrupt status only when the Ack interrupt on
> exit control is set, as it is used to record that the interrupt being
> serviced is signaled in a vmcs field, and hence hasn't been injected
> as on native. It's important to record the currently in-service
> interrupt in the guest interrupt status field, or else further
> interrupts won't respect the priority of the in-service one.
> 
> While clarifying the usage, make sure that the SVI is only updated when
> Ack on exit is set and the nested vmcs interrupt info field is valid.
> Otherwise a guest not using the Ack on exit feature would lose
> interrupts, as they would be signaled as being in service in the guest
> interrupt status field but wouldn't actually be recorded in the
> interrupt info vmcs field, nor injected in any other way.

This is insufficient. You also need to update RVI to enable virtual
injection when Ack on exit is cleared.
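
Something along these lines (a hypothetical sketch only, slotting into
nvmx_update_apicv; it reuses the vlapic helpers already visible in this
function and assumes the Ack-on-exit bit is checked via the vvmcs exit
controls):

    /*
     * Sketch: when L1 doesn't use Ack-on-exit the vector stays pending
     * in the vIRR, so RVI (the low byte of the guest interrupt status)
     * must be refreshed or virtual interrupt delivery won't inject it
     * on the next vmentry.
     */
    if ( !(get_vvmcs(v, VM_EXIT_CONTROLS) & VM_EXIT_ACK_INTR_ON_EXIT) )
    {
        int rvi = vlapic_has_pending_irq(v);

        if ( rvi != -1 )
        {
            status &= ~VMX_GUEST_INTR_STATUS_SUBFIELD_BITMASK;
            status |= rvi & VMX_GUEST_INTR_STATUS_SUBFIELD_BITMASK;
            __vmwrite(GUEST_INTR_STATUS, status);
        }
    }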

> 
> Signed-off-by: Roger Pau Monné 
> ---
>  xen/arch/x86/hvm/vmx/vvmx.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 1b8461ba30..180d01e385 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1383,7 +1383,7 @@ static void nvmx_update_apicv(struct vcpu *v)
>  {
>      struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
>      unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
> -    uint32_t intr_info = nvmx->intr.intr_info;
> +    unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);

Well, the above reminds me of an interesting question. Combining the
last patch and this one, we essentially go back to the state before
f96e1469 (at least when Ack on exit is true). IIRC that commit was
introduced to enable nested x2APIC with APICv, and your very first
version even removed the whole of nvmx_update_apicv. Now, with the
logic effectively reverted, are you still seeing the x2APIC problem?
If not, does that imply the real fix actually comes from patch 3/3,
the EOI bitmap update?

> 
>      if ( reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
>           nvmx->intr.source == hvm_intsrc_lapic &&
> @@ -1399,6 +1399,15 @@ static void nvmx_update_apicv(struct vcpu *v)
>          ppr = vlapic_set_ppr(vlapic);
>          WARN_ON((ppr & 0xf0) != (vector & 0xf0));
> 
> +        /*
> +         * SVI must be updated when the interrupt has been signaled using the
> +         * Ack on exit feature, or else the currently in-service interrupt
> +         * won't be respected.
> +         *
> +         * Note that this is specific to the fact that when doing a VMEXIT an
> +         * interrupt might get delivered using the interrupt info vmcs field
> +         * instead of being injected normally.
> +         */

I'm not sure mentioning SVI specifically here is necessary, since all
the steps here are required - updating IRR, ISR, RVI, SVI, PPR, etc.
It is just an emulation of updating the virtual APIC state as if a
virtual interrupt delivery had happened.
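
For reference, the sequence being emulated is roughly the following (a
sketch of the existing flow in this function; vlapic_ack_pending_irq is
the step not visible in the quoted hunks, annotated with the vAPIC
state transition each step performs):

    vlapic_ack_pending_irq(v, vector, 1);    /* vector: IRR -> ISR */
    ppr = vlapic_set_ppr(vlapic);            /* recompute PPR from ISR/TPR */
    status = vector << VMX_GUEST_INTR_STATUS_SVI_OFFSET; /* SVI = vector */
    rvi = vlapic_has_pending_irq(v);         /* RVI = highest pending IRR */
    if ( rvi != -1 )
        status |= rvi & VMX_GUEST_INTR_STATUS_SUBFIELD_BITMASK;
    __vmwrite(GUEST_INTR_STATUS, status);    /* commit RVI/SVI */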

>          status = vector << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
>          rvi = vlapic_has_pending_irq(v);
>          if ( rvi != -1 )
> --
> 2.25.0