[xen-4.15-testing test] 161322: tolerable FAIL - PUSHED
flight 161322 xen-4.15-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161322/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                 fail like 161049
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop                 fail like 161049
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2      fail like 161049
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                 fail like 161049
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop                 fail like 161049
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop                  fail like 161049
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check          fail like 161049
 test-armhf-armhf-libvirt 16 saverestore-support-check              fail like 161049
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop                  fail like 161049
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop                  fail like 161049
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop                  fail like 161049
 test-amd64-i386-libvirt-xsm 15 migrate-support-check               fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check                  fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check           fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check          fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check                   fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start                           fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check                   fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check               fail never pass
 test-arm64-arm64-xl 15 migrate-support-check                       fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check                   fail never pass
 test-armhf-armhf-xl 15 migrate-support-check                       fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check                   fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check           fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check          fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check           fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check             fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check         fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check              fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check                  fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check              fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check           fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check            fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check        fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check                   fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check               fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check              fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check                  fail never pass

version targeted for testing:
 xen                  eb1f325186be9e02c3e89619cd154eb57f202eba
baseline version:
 xen                  c86d8ec3b816cc1317d9cc2fb0817e59b5bc4cfa

Last test of basis   161049  2021-04-12 10:36:30 Z    8 days
Testing same since   161322  2021-04-20 10:06:36 Z    0 days    1 attempts

People who touched revisions under test:
 Andrew Cooper
 Boris Ostrovsky
 Frédéric Pierret
 Jan Beulich

jobs:
 build-amd64-xsm      pass
 build-arm64-xsm      pass
 build-i386-xsm       pass
[linux-linus test] 161320: regressions - FAIL
flight 161320 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161320/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-i386-xl-qemuu-ws16-amd64 7 xen-install                  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install      fail REGR. vs. 152332
 test-amd64-i386-xl-xsm 7 xen-install                               fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel 7 xen-install                 fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64 7 xen-install             fail REGR. vs. 152332
 test-amd64-i386-examine 6 xen-install                              fail REGR. vs. 152332
 test-amd64-i386-libvirt 7 xen-install                              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-install             fail REGR. vs. 152332
 test-amd64-coresched-i386-xl 7 xen-install                         fail REGR. vs. 152332
 test-amd64-i386-pair 10 xen-install/src_host                       fail REGR. vs. 152332
 test-amd64-i386-pair 11 xen-install/dst_host                       fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd 7 xen-install                   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64 7 xen-install                  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd 7 xen-install                   fail REGR. vs. 152332
 test-amd64-i386-xl 7 xen-install                                   fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel 7 xen-install                 fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim 7 xen-install                            fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64 7 xen-install                      fail REGR. vs. 152332
 test-amd64-i386-xl-raw 7 xen-install                               fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-shadow 7 xen-install                            fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386 7 xen-install                       fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm 7 xen-install                          fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64 7 xen-install                  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64 7 xen-install                  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64 7 xen-install                  fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 10 xen-install/src_host               fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 11 xen-install/dst_host               fail REGR. vs. 152332
 test-amd64-amd64-xl-pvshim 8 xen-boot                              fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install    fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub 20 guest-stop                        fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub 20 guest-stop                         fail REGR. vs. 152332

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop                 fail like 152332
 test-armhf-armhf-libvirt 16 saverestore-support-check              fail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop                 fail like 152332
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2      fail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                 fail like 152332
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check          fail like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                 fail like 152332
 test-arm64-arm64-xl-seattle 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check           fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check                  fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-arm64-arm64-xl 15 migrate-support-check                       fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check                   fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check                   fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check               fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check          fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check          fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13
Re: [PATCH] xen/arm: Ensure the vCPU context is seen before clearing the _VPF_down
On Tue, 20 Apr 2021, Julien Grall wrote:
> (+ Andrew and Jan)
>
> Hi Stefano,
>
> On 20/04/2021 20:38, Stefano Stabellini wrote:
> > On Fri, 16 Apr 2021, Julien Grall wrote:
> > > > I think your explanation is correct. However, don't we need a smp_rmb()
> > > > barrier after reading v->is_initialised in xen/common/domctl.c:do_domctl?
> > > > That would be the barrier that pairs with smp_wmb in regards to
> > > > v->is_initialised.
> > >
> > > There is already a smp_mb() in sync_vcpu_exec_state() which is called from
> > > vcpu_pause() -> vcpu_sleep_sync().
> >
> > But that's too late, isn't it? The v->is_initialized check is done before
> > the vcpu_pause() call. We might end up taking the wrong code path:
> >
> > https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/domctl.c#L587
> > https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/domctl.c#L598
>
> I am a bit confused what you mean by "wrong path" here. There is no timing
> guarantee with a memory barrier. What the barrier will guarantee you is that
> v->is_initialized will be read *before* anything after the barrier.
>
> Are you worried that some variables in vcpu_pause() may be read before
> v->is_initialized?
>
> > > I don't think we can ever remove the memory barrier in
> > > sync_vcpu_exec_state() because the vCPU paused may have run (or
> > > initialized) on a different pCPU. So I would like to rely on the barrier
> > > rather than adding an extra one (even though it is not a fast path).
> > >
> > > I am thinking to add a comment on top of vcpu_pause() to clarify that
> > > after the call, the vCPU context will be observed without extra
> > > synchronization required.
> >
> > Given that an if.. goto is involved, even if both code paths lead to
> > good results, I think it would be best to have the smp_rmb() barrier
> > above after the first v->is_initialised read to make sure we take the
> > right path.
>
> I don't understand what you are referring to by "wrong" and "right" path. The
> processor will always execute/commit the path based on the value of
> v->is_initialized. What may happen is that the processor may start to
> speculate the wrong path and then backtrack if it discovers the value is not
> the expected one. But, AFAIK, smp_rmb() is not going to protect you against
> speculation... smp_rmb() is only going to enforce a memory ordering between
> reads.
>
> > Instead of having to make sure that even the "wrong" path
> > behaves correctly.
>
> Just for clarification, I think you meant writing the following code:
>
>     if ( !v->is_initialized )
>         goto getvcpucontext_out;
>
>     smp_rmb();

No, sorry, I'll try to be clearer, see below.

>     ...
>
>     vcpu_pause();
>
> AFAICT, there is nothing in the implementation of XEN_DOMCTL_getvcpucontext
> that justifies the extra barrier (assuming we consider vcpu_pause() as a full
> memory barrier).
>
> From your e-mail, I also could not infer what your exact concern regarding
> the memory ordering is. If you have something in mind, then would you mind
> providing a sketch showing what could go wrong?

Let me start with what I had in mind:

    initialized = v->is_initialized;
    smp_rmb();
    if ( !initialized )
        goto getvcpucontext_out;

Without smp_rmb there are no guarantees that we see an up-to-date value of
v->is_initialized, right? This READ of v->is_initialized is supposed to pair
with the WRITE in arch_set_info_guest.

Relying on the barrier in vcpu_pause is OK only if is_initialised was already
read as "1". I think it might be OK in this case, because probably nothing
wrong happens if we read the old value of v->is_initialized as "0", but it
doesn't seem to be a good idea: theoretically a very long time could pass
before we see v->is_initialized as "1" without the smp_rmb.

Please let me know if I am missing something.
[xen-unstable-smoke test] 161338: tolerable all pass - PUSHED
flight 161338 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161338/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check                  fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check                   fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check               fail never pass
 test-armhf-armhf-xl 15 migrate-support-check                       fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check                   fail never pass

version targeted for testing:
 xen                  aaa3eafb3ba8b544d19ca41cda1477640b22b8fc
baseline version:
 xen                  9f6cd4983715cb31f0ea540e6bbb63f799a35d8a

Last test of basis   161332  2021-04-20 15:00:28 Z    0 days
Testing same since   161338  2021-04-20 20:01:32 Z    0 days    1 attempts

People who touched revisions under test:
 Andrew Cooper
 Stefano Stabellini
 Stefano Stabellini
 Wei Liu

jobs:
 build-arm64-xsm                              pass
 build-amd64                                  pass
 build-armhf                                  pass
 build-amd64-libvirt                          pass
 test-armhf-armhf-xl                          pass
 test-arm64-arm64-xl-xsm                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-amd64-libvirt                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision:
 To xenbits.xen.org:/home/xen/git/xen.git
   9f6cd49837..aaa3eafb3b  aaa3eafb3ba8b544d19ca41cda1477640b22b8fc -> smoke
[xen-unstable test] 161317: tolerable FAIL - PUSHED
flight 161317 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161317/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat             fail REGR. vs. 161296

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop                 fail like 161296
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2      fail like 161296
 test-armhf-armhf-libvirt 16 saverestore-support-check              fail like 161296
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                 fail like 161296
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop                  fail like 161296
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                 fail like 161296
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop                  fail like 161296
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check          fail like 161296
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop                 fail like 161296
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop                  fail like 161296
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop                  fail like 161296
 test-amd64-i386-xl-pvshim 14 guest-start                           fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check           fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check                  fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check               fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check                   fail never pass
 test-arm64-arm64-xl 15 migrate-support-check                       fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check                   fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check          fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check                   fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check               fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check          fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check           fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check              fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check             fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check         fail never pass
 test-armhf-armhf-xl 15 migrate-support-check                       fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check                  fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check                   fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check              fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check            fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check        fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check                  fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check           fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check              fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check                   fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check               fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check           fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check           fail never pass

version targeted for testing:
 xen                  a8c532be6a44c7faa54ac777a717f4aa65e3a806
baseline version:
 xen                  b5b93627dd1c398a90b832af765b4720fc71814e

Last test of basis   161296  2021-04-19 17:07:45 Z    1 days
Testing same since   161317  2021-04-20 08:13:00 Z    0 days    1 attempts

People who touched revisions under test:
 Jan Beulich
 Tamas K Lengyel
 Tim Deegan

jobs:
 build-amd64-xsm      pass
 build-arm64-xsm
Re: Writing to arbitrary canonical addresses
Thanks again Andrew,

...

My initial idea was to allocate a frame in kernel space and change
update_va_mapping to "forcibly" write the desired MFN into the l1 page table
and return the va. You can see what I did here:
https://github.com/charlesfg/xen/blob/5755f0752bd50891942768bf99898bbc8f7eac82/xen/arch/x86/mm.c#L4539

Basically, I assume the fast path, and for UPDATE_ENTRY:
https://github.com/charlesfg/xen/blob/5755f0752bd50891942768bf99898bbc8f7eac82/xen/arch/x86/mm.c#L2184

Any help on this would be greatly appreciated :)

Best regards,
*Charles Ferreira Gonçalves*

On Tue, Apr 20, 2021 at 7:05 PM Andrew Cooper wrote:
> On 20/04/2021 17:13, Charles Gonçalves wrote:
> > Hello Guys,
> >
> > I'm trying to reproduce old exploit behaviors in a simplistic way:
> > create a hypercall to write a buffer to a specific MFN.
> >
> > At first, I thought that updating an l1 page in a valid VA in guest
> > kernel space would do the trick.
> > But for addresses outside the Guest-defined use (0x - 0x7fff )
> > it is a no go! I get a page fault with a 'reserved bit in page table'
> > warning message.
> >
> > Now I'm trying to write to the address inside the hypervisor code, but
> > I am not sure how to do it.
> >
> > Any comments or tips on this?
>
> So you're looking to add a hypercall to make arbitrary unaudited changes
> to arbitrary memory, to simulate past security issues?
>
> If you're seeing "Reserved bit in page table" then you've managed to
> corrupt a pagetable entry somehow. Xen doesn't write any reserved bits
> (which it doesn't know how to interpret).
>
> I'm afraid that if you want any further help with this specific failure,
> you're going to have to post your changes to Xen somewhere: pastebin,
> or a pointer to a git branch, or whatever, but my divination skills
> aren't great...
>
> ~Andrew
Re: [PATCH] xen/arm: Ensure the vCPU context is seen before clearing the _VPF_down
(+ Andrew and Jan)

Hi Stefano,

On 20/04/2021 20:38, Stefano Stabellini wrote:
> On Fri, 16 Apr 2021, Julien Grall wrote:
> > > I think your explanation is correct. However, don't we need a smp_rmb()
> > > barrier after reading v->is_initialised in xen/common/domctl.c:do_domctl?
> > > That would be the barrier that pairs with smp_wmb in regards to
> > > v->is_initialised.
> >
> > There is already a smp_mb() in sync_vcpu_exec_state() which is called from
> > vcpu_pause() -> vcpu_sleep_sync().
>
> But that's too late, isn't it? The v->is_initialized check is done before
> the vcpu_pause() call. We might end up taking the wrong code path:
>
> https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/domctl.c#L587
> https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/domctl.c#L598

I am a bit confused what you mean by "wrong path" here. There is no timing
guarantee with a memory barrier. What the barrier will guarantee you is that
v->is_initialized will be read *before* anything after the barrier.

Are you worried that some variables in vcpu_pause() may be read before
v->is_initialized?

> > I don't think we can ever remove the memory barrier in
> > sync_vcpu_exec_state() because the vCPU paused may have run (or
> > initialized) on a different pCPU. So I would like to rely on the barrier
> > rather than adding an extra one (even though it is not a fast path).
> >
> > I am thinking to add a comment on top of vcpu_pause() to clarify that
> > after the call, the vCPU context will be observed without extra
> > synchronization required.
>
> Given that an if.. goto is involved, even if both code paths lead to
> good results, I think it would be best to have the smp_rmb() barrier
> above after the first v->is_initialised read to make sure we take the
> right path.

I don't understand what you are referring to by "wrong" and "right" path. The
processor will always execute/commit the path based on the value of
v->is_initialized. What may happen is that the processor may start to
speculate the wrong path and then backtrack if it discovers the value is not
the expected one. But, AFAIK, smp_rmb() is not going to protect you against
speculation... smp_rmb() is only going to enforce a memory ordering between
reads.

> Instead of having to make sure that even the "wrong" path
> behaves correctly.

Just for clarification, I think you meant writing the following code:

    if ( !v->is_initialized )
        goto getvcpucontext_out;

    smp_rmb();

    ...

    vcpu_pause();

AFAICT, there is nothing in the implementation of XEN_DOMCTL_getvcpucontext
that justifies the extra barrier (assuming we consider vcpu_pause() as a full
memory barrier).

From your e-mail, I also could not infer what your exact concern regarding the
memory ordering is. If you have something in mind, then would you mind
providing a sketch showing what could go wrong?

Cheers,

-- 
Julien Grall
Re: [PATCH] xen/arm: Ensure the vCPU context is seen before clearing the _VPF_down
On Fri, 16 Apr 2021, Julien Grall wrote:
> Hi Stefano,
>
> On 13/04/2021 23:43, Stefano Stabellini wrote:
> > On Sat, 20 Mar 2021, Julien Grall wrote:
> > > On 20/03/2021 00:01, Stefano Stabellini wrote:
> > > > On Sat, 27 Feb 2021, Julien Grall wrote:
> > > > > (+ Dario and George)
> > > > >
> > > > > Hi Stefano,
> > > > >
> > > > > I have added Dario and George to get some inputs from the scheduling
> > > > > part.
> > > > >
> > > > > On 27/02/2021 01:58, Stefano Stabellini wrote:
> > > > > > On Fri, 26 Feb 2021, Julien Grall wrote:
> > > > > > > From: Julien Grall
> > > > > > >
> > > > > > > A vCPU can get scheduled as soon as _VPF_down is cleared. As there
> > > > > > > is currently no ordering guarantee in arch_set_info_guest(), it
> > > > > > > may be possible that the flag can be observed cleared before the
> > > > > > > new values of the vCPU registers are observed.
> > > > > > >
> > > > > > > Add an smp_mb() before the flag is cleared to prevent re-ordering.
> > > > > > >
> > > > > > > Signed-off-by: Julien Grall
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > Barriers should work in pairs. However, I am not entirely sure
> > > > > > > where to put the other half. Maybe at the beginning of
> > > > > > > context_switch_to()?
> > > > > >
> > > > > > It should be right after VGCF_online is set or cleared, right?
> > > > >
> > > > > vcpu_guest_context_t is a variable allocated on the heap just for the
> > > > > purpose of this call. So an ordering with VGCF_online is not going to
> > > > > do anything.
> > > > >
> > > > > > So it would be:
> > > > > >
> > > > > > xen/arch/arm/domctl.c:arch_get_info_guest
> > > > > > xen/arch/arm/vpsci.c:do_common_cpu_on
> > > > > >
> > > > > > But I think it is impossible that either of them get called at the
> > > > > > same time as arch_set_info_guest, which makes me wonder if we
> > > > > > actually need the barrier...
> > > > >
> > > > > arch_get_info_guest() is called without the domain lock held and I
> > > > > can't see any other lock that could prevent it from being called in //
> > > > > of arch_set_info_guest().
> > > > >
> > > > > So you could technically get corrupted information from
> > > > > XEN_DOMCTL_getvcpucontext. For this case, we would want a smp_wmb()
> > > > > before writing to v->is_initialised. The corresponding read barrier
> > > > > would be in vcpu_pause() -> vcpu_sleep_sync() -> sync_vcpu_execstate().
> > > > >
> > > > > But this is not the issue I was originally trying to solve. Currently,
> > > > > do_common_cpu_on() will roughly do:
> > > > >
> > > > >    1) domain_lock(d)
> > > > >
> > > > >    2) v->arch.sctlr = ...
> > > > >       v->arch.ttbr0 = ...
> > > > >
> > > > >    3) clear_bit(_VPF_down, &v->pause_flags);
> > > > >
> > > > >    4) domain_unlock(d)
> > > > >
> > > > >    5) vcpu_wake(v);
> > > > >
> > > > > If we had only one pCPU on the system, then we would only wake the
> > > > > vCPU in step 5. We would be fine in this situation. But that's not the
> > > > > interesting case.
> > > > >
> > > > > If you add a second pCPU to the story, it may be possible to have
> > > > > vcpu_wake() happening in // (see more below). As there is no memory
> > > > > barrier, step 3 may be observed before 2. So, assuming the vCPU is
> > > > > runnable, we could start to schedule a vCPU before any update to the
> > > > > registers (step 2) is observed.
> > > > >
> > > > > This means that when context_switch_to() is called, we may end up
> > > > > restoring some old values.
> > > > >
> > > > > Now the question is: can vcpu_wake() be called in // from another
> > > > > pCPU? AFAICT, it would only be called if a given flag in
> > > > > v->pause_flags is cleared (e.g. _VPF_blocked). But can we rely on
> > > > > that?
> > > > >
> > > > > Even if we can rely on it, v->pause_flags has other flags in it. I
> > > > > couldn't rule out that _VPF_down cannot be set at the same time as the
> > > > > other _VPF_*.
> > > > >
> > > > > Therefore, I think a barrier is necessary to ensure the ordering.
> > > > >
> > > > > Do you agree with this analysis?
> > > >
> > > > Yes, I think this makes sense. The corresponding barrier in the
> > > > scheduling code would have to be after reading _VPF_down and before
> > > > reading v->arch.sctlr, etc.
> > > >
> > > > > > > The issue described here is also quite theoretical because there
> > > > > > > are hundreds of instructions executed between the time a vCPU is
> > > > > > > seen runnable and scheduled. But better be safe than sorry :).
> > > > > > > ---
> > > > > > >  xen/arch/arm/domain.c | 7 +++++++
> > > > > > >  1 file changed, 7 insertions(+)
> > > > > > >
> > > > > > > diff --git a/xen/arch/arm/domain.c
Re: Writing to arbitrary canonical addresses
On 20/04/2021 17:13, Charles Gonçalves wrote:
> Hello Guys,
>
> I'm trying to reproduce old exploit behaviors in a simplistic way:
> create a hypercall to write a buffer to a specific MFN.
>
> At first, I thought that updating an l1 page in a valid VA in guest
> kernel space would do the trick.
> But for addresses outside the Guest-defined use (0x - 0x7fff )
> it is a no go! I get a page fault with a 'reserved bit in page table'
> warning message.
>
> Now I'm trying to write to the address inside the hypervisor code, but
> I am not sure how to do it.
>
> Any comments or tips on this?

So you're looking to add a hypercall to make arbitrary unaudited changes to
arbitrary memory, to simulate past security issues?

If you're seeing "Reserved bit in page table" then you've managed to corrupt
a pagetable entry somehow. Xen doesn't write any reserved bits (which it
doesn't know how to interpret).

I'm afraid that if you want any further help with this specific failure,
you're going to have to post your changes to Xen somewhere: pastebin, or a
pointer to a git branch, or whatever, but my divination skills aren't
great...

~Andrew
[PATCH v2] x86/shim: Simplify compat handling in write_start_info()
Factor out a compat boolean to remove the lfence overhead from multiple
is_pv_32bit_domain() calls.

Signed-off-by: Andrew Cooper
---
CC: Jan Beulich
CC: Roger Pau Monné
CC: Wei Liu

v2:
 * Reinstate the conditional for the start info pointer, in case Intel/AMD
   can't be persuaded to adjust the architectural guarantees for upper bits
   in compatibility mode transitions.
---
 xen/arch/x86/pv/shim.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index d16c0048c0..4c6f442274 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -280,12 +280,12 @@ void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
 static void write_start_info(struct domain *d)
 {
     struct cpu_user_regs *regs = guest_cpu_user_regs();
-    start_info_t *si = map_domain_page(_mfn(is_pv_32bit_domain(d) ? regs->edx
-                                                                  : regs->rdx));
+    bool compat = is_pv_32bit_domain(d);
+    start_info_t *si = map_domain_page(_mfn(compat ? regs->edx : regs->rdx));
     uint64_t param;
 
     snprintf(si->magic, sizeof(si->magic), "xen-3.0-x86_%s",
-             is_pv_32bit_domain(d) ? "32p" : "64");
+             compat ? "32p" : "64");
 
     si->nr_pages = domain_tot_pages(d);
     si->shared_info = virt_to_maddr(d->shared_info);
     si->flags = 0;
@@ -300,7 +300,7 @@ static void write_start_info(struct domain *d)
                                  >console.domU.mfn) )
         BUG();
 
-    if ( is_pv_32bit_domain(d) )
+    if ( compat )
         xlat_start_info(si, XLAT_start_info_console_domU);
 
     unmap_domain_page(si);
-- 
2.11.0
Re: [PATCH] x86/shim: Simplify compat handling in write_start_info()
On 19/04/2021 17:00, Jan Beulich wrote:
> On 19.04.2021 17:57, Andrew Cooper wrote:
>> On 19/04/2021 16:55, Jan Beulich wrote:
>>> On 19.04.2021 16:45, Andrew Cooper wrote:
>>>> Factor out a compat boolean to remove the lfence overhead from multiple
>>>> is_pv_32bit_domain() calls.
>>>>
>>>> For a compat guest, the upper 32 bits of rdx are zero, so there is no
>>>> need to have any conditional logic at all when mapping the start info
>>>> page.
>>> Iirc the upper halves hold unspecified contents after a switch from
>>> compat to 64-bit mode. Therefore only with this part of the change
>>> dropped ...
>> But we're shim, so will never ever mix compat and non-compat guests.
> That's not the point: A compat guest will still cause the CPU to
> transition back and forth between 64-bit and compat modes. It is
> this transitioning which leaves the upper halves of all GPRs in an
> undefined state (even if in reality a CPU would likely need to go
> through extra hoops to prevent them from being zero if they were
> written to in compat mode).

Hmm. That's awkward.

So the real behaviour (I've checked with some contacts) is that the upper
bits are preserved until the next write to the register, after which the
upper bits are zeroed.

I wonder whether I'll have any luck formally asking AMD and Intel for a
tweak to this effect in the manuals.

~Andrew
[xen-unstable-smoke test] 161332: tolerable all pass - PUSHED
flight 161332 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161332/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail never pass

version targeted for testing:
 xen                  9f6cd4983715cb31f0ea540e6bbb63f799a35d8a
baseline version:
 xen                  a8c532be6a44c7faa54ac777a717f4aa65e3a806

Last test of basis   161293  2021-04-19 14:00:26 Z   1 days
Failing since        161321  2021-04-20 10:02:00 Z   0 days   2 attempts
Testing same since   161332  2021-04-20 15:00:28 Z   0 days   1 attempts

People who touched revisions under test:
  Jan Beulich
  Julien Grall
  Rahul Singh
  Roger Pau Monné
  Stefano Stabellini

jobs:
 build-arm64-xsm                              pass
 build-amd64                                  pass
 build-armhf                                  pass
 build-amd64-libvirt                          pass
 test-armhf-armhf-xl                          pass
 test-arm64-arm64-xl-xsm                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-amd64-libvirt                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision:

To xenbits.xen.org:/home/xen/git/xen.git
   a8c532be6a..9f6cd49837  9f6cd4983715cb31f0ea540e6bbb63f799a35d8a -> smoke
[qemu-mainline test] 161308: regressions - FAIL
flight 161308 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161308/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-qemuu-freebsd11-amd64              16 guest-saverestore         fail REGR. vs. 152631
 test-amd64-i386-libvirt                             14 guest-start               fail REGR. vs. 152631
 test-amd64-amd64-libvirt                            14 guest-start               fail REGR. vs. 152631
 test-amd64-i386-libvirt-xsm                         14 guest-start               fail REGR. vs. 152631
 test-amd64-amd64-qemuu-freebsd12-amd64              16 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-libvirt-xsm                        14 guest-start               fail REGR. vs. 152631
 test-amd64-i386-freebsd10-amd64                     16 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-libvirt-pair                       25 guest-start/debian        fail REGR. vs. 152631
 test-amd64-i386-freebsd10-i386                      16 guest-saverestore         fail REGR. vs. 152631
 test-amd64-i386-libvirt-pair                        25 guest-start/debian        fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-debianhvm-amd64            15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow     15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm        15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-ovmf-amd64                 15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-debianhvm-amd64           15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow    15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-win7-amd64                15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-ovmf-amd64                15 guest-saverestore         fail REGR. vs. 152631
 test-arm64-arm64-libvirt-xsm                        14 guest-start               fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm         15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-qemuu-nested-intel                 20 debian-hvm-install/l1/l2  fail REGR. vs. 152631
 test-armhf-armhf-libvirt                            14 guest-start               fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-win7-amd64                 15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  12 debian-hvm-install        fail REGR. vs. 152631
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   12 debian-hvm-install        fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-ws16-amd64                 15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64                15 guest-saverestore         fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-qemuu-nested-amd    20 debian-hvm-install/l1/l2   fail like 152631
 test-armhf-armhf-libvirt-raw         15 saverestore-support-check  fail like 152631
 test-amd64-i386-xl-pvshim            14 guest-start                fail never pass
 test-arm64-arm64-xl-credit1          15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit1          16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl                  15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-thunderx         15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx         16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2          15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit2          16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale          15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale          16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd         14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2          15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2          16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu        15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu        16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1          15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1          16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-rtds             16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck       15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-cubietruck       16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm              15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-xsm              16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-seattle          15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-seattle          16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-vhd              14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd              15 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-raw         14 migrate-support-check      fail never
Re: [PATCH] build: centralize / unify asm-offsets generation
On Tue, Apr 20, 2021 at 05:47:49PM +0200, Jan Beulich wrote:
> On 20.04.2021 17:29, Roger Pau Monné wrote:
> > On Thu, Apr 01, 2021 at 10:33:47AM +0200, Jan Beulich wrote:
> >> @@ -399,7 +399,11 @@ include/xen/compile.h: include/xen/compi
> >> 	@sed -rf tools/process-banner.sed < .banner >> $@.new
> >> 	@mv -f $@.new $@
> >>
> >> -include/asm-$(TARGET_ARCH)/asm-offsets.h: arch/$(TARGET_ARCH)/asm-offsets.s
> >> +asm-offsets.s: arch/$(TARGET_ARCH)/$(TARGET_SUBARCH)/asm-offsets.c
> >> +	$(CC) $(filter-out -Wa$(comma)% -flto,$(c_flags)) -S -g0 -o $@.new -MQ $@ $<
> >> +	$(call move-if-changed,$@.new,$@)
> >
> > Won't it be more natural to keep the .s file in arch/$(TARGET_ARCH)?
>
> Yes and no: Yes as far as the actual file location is concerned.
> No when considering where it gets generated: I generally consider
> it risky to generate files outside of the directory where make
> currently runs. There may be good reasons for certain exceptions,
> but personally I don't see this case as good enough a reason.
>
> Somewhat related - if doing as you suggest, which Makefile's
> clean: rule should clean up that file in your opinion?

The clean rule should be in the makefile where the file is generated IMO,
the same as the asm-offsets.h clean rule currently in xen/Makefile.

> Nevertheless, if there's general agreement that keeping the file
> there is better, I'd make the change and simply ignore my unhappy
> feelings about it.

I don't have a strong opinion; it just feels weird to have this IMO stray
asm-offsets.s outside of its arch directory, taking into account that we
already have asm-offsets.h generated from xen/Makefile into an arch
specific directory as a precedent in that makefile.

Thanks, Roger.
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On Tue, Apr 20, 2021 at 05:38:51PM +0200, Jan Beulich wrote:
> On 20.04.2021 17:08, Roger Pau Monné wrote:
> > On Thu, Apr 02, 2020 at 04:06:06AM +0800, Chao Gao wrote:
> >> --- a/xen/drivers/passthrough/vtd/qinval.c
> >> +++ b/xen/drivers/passthrough/vtd/qinval.c
> >> @@ -442,6 +442,23 @@ int enable_qinval(struct vtd_iommu *iommu)
> >>      return 0;
> >>  }
> >>
> >> +static int vtd_flush_context_noop(struct vtd_iommu *iommu, uint16_t did,
> >> +                                  uint16_t source_id, uint8_t function_mask,
> >> +                                  uint64_t type, bool flush_non_present_entry)
> >> +{
> >> +    dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush CONTEXT.\n");
> >> +    return -EIO;
> >> +}
> >> +
> >> +static int vtd_flush_iotlb_noop(struct vtd_iommu *iommu, uint16_t did,
> >> +                                uint64_t addr, unsigned int size_order,
> >> +                                uint64_t type, bool flush_non_present_entry,
> >> +                                bool flush_dev_iotlb)
> >> +{
> >> +    dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush IOTLB.\n");
> >> +    return -EIO;
> >> +}
> >
> > I think I would add an ASSERT_UNREACHABLE() to both noop handlers
> > above, as I would expect trying to use them without the proper mode
> > being configured would point to an error elsewhere?
>
> If such an assertion triggered e.g. during S3 suspend/resume, it may
> lead to the box simply not doing anything useful, without there being
> any way to know what went wrong. If instead the system at least
> managed to resume, the log message could be observed.

Oh, OK then. I'm simply worried that people might ignore such one-line
messages; maybe add a WARN? Would it make sense to mark as tainted, which
could help identify the issue on production builds? Maybe that's too much.

Thanks, Roger.
Writing to arbitrary canonical addresses
Hello Guys,

I'm trying to reproduce old exploit behaviours in a simplistic way: create
a hypercall to write a buffer to a specific MFN. At first, I thought that
updating an l1 page for a valid VA in guest kernel space would do the
trick. But for addresses outside the guest-defined use (0x - 0x7fff ) it
is a no go: I get a page fault with a 'reserved bit in page table' warning
message.

Now I'm trying to write to an address inside the hypervisor code, but I'm
not sure how to do it. Any comments or tips on this?

Best regards,
Charles Ferreira Gonçalves
Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote:
> Reading the platform timer isn't cheap, so we'd better avoid it when the
> resulting value is of no interest to anyone.
>
> The consumer of master_stime, obtained by
> time_calibration_{std,tsc}_rendezvous() and propagated through
> this_cpu(cpu_calibration), is local_time_calibration(). With
> CONSTANT_TSC the latter function uses an early exit path, which doesn't
> explicitly use the field. While this_cpu(cpu_calibration) (including the
> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that
> path, both structures' fields get consumed only by the !CONSTANT_TSC
> logic of the function.
>
> Signed-off-by: Jan Beulich
> ---
> v4: New.
> ---
> I realize there's some risk associated with potential new uses of the
> field down the road. What would people think about compiling time.c a
> 2nd time into a dummy object file, with a conditional enabled to force
> assuming CONSTANT_TSC, and with that conditional used to suppress
> presence of the field as well as all audited uses of it (i.e. in
> particular that large part of local_time_calibration())? Unexpected new
> users of the field would then cause build time errors.

Wouldn't that add quite a lot of churn to the file itself in the form of
pre-processor conditionals? Could we instead set master_stime to an
invalid value that would make the consumers explode somehow? I know there
might be new consumers, but those should be able to figure out whether
the value is sane by looking at the existing ones.

Also, since this is only done on the BSP on the last iteration, I wonder
if it really makes such a difference performance-wise to warrant all this
trouble.

Thanks, Roger.
Re: [PATCH v4 2/3] x86/time: yield to hyperthreads after updating TSC during rendezvous
On Thu, Apr 01, 2021 at 11:54:27AM +0200, Jan Beulich wrote:
> Since we'd like the updates to be done as synchronously as possible,
> make an attempt at yielding immediately after the TSC write.
>
> Signed-off-by: Jan Beulich

Reviewed-by: Roger Pau Monné

Did you observe any difference with the pause inserted? I wonder whether
that's enough to give the hyperthread a chance to also perform the TSC
write. In any case there's certainly no harm from it.

Thanks, Roger.
Re: [PATCH] build: centralize / unify asm-offsets generation
On 20.04.2021 17:29, Roger Pau Monné wrote:
> On Thu, Apr 01, 2021 at 10:33:47AM +0200, Jan Beulich wrote:
>> @@ -399,7 +399,11 @@ include/xen/compile.h: include/xen/compi
>> 	@sed -rf tools/process-banner.sed < .banner >> $@.new
>> 	@mv -f $@.new $@
>>
>> -include/asm-$(TARGET_ARCH)/asm-offsets.h: arch/$(TARGET_ARCH)/asm-offsets.s
>> +asm-offsets.s: arch/$(TARGET_ARCH)/$(TARGET_SUBARCH)/asm-offsets.c
>> +	$(CC) $(filter-out -Wa$(comma)% -flto,$(c_flags)) -S -g0 -o $@.new -MQ $@ $<
>> +	$(call move-if-changed,$@.new,$@)
>
> Won't it be more natural to keep the .s file in arch/$(TARGET_ARCH)?

Yes and no: Yes as far as the actual file location is concerned.
No when considering where it gets generated: I generally consider
it risky to generate files outside of the directory where make
currently runs. There may be good reasons for certain exceptions,
but personally I don't see this case as good enough a reason.

Somewhat related - if doing as you suggest, which Makefile's
clean: rule should clean up that file in your opinion?

Nevertheless, if there's general agreement that keeping the file
there is better, I'd make the change and simply ignore my unhappy
feelings about it.

Jan
Re: [PATCH v4 1/3] x86/time: latch to-be-written TSC value early in rendezvous loop
On Thu, Apr 01, 2021 at 11:54:05AM +0200, Jan Beulich wrote:
> To reduce latency on time_calibration_tsc_rendezvous()'s last loop
> iteration, read the value to be written on the last iteration at the end
> of the loop body (i.e. in particular at the end of the second to last
> iteration).
>
> On my single-socket 18-core Skylake system this reduces the average loop
> exit time on CPU0 (from the TSC write on the last iteration until after
> the main loop) from around 32k cycles to around 29k (albeit the values
> measured on separate runs vary quite significantly).
>
> Signed-off-by: Jan Beulich

Reviewed-by: Roger Pau Monné

Thanks, Roger.
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On 20.04.2021 17:08, Roger Pau Monné wrote:
> On Thu, Apr 02, 2020 at 04:06:06AM +0800, Chao Gao wrote:
>> --- a/xen/drivers/passthrough/vtd/qinval.c
>> +++ b/xen/drivers/passthrough/vtd/qinval.c
>> @@ -442,6 +442,23 @@ int enable_qinval(struct vtd_iommu *iommu)
>>      return 0;
>>  }
>>
>> +static int vtd_flush_context_noop(struct vtd_iommu *iommu, uint16_t did,
>> +                                  uint16_t source_id, uint8_t function_mask,
>> +                                  uint64_t type, bool flush_non_present_entry)
>> +{
>> +    dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush CONTEXT.\n");
>> +    return -EIO;
>> +}
>> +
>> +static int vtd_flush_iotlb_noop(struct vtd_iommu *iommu, uint16_t did,
>> +                                uint64_t addr, unsigned int size_order,
>> +                                uint64_t type, bool flush_non_present_entry,
>> +                                bool flush_dev_iotlb)
>> +{
>> +    dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush IOTLB.\n");
>> +    return -EIO;
>> +}
>
> I think I would add an ASSERT_UNREACHABLE() to both noop handlers
> above, as I would expect trying to use them without the proper mode
> being configured would point to an error elsewhere?

If such an assertion triggered e.g. during S3 suspend/resume, it may
lead to the box simply not doing anything useful, without there being
any way to know what went wrong. If instead the system at least
managed to resume, the log message could be observed.

Jan
Re: [PATCH v2] xen/pci: Refactor PCI MSI interrupts related code
On 20.04.2021 15:45, Rahul Singh wrote:
>> On 19 Apr 2021, at 1:33 pm, Jan Beulich wrote:
>> On 19.04.2021 13:54, Julien Grall wrote:
>>> For the time being, I think moving this code to x86 is a lot better
>>> than #ifdef-ary or keeping the code in common code.
>>
>> Well, I would perhaps agree if it ended up being #ifdef CONFIG_X86.
>> I would perhaps not agree if there was a new CONFIG_* which other
>> (future) arch-es could select if desired.
>
> I agree with Julien on moving the code to the x86 file, as currently it
> is referenced only in x86 code, and as of now we are not sure how other
> architectures will implement interrupt remapping (via the IOMMU or any
> other means).
>
> Let me know if you are ok with moving the code to x86.

I can't answer this with "yes" or "no" without knowing what the
alternative would be. As said, if the alternative is CONFIG_X86
#ifdef-ary, then yes. If a separate CONFIG_* gets introduced (and
selected by X86), then a separate file (getting built only when that
new setting is y) would seem better to me.

Jan
Re: [PATCH] build: centralize / unify asm-offsets generation
On Thu, Apr 01, 2021 at 10:33:47AM +0200, Jan Beulich wrote:
> Except for an additional prereq Arm and x86 have the same needs here,
> and Arm can also benefit from the recent x86 side improvement. Recurse
> into arch/*/ only for a phony include target (doing nothing on Arm),
> and handle asm-offsets itself entirely locally to xen/Makefile.
>
> Signed-off-by: Jan Beulich
>
> --- a/.gitignore
> +++ b/.gitignore
> @@ -318,7 +318,6 @@
>  xen/arch/x86/efi/check.efi
>  xen/arch/x86/efi/mkreloc
>  xen/arch/*/xen.lds
> -xen/arch/*/asm-offsets.s
>  xen/arch/*/efi/boot.c
>  xen/arch/*/efi/compat.c
>  xen/arch/*/efi/ebmalloc.c
> @@ -325,6 +324,7 @@
>  xen/arch/*/efi/efi.h
>  xen/arch/*/efi/pe.c
>  xen/arch/*/efi/runtime.c
> +xen/asm-offsets.s
>  xen/common/config_data.S
>  xen/common/config.gz
>  xen/include/headers*.chk
> --- a/xen/Makefile
> +++ b/xen/Makefile
> @@ -341,7 +341,7 @@ _clean: delete-unfresh-files
>  	find . \( -name "*.o" -o -name ".*.d" -o -name ".*.d2" \
>  		-o -name "*.gcno" -o -name ".*.cmd" \) -exec rm -f {} \;
>  	rm -f include/asm $(TARGET) $(TARGET).gz $(TARGET).efi $(TARGET).efi.map $(TARGET)-syms $(TARGET)-syms.map *~ core
> -	rm -f include/asm-*/asm-offsets.h
> +	rm -f asm-offsets.s include/asm-*/asm-offsets.h
>  	rm -f .banner
>  
>  .PHONY: _distclean
> @@ -362,7 +362,7 @@ $(TARGET): delete-unfresh-files
>  	done; \
>  	true
>  	$(MAKE) -f $(BASEDIR)/Rules.mk -C include
> -	$(MAKE) -f $(BASEDIR)/Rules.mk -C arch/$(TARGET_ARCH) asm-offsets.s
> +	$(MAKE) -f $(BASEDIR)/Rules.mk -C arch/$(TARGET_ARCH) include
>  	$(MAKE) -f $(BASEDIR)/Rules.mk include/asm-$(TARGET_ARCH)/asm-offsets.h
>  	$(MAKE) -f $(BASEDIR)/Rules.mk -C arch/$(TARGET_ARCH) $@
>  
> @@ -399,7 +399,11 @@ include/xen/compile.h: include/xen/compi
>  	@sed -rf tools/process-banner.sed < .banner >> $@.new
>  	@mv -f $@.new $@
>  
> -include/asm-$(TARGET_ARCH)/asm-offsets.h: arch/$(TARGET_ARCH)/asm-offsets.s
> +asm-offsets.s: arch/$(TARGET_ARCH)/$(TARGET_SUBARCH)/asm-offsets.c
> +	$(CC) $(filter-out -Wa$(comma)% -flto,$(c_flags)) -S -g0 -o $@.new -MQ $@ $<
> +	$(call move-if-changed,$@.new,$@)

Won't it be more natural to keep the .s file in arch/$(TARGET_ARCH)?

Thanks, Roger.
Re: [PATCH] tools/xenstored: Remove unused prototype
Hi Julien,

On 20 Apr 2021, at 14:49, Julien Grall <jul...@xen.org> wrote:

> From: Julien Grall <jgr...@amazon.com>
>
> A prototype for dump_conn() has been present for quite a long time, but
> there is no implementation, even, AFAICT, in the patch that introduced
> it. So drop it.
>
> Signed-off-by: Julien Grall <jgr...@amazon.com>

Agree, no implementation of this exists, so:

Reviewed-by: Bertrand Marquis <bertrand.marq...@arm.com>

Cheers
Bertrand

> ---
>  tools/xenstore/xenstored_core.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index d54a6042a9f7..591b28e4552f 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -2085,7 +2085,6 @@ static struct option options[] = {
>  #endif
>  	{ NULL, 0, NULL, 0 }
>  };
> -extern void dump_conn(struct connection *conn);
>  int dom0_domid = 0;
>  int dom0_event = 0;
>  int priv_domid = 0;
> --
> 2.17.1
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On Thu, Apr 02, 2020 at 04:06:06AM +0800, Chao Gao wrote:
> --- a/xen/drivers/passthrough/vtd/qinval.c
> +++ b/xen/drivers/passthrough/vtd/qinval.c
> @@ -442,6 +442,23 @@ int enable_qinval(struct vtd_iommu *iommu)
>      return 0;
>  }
>
> +static int vtd_flush_context_noop(struct vtd_iommu *iommu, uint16_t did,
> +                                  uint16_t source_id, uint8_t function_mask,
> +                                  uint64_t type, bool flush_non_present_entry)
> +{
> +    dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush CONTEXT.\n");
> +    return -EIO;
> +}
> +
> +static int vtd_flush_iotlb_noop(struct vtd_iommu *iommu, uint16_t did,
> +                                uint64_t addr, unsigned int size_order,
> +                                uint64_t type, bool flush_non_present_entry,
> +                                bool flush_dev_iotlb)
> +{
> +    dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush IOTLB.\n");
> +    return -EIO;
> +}

I think I would add an ASSERT_UNREACHABLE() to both noop handlers above,
as I would expect trying to use them without the proper mode being
configured would point to an error elsewhere?

Thanks, Roger.
[xen-unstable-smoke test] 161321: regressions - FAIL
flight 161321 xen-unstable-smoke real [real]
flight 161328 xen-unstable-smoke real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/161321/
http://logs.test-lab.xenproject.org/osstest/logs/161328/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  18 guest-localmigrate/x10  fail REGR. vs. 161293

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail never pass

version targeted for testing:
 xen                  730d0f6082e66eefae64f35bc62e51fc54d02d55
baseline version:
 xen                  a8c532be6a44c7faa54ac777a717f4aa65e3a806

Last test of basis   161293  2021-04-19 14:00:26 Z   1 days
Testing same since   161321  2021-04-20 10:02:00 Z   0 days   1 attempts

People who touched revisions under test:
  Jan Beulich
  Roger Pau Monné

jobs:
 build-arm64-xsm                              pass
 build-amd64                                  pass
 build-armhf                                  pass
 build-amd64-libvirt                          pass
 test-armhf-armhf-xl                          pass
 test-arm64-arm64-xl-xsm                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    fail
 test-amd64-amd64-libvirt                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Not pushing.
commit 730d0f6082e66eefae64f35bc62e51fc54d02d55
Author: Roger Pau Monné
Date:   Tue Apr 20 11:36:54 2021 +0200

    x86/dpci: remove the dpci EOI timer

    The current interrupt passthrough code will set up a timer for each
    interrupt injected to the guest that requires an EOI from the guest.
    Such a timer performs two actions if the guest doesn't EOI the
    interrupt before a given period of time. The first one is deasserting
    the virtual line, the second is performing an EOI of the physical
    interrupt source if it requires such.

    The deasserting of the guest virtual line is wrong, since it messes
    with the interrupt status of the guest. This seems to have been done
    in order to compensate for missing deasserts when certain interrupt
    controller actions are performed.

    The original motivation for the introduction of the timer was to fix
    issues when a GSI was shared between different guests. We believe that
    other changes in the interrupt handling code (i.e. proper propagation
    of EOI related actions to dpci) will have fixed such errors by now.

    Performing an EOI of the physical interrupt source is redundant, since
    there's already a timer that takes care of this for all interrupts,
    not just the HVM dpci ones; see the irq_guest_action_t struct
    eoi_timer field.

    Since both of the actions performed by the dpci timer are not
    required, remove it altogether.

    Signed-off-by: Roger Pau Monné
    Reviewed-by: Jan Beulich

commit 2d494f2198d7909a394085d079475bb099d7afe7
Author: Roger Pau Monné
Date:   Tue Apr 20 11:36:09 2021 +0200

    x86/vpic: issue dpci EOI for cleared pins at ICW1

    When pins are cleared from either ISR or IRR as part of the
    initialization sequence, forward the clearing of those pins to the
    dpci EOI handler, as it is equivalent to an EOI. Not doing so can
    bring the interrupt controller state out of sync with the dpci
    handling logic, which expects a notification when a pin has been
    EOI'ed.

    Fixes: 7b3cb5e5416 ('IRQ injection changes for HVM PCI passthru.')
    Signed-off-by: Roger Pau Monné
    Reviewed-by: Jan Beulich

commit 192f7479f21ef63dad8d8acbbda93cce0971fe66
Author: Roger Pau Monné
Date:   Tue Apr 20 11:35:29 2021 +0200

    x86/vpic: don't trigger unmask event until end of init

    Wait until the end of the init sequence to trigger the unmask event.
    Note that it will be unconditionally
[PATCH v4 10/12] x86/irq: drop return value from hvm_ioapic_assert
There's no caller anymore that cares about the injected vector, so drop
the returned vector from the function.

No functional change intended.

Signed-off-by: Roger Pau Monné
---
Changes since v3:
 - New in this version.
---
 xen/arch/x86/hvm/irq.c        | 8 ++------
 xen/include/asm-x86/hvm/irq.h | 2 +-
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index c3d8f2a786a..1c588e212f9 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -47,24 +47,20 @@ static void assert_gsi(struct domain *d, unsigned ioapic_gsi)
     vioapic_irq_positive_edge(d, ioapic_gsi);
 }
 
-int hvm_ioapic_assert(struct domain *d, unsigned int gsi, bool level)
+void hvm_ioapic_assert(struct domain *d, unsigned int gsi, bool level)
 {
     struct hvm_irq *hvm_irq = hvm_domain_irq(d);
-    int vector;
 
     if ( gsi >= hvm_irq->nr_gsis )
     {
         ASSERT_UNREACHABLE();
-        return -1;
+        return;
     }
 
     spin_lock(&d->arch.hvm.irq_lock);
     if ( !level || hvm_irq->gsi_assert_count[gsi]++ == 0 )
         assert_gsi(d, gsi);
-    vector = vioapic_get_vector(d, gsi);
     spin_unlock(&d->arch.hvm.irq_lock);
-
-    return vector;
 }
 
 void hvm_ioapic_deassert(struct domain *d, unsigned int gsi)
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index 4e3534d4eb4..fda2f8e8ebf 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -226,7 +226,7 @@ int hvm_set_pci_link_route(struct domain *d, u8 link, u8 isa_irq);
 int hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data);
 
 /* Assert/deassert an IO APIC pin. */
-int hvm_ioapic_assert(struct domain *d, unsigned int gsi, bool level);
+void hvm_ioapic_assert(struct domain *d, unsigned int gsi, bool level);
 void hvm_ioapic_deassert(struct domain *d, unsigned int gsi);
 
 void hvm_maybe_deassert_evtchn_irq(void);
-- 
2.30.1
[PATCH v4 12/12] x86/vpt: introduce a per-vPT lock
Introduce a per virtual timer lock that replaces the existing per-vCPU
and per-domain vPT locks. Since virtual timers are no longer assigned or
migrated between vCPUs, the locking can be simplified to an in-structure
spinlock that protects all the fields.

This requires introducing a helper to initialize the spinlock, and that
could be used to initialize other virtual timer fields in the future.

Signed-off-by: Roger Pau Monné
Reviewed-by: Jan Beulich
---
Changes since v1:
 - New in this version.
---
 xen/arch/x86/emul-i8254.c      |  1 +
 xen/arch/x86/hvm/hpet.c        |  8 +-
 xen/arch/x86/hvm/hvm.c         |  4 ---
 xen/arch/x86/hvm/rtc.c         |  1 +
 xen/arch/x86/hvm/vlapic.c      |  1 +
 xen/arch/x86/hvm/vpt.c         | 52 ++
 xen/include/asm-x86/hvm/vcpu.h |  3 --
 xen/include/asm-x86/hvm/vpt.h  | 15 ++
 8 files changed, 33 insertions(+), 52 deletions(-)

diff --git a/xen/arch/x86/emul-i8254.c b/xen/arch/x86/emul-i8254.c
index 73be4188ad4..a47138cbab7 100644
--- a/xen/arch/x86/emul-i8254.c
+++ b/xen/arch/x86/emul-i8254.c
@@ -484,6 +484,7 @@ void pit_init(struct domain *d, unsigned long cpu_khz)
     {
         register_portio_handler(d, PIT_BASE, 4, handle_pit_io);
         register_portio_handler(d, 0x61, 1, handle_speaker_io);
+        init_periodic_timer(&pit->pt0);
     }
 
     pit_reset(d);
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index ca94e8b4538..20593c3862d 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -739,12 +739,18 @@ static void hpet_set(HPETState *h)
 
 void hpet_init(struct domain *d)
 {
+    HPETState *h = domain_vhpet(d);
+    unsigned int i;
+
     if ( !has_vhpet(d) )
         return;
 
-    hpet_set(domain_vhpet(d));
+    hpet_set(h);
     register_mmio_handler(d, &hpet_mmio_ops);
     d->arch.hvm.params[HVM_PARAM_HPET_ENABLED] = 1;
+
+    for ( i = 0; i < HPET_TIMER_NUM; i++ )
+        init_periodic_timer(&h->pt[i]);
 }
 
 void hpet_deinit(struct domain *d)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ec4ab1f5199..0498c3b02e2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -666,8 +666,6 @@ int hvm_domain_initialise(struct domain *d)
     /* need link to containing domain */
     d->arch.hvm.pl_time->domain = d;
 
-    rwlock_init(&d->arch.hvm.pl_time->pt_migrate);
-
     /* Set the default IO Bitmap. */
     if ( is_hardware_domain(d) )
     {
@@ -1557,8 +1555,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
     hvm_asid_flush_vcpu(v);
 
-    spin_lock_init(&v->arch.hvm.tm_lock);
-
     rc = hvm_vcpu_cacheattr_init(v); /* teardown: vcpu_cacheattr_destroy */
     if ( rc != 0 )
         goto fail1;
diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index b66ca6f64f1..8c8b4ed4018 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -821,6 +821,7 @@ void rtc_init(struct domain *d)
     init_timer(&s->update_timer, rtc_update_timer, s, smp_processor_id());
     init_timer(&s->update_timer2, rtc_update_timer2, s, smp_processor_id());
     init_timer(&s->alarm_timer, rtc_alarm_cb, s, smp_processor_id());
+    init_periodic_timer(&s->pt);
 
     register_portio_handler(d, RTC_PORT(0), 2, handle_rtc_io);
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 2af24989dd5..5c60bf66584 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1654,6 +1654,7 @@ int vlapic_init(struct vcpu *v)
         return 0;
     }
 
+    init_periodic_timer(&vlapic->pt);
     vlapic->pt.source = PTSRC_lapic;
 
     vlapic->regs_page = alloc_domheap_page(v->domain, MEMF_no_owner);
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 6a8d216c7b5..e4ecb98d3a4 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -125,27 +125,6 @@ static int pt_irq_masked(struct periodic_time *pt)
     return 1;
 }
 
-/*
- * Functions which want to modify a particular periodic_time object
- * use pt_{un}lock locking helpers.
- *
- * These users lock whichever vCPU the periodic_time is attached to,
- * but since the vCPU could be modified without holding any lock, they
- * need to take an additional lock that protects against pt->vcpu
- * changing.
- */
-static void pt_lock(struct periodic_time *pt)
-{
-    read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
-    spin_lock(&pt->vcpu->arch.hvm.tm_lock);
-}
-
-static void pt_unlock(struct periodic_time *pt)
-{
-    spin_unlock(&pt->vcpu->arch.hvm.tm_lock);
-    read_unlock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
-}
-
 static void pt_process_missed_ticks(struct periodic_time *pt)
 {
     s_time_t missed_ticks, now = NOW();
@@ -220,7 +199,7 @@ static void pt_timer_fn(void *data)
     void *cb_priv;
     unsigned int irq;
 
-    pt_lock(pt);
+    spin_lock(&pt->lock);
 
     v = pt->vcpu;
     irq = pt->irq;
@@ -235,7 +214,7 @@ static void pt_timer_fn(void *data)
         cb_priv = pt->priv;
     }
 
-    pt_unlock(pt);
+    spin_unlock(&pt->lock);
 
     if ( cb )
         cb(v, cb_priv);
@@ -247,7
[PATCH v4 09/12] x86/irq: remove unused parameter from hvm_isa_irq_assert
There are no callers anymore passing a get_vector function pointer to hvm_isa_irq_assert, so drop the parameter. No functional change expected. Signed-off-by: Roger Pau Monné --- Changes since v3: - New in this version. --- xen/arch/x86/hvm/dm.c | 2 +- xen/arch/x86/hvm/irq.c| 10 +- xen/arch/x86/hvm/pmtimer.c| 2 +- xen/arch/x86/hvm/rtc.c| 2 +- xen/arch/x86/hvm/vpt.c| 2 +- xen/include/asm-x86/hvm/irq.h | 4 +--- 6 files changed, 6 insertions(+), 16 deletions(-) diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index b60b9f3364a..c62a259b7fc 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -110,7 +110,7 @@ static int set_isa_irq_level(struct domain *d, uint8_t isa_irq, hvm_isa_irq_deassert(d, isa_irq); break; case 1: -hvm_isa_irq_assert(d, isa_irq, NULL); +hvm_isa_irq_assert(d, isa_irq); break; default: return -EINVAL; diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c index 4825a387bdc..c3d8f2a786a 100644 --- a/xen/arch/x86/hvm/irq.c +++ b/xen/arch/x86/hvm/irq.c @@ -212,13 +212,10 @@ void hvm_gsi_deassert(struct domain *d, unsigned int gsi) spin_unlock(>arch.hvm.irq_lock); } -int hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq, - int (*get_vector)(const struct domain *d, - unsigned int gsi)) +void hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq) { struct hvm_irq *hvm_irq = hvm_domain_irq(d); unsigned int gsi = hvm_isa_irq_to_gsi(isa_irq); -int vector = -1; ASSERT(isa_irq <= 15); @@ -228,12 +225,7 @@ int hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq, (hvm_irq->gsi_assert_count[gsi]++ == 0) ) assert_irq(d, gsi, isa_irq); -if ( get_vector ) -vector = get_vector(d, gsi); - spin_unlock(>arch.hvm.irq_lock); - -return vector; } void hvm_isa_irq_deassert( diff --git a/xen/arch/x86/hvm/pmtimer.c b/xen/arch/x86/hvm/pmtimer.c index 97b9e41712f..9d30b145f60 100644 --- a/xen/arch/x86/hvm/pmtimer.c +++ b/xen/arch/x86/hvm/pmtimer.c @@ -61,7 +61,7 @@ static void pmt_update_sci(PMTState *s) 
ASSERT(spin_is_locked(>lock)); if ( acpi->pm1a_en & acpi->pm1a_sts & SCI_MASK ) -hvm_isa_irq_assert(s->vcpu->domain, SCI_IRQ, NULL); +hvm_isa_irq_assert(s->vcpu->domain, SCI_IRQ); else hvm_isa_irq_deassert(s->vcpu->domain, SCI_IRQ); } diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c index 9992595c45a..b66ca6f64f1 100644 --- a/xen/arch/x86/hvm/rtc.c +++ b/xen/arch/x86/hvm/rtc.c @@ -63,7 +63,7 @@ static void rtc_update_irq(RTCState *s) s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF; hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ); -hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ, NULL); +hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ); } /* Called by the VPT code after it's injected a PF interrupt for us. diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c index 6744b88d20c..639e45c520e 100644 --- a/xen/arch/x86/hvm/vpt.c +++ b/xen/arch/x86/hvm/vpt.c @@ -368,7 +368,7 @@ static bool inject_interrupt(struct periodic_time *pt) case PTSRC_isa: hvm_isa_irq_deassert(d, irq); -hvm_isa_irq_assert(d, irq, NULL); +hvm_isa_irq_assert(d, irq); break; case PTSRC_ioapic: diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h index 57d51c15863..4e3534d4eb4 100644 --- a/xen/include/asm-x86/hvm/irq.h +++ b/xen/include/asm-x86/hvm/irq.h @@ -214,9 +214,7 @@ void hvm_pci_intx_deassert(struct domain *d, unsigned int device, * allows us to get the interrupt vector in the protection of irq_lock. * For most cases, just set get_vector to NULL. */ -int hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq, - int (*get_vector)(const struct domain *d, - unsigned int gsi)); +void hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq); void hvm_isa_irq_deassert(struct domain *d, unsigned int isa_irq); /* Modify state of GSIs. */ -- 2.30.1
[PATCH v4 08/12] x86/vpt: switch interrupt injection model
Currently vPT relies on timers being assigned to a vCPU and performs checks on every return to the HVM guest in order to determine whether an interrupt from a vPT timer assigned to the vCPU is currently being injected. This model doesn't work properly, since the interrupt destination vCPU of a vPT timer can be different from the vCPU the timer is currently assigned to, in which case the timer would get stuck because it never sees the interrupt as being injected. Knowing when a vPT interrupt is injected is relevant for the guest timer modes where missed vPT interrupts are not discarded but instead accumulated and injected when possible. This change modifies the logic described above, so that vPT doesn't need to check on every return to the HVM guest whether a vPT interrupt is being injected. In order to achieve this the vPT code is modified to make use of the new EOI callbacks, so that virtual timers can detect when an interrupt has been serviced by the guest by waiting for the EOI callback to execute. This model also simplifies some of the logic, as when executing the timer EOI callback Xen can try to inject another interrupt if the timer has interrupts pending for delivery. Note that timers are still bound to a vCPU for the time being; this relation, however, no longer limits the interrupt destination, and will be removed by further patches. This model has been tested with Windows 7 guests without showing any timer delay, even when the guest was limited to very little CPU capacity and pending virtual timer interrupts accumulated. Signed-off-by: Roger Pau Monné --- Changes since v3: - Rename pt_irq_fired to irq_eoi and adjust the logic. - Initialize v and cb_priv in eoi_callback. Changes since v2: - Avoid an explicit != NULL check. - Use a switch in inject_interrupt to evaluate the timer mode. - Print the pt->source field on error in create_periodic_time. Changes since v1: - New in this version. 
--- xen/arch/x86/hvm/svm/intr.c | 3 - xen/arch/x86/hvm/vmx/intr.c | 59 -- xen/arch/x86/hvm/vpt.c| 352 +++--- xen/include/asm-x86/hvm/vpt.h | 5 +- 4 files changed, 157 insertions(+), 262 deletions(-) diff --git a/xen/arch/x86/hvm/svm/intr.c b/xen/arch/x86/hvm/svm/intr.c index 7f815d23078..2ee2332253b 100644 --- a/xen/arch/x86/hvm/svm/intr.c +++ b/xen/arch/x86/hvm/svm/intr.c @@ -146,8 +146,6 @@ void svm_intr_assist(void) return; /* Crank the handle on interrupt state. */ -pt_update_irq(v); - do { intack = hvm_vcpu_has_pending_irq(v); if ( likely(intack.source == hvm_intsrc_none) ) @@ -219,7 +217,6 @@ void svm_intr_assist(void) { HVMTRACE_2D(INJ_VIRQ, intack.vector, /*fake=*/ 0); svm_inject_extint(v, intack.vector); -pt_intr_post(v, intack); } /* Is there another IRQ to queue up behind this one? */ diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c index 80bfbb47878..3fcc7073db2 100644 --- a/xen/arch/x86/hvm/vmx/intr.c +++ b/xen/arch/x86/hvm/vmx/intr.c @@ -203,7 +203,6 @@ static int nvmx_intr_intercept(struct vcpu *v, struct hvm_intack intack) { /* for now, duplicate the ack path in vmx_intr_assist */ hvm_vcpu_ack_pending_irq(v, intack); -pt_intr_post(v, intack); intack = hvm_vcpu_has_pending_irq(v); if ( unlikely(intack.source != hvm_intsrc_none) ) @@ -242,7 +241,6 @@ void vmx_intr_assist(void) struct vcpu *v = current; unsigned int tpr_threshold = 0; enum hvm_intblk intblk; -int pt_vector; /* Block event injection when single step with MTF. */ if ( unlikely(v->arch.hvm.single_step) ) @@ -263,8 +261,6 @@ void vmx_intr_assist(void) #endif /* Crank the handle on interrupt state. */ -pt_vector = pt_update_irq(v); - do { unsigned long intr_info; @@ -337,58 +333,6 @@ void vmx_intr_assist(void) { unsigned long status; - /* -* intack.vector is the highest priority vector. 
So we set eoi_exit_bitmap -* for intack.vector - give a chance to post periodic time interrupts when -* periodic time interrupts become the highest one -*/ -if ( pt_vector != -1 ) -{ -#ifndef NDEBUG -/* - * We assert that intack.vector is the highest priority vector for - * only an interrupt from vlapic can reach this point and the - * highest vector is chosen in hvm_vcpu_has_pending_irq(). - * But, in fact, the assertion failed sometimes. It is suspected - * that PIR is not synced to vIRR which makes pt_vector is left in - * PIR. In order to verify this suspicion, dump some information - * when the assertion fails. - */ -if ( unlikely(intack.vector < pt_vector) ) -{ -const struct vlapic *vlapic; -
[PATCH v4 11/12] x86/vpt: remove vPT timers per-vCPU lists
No longer add vPT timers to per-vCPU lists, since there's no longer a need to check whether timer interrupts have been injected on return to the HVM guest. This change allows getting rid of virtual timer vCPU migration, and also cleans up some of the virtual timer fields that are no longer required. The model is also slightly different now in that timers are not stopped when a vCPU is de-scheduled. Such timers will continue running, and when triggered the timer function will try to inject the corresponding interrupt into the guest (which might be different from the currently running one). Note that a timer triggering while the guest is not running can only happen once, as the timer callback will not reset the interrupt to fire again. Such resetting, if required, will be done by the EOI callback. Signed-off-by: Roger Pau Monné Reviewed-by: Jan Beulich --- Changes since v3: - Remove stale commit log paragraph. Changes since v2: - Remove pt_{save/restore}_timer and instead use pt_{freeze/thaw}_time. - Remove the introduction of the 'masked' field, it's not needed. - Rework pt_active to use timer_is_active. Changes since v1: - New in this version. 
--- xen/arch/x86/domain.c | 4 +- xen/arch/x86/hvm/hvm.c | 4 +- xen/arch/x86/hvm/vlapic.c | 1 - xen/arch/x86/hvm/vpt.c | 192 - xen/include/asm-x86/hvm/vcpu.h | 3 +- xen/include/asm-x86/hvm/vpt.h | 12 +-- 6 files changed, 25 insertions(+), 191 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 4dc27f798e7..b6cd715dce9 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -2039,8 +2039,8 @@ void context_switch(struct vcpu *prev, struct vcpu *next) vpmu_switch_from(prev); np2m_schedule(NP2M_SCHEDLE_OUT); -if ( is_hvm_domain(prevd) && !list_empty(>arch.hvm.tm_list) ) -pt_save_timer(prev); +if ( is_hvm_domain(prevd) ) +pt_freeze_time(prev); local_irq_disable(); diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 2c4dd1b86f2..ec4ab1f5199 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -489,7 +489,6 @@ void hvm_set_info_guest(struct vcpu *v) void hvm_migrate_timers(struct vcpu *v) { rtc_migrate_timers(v); -pt_migrate(v); } void hvm_migrate_pirq(struct hvm_pirq_dpci *pirq_dpci, const struct vcpu *v) @@ -544,7 +543,7 @@ void hvm_do_resume(struct vcpu *v) { check_wakeup_from_wait(); -pt_restore_timer(v); +pt_thaw_time(v); if ( !vcpu_ioreq_handle_completion(v) ) return; @@ -1559,7 +1558,6 @@ int hvm_vcpu_initialise(struct vcpu *v) hvm_asid_flush_vcpu(v); spin_lock_init(>arch.hvm.tm_lock); -INIT_LIST_HEAD(>arch.hvm.tm_list); rc = hvm_vcpu_cacheattr_init(v); /* teardown: vcpu_cacheattr_destroy */ if ( rc != 0 ) diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c index 8f3a0a2e8f7..2af24989dd5 100644 --- a/xen/arch/x86/hvm/vlapic.c +++ b/xen/arch/x86/hvm/vlapic.c @@ -1342,7 +1342,6 @@ void vlapic_adjust_i8259_target(struct domain *d) if ( d->arch.hvm.i8259_target == v ) return; d->arch.hvm.i8259_target = v; -pt_adjust_global_vcpu_target(v); } int vlapic_has_pending_irq(struct vcpu *v) diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c index 639e45c520e..6a8d216c7b5 100644 --- 
a/xen/arch/x86/hvm/vpt.c +++ b/xen/arch/x86/hvm/vpt.c @@ -125,24 +125,6 @@ static int pt_irq_masked(struct periodic_time *pt) return 1; } -/* - * Functions which read (maybe write) all periodic_time instances - * attached to a particular vCPU use pt_vcpu_{un}lock locking helpers. - * - * Such users are explicitly forbidden from changing the value of the - * pt->vcpu field, because another thread holding the pt_migrate lock - * may already be spinning waiting for your vcpu lock. - */ -static void pt_vcpu_lock(struct vcpu *v) -{ -spin_lock(>arch.hvm.tm_lock); -} - -static void pt_vcpu_unlock(struct vcpu *v) -{ -spin_unlock(>arch.hvm.tm_lock); -} - /* * Functions which want to modify a particular periodic_time object * use pt_{un}lock locking helpers. @@ -176,14 +158,12 @@ static void pt_process_missed_ticks(struct periodic_time *pt) return; missed_ticks = missed_ticks / (s_time_t) pt->period + 1; -if ( mode_is(pt->vcpu->domain, no_missed_ticks_pending) ) -pt->do_not_freeze = !pt->pending_intr_nr; -else +if ( !mode_is(pt->vcpu->domain, no_missed_ticks_pending) ) pt->pending_intr_nr += missed_ticks; pt->scheduled += missed_ticks * pt->period; } -static void pt_freeze_time(struct vcpu *v) +void pt_freeze_time(struct vcpu *v) { if ( !mode_is(v->domain, delay_for_missed_ticks) ) return; @@ -191,7 +171,7 @@ static void pt_freeze_time(struct vcpu *v) v->arch.hvm.guest_time = hvm_get_guest_time(v); } -static void pt_thaw_time(struct vcpu *v) +void pt_thaw_time(struct vcpu *v) { if (
[PATCH v4 07/12] x86/dpci: switch to use a GSI EOI callback
Switch the dpci GSI EOI callback hooks to use the newly introduced generic callback functionality, and remove the custom dpci calls found on the vPIC and vIO-APIC implementations. Signed-off-by: Roger Pau Monné --- Changes since v3: - Print a warning message if the EOI callback cannot be unregistered. Changes since v2: - Avoid leaking the allocated callback on error paths of pt_irq_create_bind. Changes since v1: - New in this version. --- xen/arch/x86/hvm/vioapic.c| 8 - xen/arch/x86/hvm/vpic.c | 4 --- xen/drivers/passthrough/x86/hvm.c | 57 --- xen/include/asm-x86/hvm/io.h | 1 - xen/include/asm-x86/hvm/irq.h | 1 + 5 files changed, 54 insertions(+), 17 deletions(-) diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c index 0824ede91ab..4f844965423 100644 --- a/xen/arch/x86/hvm/vioapic.c +++ b/xen/arch/x86/hvm/vioapic.c @@ -284,7 +284,6 @@ static void vioapic_write_redirent( */ ASSERT(prev_level); ASSERT(!top_word); -hvm_dpci_eoi(d, gsi); hvm_gsi_execute_callbacks(d, gsi); } @@ -418,13 +417,6 @@ static void eoi_callback(struct vcpu *v, unsigned int vector, void *data) ent->fields.remote_irr = 0; -if ( is_iommu_enabled(d) ) -{ -spin_unlock(>arch.hvm.irq_lock); -hvm_dpci_eoi(d, gsi); -spin_lock(>arch.hvm.irq_lock); -} - /* * Callbacks don't expect to be executed with any lock held, so * drop the lock that protects the vIO-APIC fields from changing. diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c index ef668f3668a..60d6740f69a 100644 --- a/xen/arch/x86/hvm/vpic.c +++ b/xen/arch/x86/hvm/vpic.c @@ -237,8 +237,6 @@ static void vpic_ioport_write( ASSERT(pin < 8); hvm_gsi_execute_callbacks(current->domain, hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin)); -hvm_dpci_eoi(current->domain, - hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin)); __clear_bit(pin, ); } return; @@ -289,8 +287,6 @@ static void vpic_ioport_write( vpic_unlock(vpic); hvm_gsi_execute_callbacks(current->domain, hvm_isa_irq_to_gsi((addr >> 7) ? 
(pin | 8) : pin)); -hvm_dpci_eoi(current->domain, - hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin)); return; /* bail immediately */ case 6: /* Set Priority*/ vpic->priority_add = (val + 1) & 7; diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c index 0db26e5dbb2..02e027cff8c 100644 --- a/xen/drivers/passthrough/x86/hvm.c +++ b/xen/drivers/passthrough/x86/hvm.c @@ -252,7 +252,7 @@ static void hvm_gsi_eoi(struct domain *d, unsigned int gsi) hvm_pirq_eoi(pirq); } -void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi) +static void dpci_eoi(struct domain *d, unsigned int guest_gsi, void *data) { const struct hvm_irq_dpci *hvm_irq_dpci; const struct hvm_girq_dpci_mapping *girq; @@ -476,6 +476,7 @@ int pt_irq_create_bind( { struct dev_intx_gsi_link *digl = NULL; struct hvm_girq_dpci_mapping *girq = NULL; +struct hvm_gsi_eoi_callback *cb = NULL; unsigned int guest_gsi; /* @@ -489,7 +490,7 @@ int pt_irq_create_bind( unsigned int link; digl = xmalloc(struct dev_intx_gsi_link); -girq = xmalloc(struct hvm_girq_dpci_mapping); +girq = xzalloc(struct hvm_girq_dpci_mapping); if ( !digl || !girq ) { @@ -502,11 +503,22 @@ int pt_irq_create_bind( girq->bus = digl->bus = pt_irq_bind->u.pci.bus; girq->device = digl->device = pt_irq_bind->u.pci.device; girq->intx = digl->intx = pt_irq_bind->u.pci.intx; -list_add_tail(>list, _dpci->digl_list); +girq->cb.callback = dpci_eoi; guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx); link = hvm_pci_intx_link(digl->device, digl->intx); +rc = hvm_gsi_register_callback(d, guest_gsi, >cb); +if ( rc ) +{ +spin_unlock(>event_lock); +xfree(girq); +xfree(digl); +return rc; +} + +list_add_tail(>list, _dpci->digl_list); + hvm_irq_dpci->link_cnt[link]++; girq->machine_gsi = pirq; @@ -514,17 +526,43 @@ int pt_irq_create_bind( } else { +/* + * NB: the callback structure allocated below will never be freed + * once setup because it's used by the hardware domain and will +
[PATCH v4 06/12] x86/dpci: move code
This is code movement in order to simplify further changes. No functional change intended. Signed-off-by: Roger Pau Monné Acked-by: Jan Beulich --- Changes since v2: - Drop one of the leading underscores from __hvm_dpci_eoi. Changes since v1: - New in this version. --- xen/drivers/passthrough/x86/hvm.c | 162 +++--- 1 file changed, 81 insertions(+), 81 deletions(-) diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c index 8f78c0935b9..0db26e5dbb2 100644 --- a/xen/drivers/passthrough/x86/hvm.c +++ b/xen/drivers/passthrough/x86/hvm.c @@ -205,6 +205,87 @@ static struct vcpu *vector_hashing_dest(const struct domain *d, return dest; } +static void hvm_pirq_eoi(struct pirq *pirq) +{ +struct hvm_pirq_dpci *pirq_dpci; + +if ( !pirq ) +{ +ASSERT_UNREACHABLE(); +return; +} + +pirq_dpci = pirq_dpci(pirq); + +/* + * No need to get vector lock for timer + * since interrupt is still not EOIed + */ +if ( --pirq_dpci->pending || + /* When the interrupt source is MSI no Ack should be performed. */ + (pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE) ) +return; + +pirq_guest_eoi(pirq); +} + +static void __hvm_dpci_eoi(struct domain *d, + const struct hvm_girq_dpci_mapping *girq) +{ +struct pirq *pirq = pirq_info(d, girq->machine_gsi); + +if ( !hvm_domain_use_pirq(d, pirq) ) +hvm_pci_intx_deassert(d, girq->device, girq->intx); + +hvm_pirq_eoi(pirq); +} + +static void hvm_gsi_eoi(struct domain *d, unsigned int gsi) +{ +struct pirq *pirq = pirq_info(d, gsi); + +/* Check if GSI is actually mapped. 
*/ +if ( !pirq_dpci(pirq) ) +return; + +hvm_gsi_deassert(d, gsi); +hvm_pirq_eoi(pirq); +} + +void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi) +{ +const struct hvm_irq_dpci *hvm_irq_dpci; +const struct hvm_girq_dpci_mapping *girq; + +if ( !is_iommu_enabled(d) ) +return; + +if ( is_hardware_domain(d) ) +{ +spin_lock(>event_lock); +hvm_gsi_eoi(d, guest_gsi); +goto unlock; +} + +if ( guest_gsi < NR_ISAIRQS ) +{ +hvm_dpci_isairq_eoi(d, guest_gsi); +return; +} + +spin_lock(>event_lock); +hvm_irq_dpci = domain_get_irq_dpci(d); + +if ( !hvm_irq_dpci ) +goto unlock; + +list_for_each_entry ( girq, _irq_dpci->girq[guest_gsi], list ) +__hvm_dpci_eoi(d, girq); + +unlock: +spin_unlock(>event_lock); +} + int pt_irq_create_bind( struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind) { @@ -860,87 +941,6 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci) spin_unlock(>event_lock); } -static void hvm_pirq_eoi(struct pirq *pirq) -{ -struct hvm_pirq_dpci *pirq_dpci; - -if ( !pirq ) -{ -ASSERT_UNREACHABLE(); -return; -} - -pirq_dpci = pirq_dpci(pirq); - -/* - * No need to get vector lock for timer - * since interrupt is still not EOIed - */ -if ( --pirq_dpci->pending || - /* When the interrupt source is MSI no Ack should be performed. */ - (pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE) ) -return; - -pirq_guest_eoi(pirq); -} - -static void __hvm_dpci_eoi(struct domain *d, - const struct hvm_girq_dpci_mapping *girq) -{ -struct pirq *pirq = pirq_info(d, girq->machine_gsi); - -if ( !hvm_domain_use_pirq(d, pirq) ) -hvm_pci_intx_deassert(d, girq->device, girq->intx); - -hvm_pirq_eoi(pirq); -} - -static void hvm_gsi_eoi(struct domain *d, unsigned int gsi) -{ -struct pirq *pirq = pirq_info(d, gsi); - -/* Check if GSI is actually mapped. 
*/ -if ( !pirq_dpci(pirq) ) -return; - -hvm_gsi_deassert(d, gsi); -hvm_pirq_eoi(pirq); -} - -void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi) -{ -const struct hvm_irq_dpci *hvm_irq_dpci; -const struct hvm_girq_dpci_mapping *girq; - -if ( !is_iommu_enabled(d) ) -return; - -if ( is_hardware_domain(d) ) -{ -spin_lock(>event_lock); -hvm_gsi_eoi(d, guest_gsi); -goto unlock; -} - -if ( guest_gsi < NR_ISAIRQS ) -{ -hvm_dpci_isairq_eoi(d, guest_gsi); -return; -} - -spin_lock(>event_lock); -hvm_irq_dpci = domain_get_irq_dpci(d); - -if ( !hvm_irq_dpci ) -goto unlock; - -list_for_each_entry ( girq, _irq_dpci->girq[guest_gsi], list ) -__hvm_dpci_eoi(d, girq); - -unlock: -spin_unlock(>event_lock); -} - static int pci_clean_dpci_irq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci, void *arg) { -- 2.30.1
[PATCH v4 05/12] x86/hvm: allow registering EOI callbacks for GSIs
Such callbacks will be executed once an EOI is performed by the guest, regardless of whether the interrupts are injected from the vIO-APIC or the vPIC, as ISA IRQs are translated to GSIs and then the corresponding callback is executed at EOI. The vIO-APIC infrastructure for handling EOIs is built on top of the existing vlapic EOI callback functionality, while the vPIC one is handled when writing to the vPIC EOI register. Note that such callbacks need to be registered and de-registered, and that a single GSI can have multiple associated callbacks. That's because GSIs can be level triggered and shared, as is the case with legacy PCI interrupts shared between several devices. Strictly speaking this is a non-functional change, since there are no users yet of the new interface introduced by this change. Signed-off-by: Roger Pau Monné --- Changes since v3: - Make callback take a domain parameter. - Return whether the unregistered callback was found. - Add a comment regarding the result of hvm_gsi_has_callbacks being stable. Changes since v2: - Latch hvm_domain_irq in some functions. - Make domain parameter of hvm_gsi_has_callbacks const. - Add comment about dropping the lock around the hvm_gsi_execute_callbacks call. - Drop change to ioapic_load. Changes since v1: - New in this version. 
--- xen/arch/x86/hvm/hvm.c| 15 ++- xen/arch/x86/hvm/irq.c| 75 +++ xen/arch/x86/hvm/vioapic.c| 29 +++--- xen/arch/x86/hvm/vpic.c | 4 ++ xen/include/asm-x86/hvm/irq.h | 21 ++ 5 files changed, 137 insertions(+), 7 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index ae37bc434ae..2c4dd1b86f2 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -608,7 +608,7 @@ static int hvm_print_line( int hvm_domain_initialise(struct domain *d) { -unsigned int nr_gsis; +unsigned int nr_gsis, i; int rc; if ( !hvm_enabled ) @@ -656,6 +656,14 @@ int hvm_domain_initialise(struct domain *d) BUILD_BUG_ON(NR_HVM_DOMU_IRQS < NR_ISAIRQS); ASSERT(hvm_domain_irq(d)->nr_gsis >= NR_ISAIRQS); +/* Initialize the EOI callback list. */ +hvm_domain_irq(d)->gsi_callbacks = xmalloc_array(struct list_head, nr_gsis); +if ( !hvm_domain_irq(d)->gsi_callbacks ) +goto fail1; +rwlock_init(_domain_irq(d)->gsi_callbacks_lock); +for ( i = 0; i < nr_gsis; i++ ) +INIT_LIST_HEAD(_domain_irq(d)->gsi_callbacks[i]); + /* need link to containing domain */ d->arch.hvm.pl_time->domain = d; @@ -715,6 +723,8 @@ int hvm_domain_initialise(struct domain *d) fail1: if ( is_hardware_domain(d) ) xfree(d->arch.hvm.io_bitmap); +if ( hvm_domain_irq(d) ) +XFREE(hvm_domain_irq(d)->gsi_callbacks); XFREE(d->arch.hvm.params); XFREE(d->arch.hvm.irq); fail0: @@ -777,6 +787,9 @@ void hvm_domain_destroy(struct domain *d) vioapic_deinit(d); XFREE(d->arch.hvm.pl_time); + +if ( hvm_domain_irq(d) ) +XFREE(hvm_domain_irq(d)->gsi_callbacks); XFREE(d->arch.hvm.irq); list_for_each_safe ( ioport_list, tmp, >arch.hvm.g2m_ioport_list ) diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c index 38ac5fb6c7c..4825a387bdc 100644 --- a/xen/arch/x86/hvm/irq.c +++ b/xen/arch/x86/hvm/irq.c @@ -595,6 +595,81 @@ int hvm_local_events_need_delivery(struct vcpu *v) return !hvm_interrupt_blocked(v, intack); } +int hvm_gsi_register_callback(struct domain *d, unsigned int gsi, + struct hvm_gsi_eoi_callback *cb) +{ 
+struct hvm_irq *hvm_irq = hvm_domain_irq(d); + +if ( gsi >= hvm_irq->nr_gsis ) +{ +ASSERT_UNREACHABLE(); +return -EINVAL; +} + +write_lock(_irq->gsi_callbacks_lock); +list_add(>list, _irq->gsi_callbacks[gsi]); +write_unlock(_irq->gsi_callbacks_lock); + +return 0; +} + +int hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi, +struct hvm_gsi_eoi_callback *cb) +{ +struct hvm_irq *hvm_irq = hvm_domain_irq(d); +const struct list_head *tmp; +bool found = false; + +if ( gsi >= hvm_irq->nr_gsis ) +{ +ASSERT_UNREACHABLE(); +return -EINVAL; +} + +write_lock(_irq->gsi_callbacks_lock); +list_for_each ( tmp, _irq->gsi_callbacks[gsi] ) +if ( tmp == >list ) +{ +list_del(>list); +found = true; +break; +} +write_unlock(_irq->gsi_callbacks_lock); + +return found ? 0 : -ENOENT; +} + +void hvm_gsi_execute_callbacks(struct domain *d, unsigned int gsi) +{ +struct hvm_irq *hvm_irq = hvm_domain_irq(d); +struct hvm_gsi_eoi_callback *cb; + +read_lock(_irq->gsi_callbacks_lock); +list_for_each_entry ( cb, _irq->gsi_callbacks[gsi], list ) +cb->callback(d, gsi, cb->data); +read_unlock(_irq->gsi_callbacks_lock); +} + +bool hvm_gsi_has_callbacks(const struct domain *d,
[PATCH v4 04/12] x86/vioapic: switch to use the EOI callback mechanism
Switch the emulated IO-APIC code to use the local APIC EOI callback mechanism. This allows removing the last hardcoded callback from vlapic_handle_EOI. Removing the hardcoded vIO-APIC callback also allows getting rid of setting the EOI exit bitmap based on the triggering mode, as now all users that require an EOI action use the newly introduced callback mechanism. Move and rename vioapic_update_EOI now that it can be made static. Signed-off-by: Roger Pau Monné --- Changes since v3: - Remove assert in eoi_callback. - Cast callback to bool. - Simplify check in ioapic_load: GSIs < 16 and edge interrupts can also have callbacks. - Reword comment about casting to boolean. Changes since v2: - Explicitly convert the last alternative_vcall parameter to a boolean in vlapic_set_irq_callback. Changes since v1: - Remove the triggering check in the update_eoi_exit_bitmap call. - Register the vlapic callbacks when loading the vIO-APIC state. - Reduce scope of ent. --- xen/arch/x86/hvm/vioapic.c | 127 - xen/arch/x86/hvm/vlapic.c | 12 ++-- 2 files changed, 89 insertions(+), 50 deletions(-) diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c index 87370dd4172..1c2763fc2bf 100644 --- a/xen/arch/x86/hvm/vioapic.c +++ b/xen/arch/x86/hvm/vioapic.c @@ -394,6 +394,48 @@ static const struct hvm_mmio_ops vioapic_mmio_ops = { .write = vioapic_write }; +static void eoi_callback(struct vcpu *v, unsigned int vector, void *data) +{ +struct domain *d = v->domain; +struct hvm_irq *hvm_irq = hvm_domain_irq(d); +unsigned int i; + +spin_lock(&d->arch.hvm.irq_lock); + +for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ ) +{ +struct hvm_vioapic *vioapic = domain_vioapic(d, i); +unsigned int pin; + +for ( pin = 0; pin < vioapic->nr_pins; pin++ ) +{ +union vioapic_redir_entry *ent = &vioapic->redirtbl[pin]; + +if ( ent->fields.vector != vector ) +continue; + +ent->fields.remote_irr = 0; + +if ( is_iommu_enabled(d) ) +{ +spin_unlock(&d->arch.hvm.irq_lock); +hvm_dpci_eoi(d, vioapic->base_gsi + pin); 
+spin_lock(>arch.hvm.irq_lock); +} + +if ( (ent->fields.trig_mode == VIOAPIC_LEVEL_TRIG) && + !ent->fields.mask && !ent->fields.remote_irr && + hvm_irq->gsi_assert_count[vioapic->base_gsi + pin] ) +{ +ent->fields.remote_irr = 1; +vioapic_deliver(vioapic, pin); +} +} +} + +spin_unlock(>arch.hvm.irq_lock); +} + static void ioapic_inj_irq( struct hvm_vioapic *vioapic, struct vlapic *target, @@ -407,7 +449,8 @@ static void ioapic_inj_irq( ASSERT((delivery_mode == dest_Fixed) || (delivery_mode == dest_LowestPrio)); -vlapic_set_irq(target, vector, trig_mode); +vlapic_set_irq_callback(target, vector, trig_mode, +trig_mode ? eoi_callback : NULL, NULL); } static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin) @@ -514,49 +557,6 @@ void vioapic_irq_positive_edge(struct domain *d, unsigned int irq) } } -void vioapic_update_EOI(struct domain *d, u8 vector) -{ -struct hvm_irq *hvm_irq = hvm_domain_irq(d); -union vioapic_redir_entry *ent; -unsigned int i; - -ASSERT(has_vioapic(d)); - -spin_lock(>arch.hvm.irq_lock); - -for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ ) -{ -struct hvm_vioapic *vioapic = domain_vioapic(d, i); -unsigned int pin; - -for ( pin = 0; pin < vioapic->nr_pins; pin++ ) -{ -ent = >redirtbl[pin]; -if ( ent->fields.vector != vector ) -continue; - -ent->fields.remote_irr = 0; - -if ( is_iommu_enabled(d) ) -{ -spin_unlock(>arch.hvm.irq_lock); -hvm_dpci_eoi(d, vioapic->base_gsi + pin); -spin_lock(>arch.hvm.irq_lock); -} - -if ( (ent->fields.trig_mode == VIOAPIC_LEVEL_TRIG) && - !ent->fields.mask && !ent->fields.remote_irr && - hvm_irq->gsi_assert_count[vioapic->base_gsi + pin] ) -{ -ent->fields.remote_irr = 1; -vioapic_deliver(vioapic, pin); -} -} -} - -spin_unlock(>arch.hvm.irq_lock); -} - int vioapic_get_mask(const struct domain *d, unsigned int gsi) { unsigned int pin = 0; /* See gsi_vioapic */ @@ -610,6 +610,8 @@ static int ioapic_save(struct vcpu *v, hvm_domain_context_t *h) static int ioapic_load(struct domain *d, hvm_domain_context_t *h) { 
struct hvm_vioapic *s; +unsigned int i; +int rc; if ( !has_vioapic(d) ) return -ENODEV; @@ -620,7 +622,42 @@ static int ioapic_load(struct domain *d, hvm_domain_context_t *h)
[PATCH v4 03/12] x86/vmsi: use the newly introduced EOI callbacks
Remove the unconditional call to hvm_dpci_msi_eoi in vlapic_handle_EOI and
instead use the newly introduced EOI callback mechanism in order to register
a callback for MSI vectors injected from passed through devices. This avoids
having multiple callback functions open-coded in vlapic_handle_EOI, as there
is now a generic framework for registering such callbacks. It also avoids
doing an unconditional call to hvm_dpci_msi_eoi for each EOI processed by
the local APIC.

Note that now the callback is only registered (and thus executed) when
there's an MSI interrupt originating from a PCI passthrough device being
injected into the guest, so the check in hvm_dpci_msi_eoi can be removed as
it's already done by hvm_dirq_assist, which is the only caller of
vmsi_deliver_pirq.

Signed-off-by: Roger Pau Monné
Reviewed-by: Jan Beulich
---
Changes since v3:
 - Fix the callback to take a vcpu parameter.

Changes since v2:
 - Expand commit message.
 - Pass the domain as the callback data.
 - Remove the check in hvm_dpci_msi_eoi.
---
 xen/arch/x86/hvm/vlapic.c         |  2 --
 xen/arch/x86/hvm/vmsi.c           | 35 ++-
 xen/drivers/passthrough/x86/hvm.c |  6 ++
 xen/include/asm-x86/hvm/io.h      |  2 +-
 4 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 4465beaeec1..cfcbd732b16 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -503,8 +503,6 @@ void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
     if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
         vioapic_update_EOI(d, vector);
 
-    hvm_dpci_msi_eoi(d, vector);
-
     spin_lock_irqsave(&vlapic->callback_lock, flags);
     callback = vlapic->callbacks[index].callback;
     vlapic->callbacks[index].callback = NULL;
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 13e2a190b43..03ae0dfb3c5 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -44,11 +44,9 @@
 #include
 #include
 
-static void vmsi_inj_irq(
-    struct vlapic *target,
-    uint8_t vector,
-    uint8_t trig_mode,
-    uint8_t delivery_mode)
+static void vmsi_inj_irq(struct vlapic *target, uint8_t vector,
+                         uint8_t trig_mode, uint8_t delivery_mode,
+                         vlapic_eoi_callback_t *callback, void *data)
 {
     HVM_DBG_LOG(DBG_LEVEL_VLAPIC, "vmsi_inj_irq: vec %02x trig %d dm %d\n",
                 vector, trig_mode, delivery_mode);
@@ -57,17 +55,17 @@ static void vmsi_inj_irq(
     {
     case dest_Fixed:
     case dest_LowestPrio:
-        vlapic_set_irq(target, vector, trig_mode);
+        vlapic_set_irq_callback(target, vector, trig_mode, callback, data);
         break;
     default:
         BUG();
     }
 }
 
-int vmsi_deliver(
-    struct domain *d, int vector,
-    uint8_t dest, uint8_t dest_mode,
-    uint8_t delivery_mode, uint8_t trig_mode)
+static int vmsi_deliver_callback(struct domain *d, int vector, uint8_t dest,
+                                 uint8_t dest_mode, uint8_t delivery_mode,
+                                 uint8_t trig_mode,
+                                 vlapic_eoi_callback_t *callback, void *data)
 {
     struct vlapic *target;
     struct vcpu *v;
@@ -78,7 +76,8 @@ int vmsi_deliver(
         target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
         if ( target != NULL )
         {
-            vmsi_inj_irq(target, vector, trig_mode, delivery_mode);
+            vmsi_inj_irq(target, vector, trig_mode, delivery_mode, callback,
+                         data);
             break;
         }
         HVM_DBG_LOG(DBG_LEVEL_VLAPIC, "null MSI round robin: vector=%02x\n",
@@ -89,8 +88,8 @@ int vmsi_deliver(
         for_each_vcpu ( d, v )
             if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
-                vmsi_inj_irq(vcpu_vlapic(v), vector,
-                             trig_mode, delivery_mode);
+                vmsi_inj_irq(vcpu_vlapic(v), vector, trig_mode, delivery_mode,
+                             callback, data);
         break;
 
     default:
@@ -103,6 +102,13 @@ int vmsi_deliver(
     return 0;
 }
 
+int vmsi_deliver(struct domain *d, int vector, uint8_t dest, uint8_t dest_mode,
+                 uint8_t delivery_mode, uint8_t trig_mode)
+{
+    return vmsi_deliver_callback(d, vector, dest, dest_mode, delivery_mode,
+                                 trig_mode, NULL, NULL);
+}
+
 void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 {
     uint32_t flags = pirq_dpci->gmsi.gflags;
@@ -119,7 +125,8 @@ void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 
     ASSERT(pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI);
 
-    vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
+    vmsi_deliver_callback(d, vector, dest, dest_mode, delivery_mode, trig_mode,
+                          hvm_dpci_msi_eoi, NULL);
 }
 
 /*
[PATCH v4 02/12] x86/vlapic: introduce an EOI callback mechanism
Add a new vlapic_set_irq_callback helper in order to inject a vector and set
a callback to be executed when the guest performs the end of interrupt
acknowledgment.

Such functionality will be used to migrate the current ad hoc handling done
in vlapic_handle_EOI for the vectors that require some logic to be executed
when the end of interrupt is performed.

The setter of the callback will be in charge of setting the callback again
on guest restore, as callbacks are not saved as part of the vlapic state.
That is the reason why vlapic_set_callback is not a static function.

No current users are migrated to use this new functionality yet, so no
functional change is expected as a result.

Signed-off-by: Roger Pau Monné
---
Changes since v3:
 - Use xzalloc.
 - Drop printk on ENOMEM.
 - Add vcpu parameter to vlapic EOI callback.
 - Check that the vector is pending in ISR or IRR when printing a warning
   message because of an overriding callback.
 - Fix commit message regarding resume mention.

Changes since v2:
 - Fix commit message typo.
 - Expand commit message.
 - Also print a warning if the callback data is overridden.
 - Properly free memory in case of error in vlapic_init.

Changes since v1:
 - Make vlapic_set_irq an inline function on the header.
 - Clear the callback hook in vlapic_handle_EOI.
 - Introduce a helper to set the callback without injecting a vector.
 - Remove unneeded parentheses.
 - Reduce callback table by 16.
 - Use %pv to print domain/vcpu ID.
---
 xen/arch/x86/hvm/vlapic.c        | 62 ++--
 xen/include/asm-x86/hvm/vlapic.h | 19 +-
 2 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 5e21fb4937d..4465beaeec1 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -144,7 +144,37 @@ bool vlapic_test_irq(const struct vlapic *vlapic, uint8_t vec)
     return vlapic_test_vector(vec, &vlapic->regs->data[APIC_IRR]);
 }
 
-void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec, uint8_t trig)
+void vlapic_set_callback(struct vlapic *vlapic, unsigned int vec,
+                         vlapic_eoi_callback_t *callback, void *data)
+{
+    unsigned long flags;
+    unsigned int index = vec - 16;
+
+    if ( !callback || vec < 16 || vec >= X86_NR_VECTORS )
+    {
+        ASSERT_UNREACHABLE();
+        return;
+    }
+
+    spin_lock_irqsave(&vlapic->callback_lock, flags);
+    if ( vlapic->callbacks[index].callback &&
+         (vlapic->callbacks[index].callback != callback ||
+          vlapic->callbacks[index].data != data) &&
+         (vlapic_test_vector(vec, &vlapic->regs->data[APIC_IRR]) ||
+          vlapic_test_vector(vec, &vlapic->regs->data[APIC_ISR])) )
+        printk(XENLOG_G_WARNING
+               "%pv overriding vector %#x callback %ps (%p) data %p "
+               "with %ps (%p) data %p\n",
+               vlapic_vcpu(vlapic), vec, vlapic->callbacks[index].callback,
+               vlapic->callbacks[index].callback,
+               vlapic->callbacks[index].data,
+               callback, callback, data);
+    vlapic->callbacks[index].callback = callback;
+    vlapic->callbacks[index].data = data;
+    spin_unlock_irqrestore(&vlapic->callback_lock, flags);
+}
+
+void vlapic_set_irq_callback(struct vlapic *vlapic, uint8_t vec, uint8_t trig,
+                             vlapic_eoi_callback_t *callback, void *data)
 {
     struct vcpu *target = vlapic_vcpu(vlapic);
 
@@ -159,8 +189,12 @@ void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec, uint8_t trig)
     else
         vlapic_clear_vector(vec, &vlapic->regs->data[APIC_TMR]);
 
+    if ( callback )
+        vlapic_set_callback(vlapic, vec, callback, data);
+
     if ( hvm_funcs.update_eoi_exit_bitmap )
-        alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec, trig);
+        alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec,
+                          trig || callback);
 
     if ( hvm_funcs.deliver_posted_intr )
         alternative_vcall(hvm_funcs.deliver_posted_intr, target, vec);
@@ -461,11 +495,24 @@ void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
 {
     struct vcpu *v = vlapic_vcpu(vlapic);
     struct domain *d = v->domain;
+    vlapic_eoi_callback_t *callback;
+    void *data;
+    unsigned long flags;
+    unsigned int index = vector - 16;
 
     if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
         vioapic_update_EOI(d, vector);
 
     hvm_dpci_msi_eoi(d, vector);
+
+    spin_lock_irqsave(&vlapic->callback_lock, flags);
+    callback = vlapic->callbacks[index].callback;
+    vlapic->callbacks[index].callback = NULL;
+    data = vlapic->callbacks[index].data;
+    spin_unlock_irqrestore(&vlapic->callback_lock, flags);
+
+    if ( callback )
+        callback(v, vector, data);
 }
 
 static bool_t is_multicast_dest(struct vlapic *vlapic, unsigned int short_hand,
@@ -1623,9 +1670,19 @@ int vlapic_init(struct vcpu *v)
 
     clear_page(vlapic->regs);
 
+    vlapic->callbacks = xzalloc_array(typeof(*vlapic->callbacks),
+
[PATCH v4 01/12] x86/rtc: drop code related to strict mode
Xen has been for a long time setting the WAET ACPI table "RTC good" flag,
which implies there's no need to perform a read of the RTC REG_C register in
order to get further interrupts after having received one. This is hardcoded
in the static ACPI tables, and in the RTC emulation in Xen.

Drop the support for the alternative (strict) mode; it's been unused for a
long time (since Xen 4.3) without any complaints.

Signed-off-by: Roger Pau Monné
---
Further changes in the series will require that no registering or
unregistering of callbacks is done inside of the handlers themselves, like
it was done in rtc_pf_callback when in strict_mode.
---
Changes since v3:
 - New in this version.
---
 xen/arch/x86/hvm/rtc.c | 27 +--
 xen/arch/x86/hvm/vpt.c |  4 +---
 2 files changed, 2 insertions(+), 29 deletions(-)

diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 3150f5f1479..9992595c45a 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -46,15 +46,6 @@
 #define epoch_year     1900
 #define get_year(x)    (x + epoch_year)
 
-enum rtc_mode {
-    rtc_mode_no_ack,
-    rtc_mode_strict
-};
-
-/* This must be in sync with how hvmloader sets the ACPI WAET flags. */
-#define mode_is(d, m) ((void)(d), rtc_mode_##m == rtc_mode_no_ack)
-#define rtc_mode_is(s, m) mode_is(vrtc_domain(s), m)
-
 static void rtc_copy_date(RTCState *s);
 static void rtc_set_time(RTCState *s);
 static inline int from_bcd(RTCState *s, int a);
@@ -64,9 +55,6 @@ static void rtc_update_irq(RTCState *s)
 {
     ASSERT(spin_is_locked(&s->lock));
 
-    if ( rtc_mode_is(s, strict) && (s->hw.cmos_data[RTC_REG_C] & RTC_IRQF) )
-        return;
-
     /* IRQ is raised if any source is both raised & enabled */
     if ( !(s->hw.cmos_data[RTC_REG_B] &
            s->hw.cmos_data[RTC_REG_C] &
@@ -74,8 +62,7 @@ static void rtc_update_irq(RTCState *s)
         return;
 
     s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF;
-    if ( rtc_mode_is(s, no_ack) )
-        hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ);
+    hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ);
     hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ, NULL);
 }
 
@@ -86,19 +73,7 @@ static void rtc_pf_callback(struct vcpu *v, void *opaque)
     RTCState *s = opaque;
 
     spin_lock(&s->lock);
-
-    if ( !rtc_mode_is(s, no_ack)
-         && (s->hw.cmos_data[RTC_REG_C] & RTC_IRQF)
-         && ++(s->pt_dead_ticks) >= 10 )
-    {
-        /* VM is ignoring its RTC; no point in running the timer */
-        TRACE_0D(TRC_HVM_EMUL_RTC_STOP_TIMER);
-        destroy_periodic_time(&s->pt);
-        s->period = 0;
-    }
-
     s->hw.cmos_data[RTC_REG_C] |= RTC_PF|RTC_IRQF;
-
     spin_unlock(&s->lock);
 }
 
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 4cc0a0848bd..24d90c0a186 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -21,7 +21,6 @@
 #include
 #include
 #include
-#include
 #include
 
 #define mode_is(d, name) \
@@ -337,8 +336,7 @@ int pt_update_irq(struct vcpu *v)
     {
         if ( pt->pending_intr_nr )
         {
-            /* RTC code takes care of disabling the timer itself. */
-            if ( (pt->irq != RTC_IRQ || !pt->priv) && pt_irq_masked(pt) &&
+            if ( pt_irq_masked(pt) &&
                  /* Level interrupts should be asserted even if masked. */
                  !pt->level )
             {
-- 
2.30.1
[PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT
Hello,

The following series introduces EOI callbacks and switches a number of
subsystems to use them. The first ones are vmsi and dpci, which are quite
straightforward to switch since they used to open-code hooks in the EOI
handlers.

Finally, HVM virtual timers are also switched to a different model where
EOI callbacks are used instead of checking at every return to guest whether
a timer interrupt is being injected. Note that this change also fixes a bug
in virtual periodic timers that prevented injecting to a vCPU different
than the one where the timer is assigned (and that would currently match
the i8259 target).

Those changes are paired together so that practical applications of having
EOI callbacks can be seen in action.

Note that further cleanup can be done, but I think the series is already
big enough and provides a clear benefit.

A couple of notes from the testing performed:

 - PVH dom0.
 - Windows guest, limited to 2% capacity to test delay for missed ticks
   mode - time is consistent in the guest.
 - Windows guest migration.
 - PCI passthrough to a guest.

Thanks, Roger.
Roger Pau Monne (12):
  x86/rtc: drop code related to strict mode
  x86/vlapic: introduce an EOI callback mechanism
  x86/vmsi: use the newly introduced EOI callbacks
  x86/vioapic: switch to use the EOI callback mechanism
  x86/hvm: allowing registering EOI callbacks for GSIs
  x86/dpci: move code
  x86/dpci: switch to use a GSI EOI callback
  x86/vpt: switch interrupt injection model
  x86/irq: remove unused parameter from hvm_isa_irq_assert
  x86/irq: drop return value from hvm_ioapic_assert
  x86/vpt: remove vPT timers per-vCPU lists
  x86/vpt: introduce a per-vPT lock

 xen/arch/x86/domain.c             |   4 +-
 xen/arch/x86/emul-i8254.c         |   1 +
 xen/arch/x86/hvm/dm.c             |   2 +-
 xen/arch/x86/hvm/hpet.c           |   8 +-
 xen/arch/x86/hvm/hvm.c            |  23 +-
 xen/arch/x86/hvm/irq.c            |  93 +-
 xen/arch/x86/hvm/pmtimer.c        |   2 +-
 xen/arch/x86/hvm/rtc.c            |  30 +-
 xen/arch/x86/hvm/svm/intr.c       |   3 -
 xen/arch/x86/hvm/vioapic.c        | 144 ++---
 xen/arch/x86/hvm/vlapic.c         |  72 -
 xen/arch/x86/hvm/vmsi.c           |  35 +-
 xen/arch/x86/hvm/vmx/intr.c       |  59 
 xen/arch/x86/hvm/vpic.c           |   8 +-
 xen/arch/x86/hvm/vpt.c            | 514 +-
 xen/drivers/passthrough/x86/hvm.c | 223 -
 xen/include/asm-x86/hvm/io.h      |   3 +-
 xen/include/asm-x86/hvm/irq.h     |  28 +-
 xen/include/asm-x86/hvm/vcpu.h    |   4 -
 xen/include/asm-x86/hvm/vlapic.h  |  19 +-
 xen/include/asm-x86/hvm/vpt.h     |  32 +-
 21 files changed, 630 insertions(+), 677 deletions(-)

-- 
2.30.1
[xen-4.12-testing test] 161304: regressions - FAIL
flight 161304 xen-4.12-testing real [real]
flight 161327 xen-4.12-testing real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/161304/
http://logs.test-lab.xenproject.org/osstest/logs/161327/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could
not be run:
 test-amd64-amd64-xl-qcow2  19 guest-localmigrate/x10  fail in 161261 REGR. vs. 159418

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qcow2  18 guest-saverestore.2     fail pass in 161261

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64   19 guest-stop                 fail like 159418
 test-armhf-armhf-libvirt              16 saverestore-support-check  fail like 159418
 test-amd64-amd64-qemuu-nested-amd     20 debian-hvm-install/l1/l2   fail like 159418
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop                 fail like 159418
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop                 fail like 159418
 test-amd64-i386-xl-qemuu-ws16-amd64   19 guest-stop                 fail like 159418
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop                 fail like 159418
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop                 fail like 159418
 test-amd64-i386-xl-qemut-win7-amd64   19 guest-stop                 fail like 159418
 test-armhf-armhf-libvirt-raw          15 saverestore-support-check  fail like 159418
 test-amd64-i386-xl-qemut-ws16-amd64   19 guest-stop                 fail like 159418
 test-amd64-i386-xl-pvshim             14 guest-start                fail never pass
 test-arm64-arm64-xl-seattle           15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-seattle           16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt              15 migrate-support-check      fail never pass
 test-amd64-i386-libvirt-xsm           15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-xsm          15 migrate-support-check      fail never pass
 test-amd64-i386-libvirt               15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                   15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                   16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm               15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit2           15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-xsm               16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2           16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1           15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit1           16 saverestore-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm          15 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-xsm          16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-thunderx          15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx          16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   13 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd          14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2           16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl                   15 migrate-support-check      fail never pass
 test-armhf-armhf-xl                   16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt              15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu         15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu         16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1           16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck        15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-cubietruck        16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-rtds              15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds              16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-vhd               14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd               15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale           16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-raw          14 migrate-support-check      fail never pass

version targeted for testing:
 xen                  5b280a59c4dd8dad6cc8da28db981b193d10acee
baseline version:
 xen                  4cf5929606adc2fb1ab4e2921c14ba4b8046ecd1

Last test of basis   159418  2021-02-16 15:06:11 Z  62 days
Failing since        160128  2021-03-18 14:36:18 Z  32 days  43
Re: [PATCH] xen/arm: smmuv1: Revert associating the group pointer with the S2CR
Hi Julien,

> On 20 Apr 2021, at 1:39 pm, Julien Grall wrote:
>
> Hi,
>
> On 19/04/2021 17:21, Stefano Stabellini wrote:
>> On Mon, 19 Apr 2021, Rahul Singh wrote:
>>> Hi Julien,
On 18 Apr 2021, at 6:48 pm, Julien Grall wrote:
On 16/04/2021 17:41, Rahul Singh wrote:
> Hi Julien
Hi Rahul,
>> On 16 Apr 2021, at 5:08 pm, Julien Grall wrote:
>>
>> On 16/04/2021 17:05, Rahul Singh wrote:
On 16 Apr 2021, at 4:23 pm, Julien Grall wrote:
On 16/04/2021 16:01, Rahul Singh wrote:
> Hi Julien,
Hi Rahul,
>> On 16 Apr 2021, at 3:35 pm, Julien Grall wrote:
>>
>> Hi,
>>
>> On 16/04/2021 12:25, Rahul Singh wrote:
>>> Revert the code that associates the group pointer with the S2CR as this
>>> code causing an issue when the SMMU device has more than one master
>>> device.
>>
>> It is not clear to me why this change was first added. Are we missing
>> any feature when reverting it?
> This feature was added when we backported the code from Linux to fix the
> stream match conflict issue as part of commit "xen/arm: smmuv1:
> Intelligent SMR allocation".
> This is an extra feature added to allocate IOMMU group based on
> stream-id. If two device has the same stream-id then we assign those
> devices to the same group.
If we revert the patch, then it would not be possible to use the SMMU if
two devices use the same stream-id. Is that correct?
>>> No. If we revert the patch we can use the SMMU if two devices use the
>>> same stream-id without any issue but each device will be in a separate
>>> group. This is same behaviour before the code is merged.
>>
>> Ok. So there is no change in behavior. Good. Can you propose a commit
>> message clarifying that?
> Please have a look if it make sense.
> xen/arm: smmuv1: Revert associating the group pointer with the S2CR
> Revert the code that associates the group pointer with the S2CR as this
> code causing an issue when the SMMU device has more than one master
> device with same stream-id. This issue is introduced by the below commit:
> "0435784cc75dcfef3b5f59c29deb1dbb84265ddb: xen/arm: smmuv1: Intelligent
> SMR allocation"
> Reverting the code will not impact to use of SMMU if two devices use the
> same stream-id but each device will be in a separate group. This is the
> same behaviour before the code is merged.

Look good to me. Is this patch to be applied on top of Stefano's series?
If not, is there going to be more clash?

>>> As per Stefano's mail he already tested his patch series on top of this
>>> patch. I think this patch has to merged before Stefano's patch series.
>>> Let Stefano also confirm that.
>>>
>>> I think there will be no more clashes.
>> Yes, this patch is to be committed *before* my series and I have already
>> tested this patch alone and with my series on top. Both cases work fine.
>
> Cool. Thanks for the confirmation. I have committed the patch with the
> new commit message (although, I tweaked a little bit to use the
> abbreviated version of the commit ID).

Thanks!

Regards,
Rahul

> Cheers,
>
> --
> Julien Grall
[PATCH] tools/xenstored: Remove unused prototype
From: Julien Grall

A prototype for dump_conn() has been present for quite a long time but
there is no implementation. Even, AFAICT, in the patch that introduced it.

So drop it.

Signed-off-by: Julien Grall
---
 tools/xenstore/xenstored_core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index d54a6042a9f7..591b28e4552f 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2085,7 +2085,6 @@ static struct option options[] = {
 #endif
 	{ NULL, 0, NULL, 0 }
 };
 
-extern void dump_conn(struct connection *conn);
 int dom0_domid = 0;
 int dom0_event = 0;
 int priv_domid = 0;
-- 
2.17.1
Re: swiotlb cleanups v3
On 4/20/21 4:23 AM, Christoph Hellwig wrote:
> On Sat, Apr 17, 2021 at 11:39:22AM -0500, Tom Lendacky wrote:
>> Somewhere between the 1st and 2nd patch, specifying a specific swiotlb
>> for an SEV guest is no longer honored. For example, if I start an SEV
>> guest with 16GB of memory and specify swiotlb=131072 I used to get a
>> 256MB SWIOTLB. However, after the 2nd patch, the swiotlb=131072 is no
>> longer honored and I get a 982MB SWIOTLB (as set via sev_setup_arch() in
>> arch/x86/mm/mem_encrypt.c).
>>
>> I can't be sure which patch caused the issue since an SEV guest fails to
>> boot with the 1st patch but can boot with the 2nd patch, at which point
>> the SWIOTLB comes in at 982MB (I haven't had a chance to debug it and so
>> I'm hoping you might be able to quickly spot what's going on).
>
> Can you try this patch?

Thanks, Christoph. This works for honoring the command line value with SEV
guests.

There was still a reference to default_nslabs in setup_io_tlb_npages() that
I'm not sure how you want to handle. I just commented it out for now to let
the code compile to test the intent of the patch.

Thanks,
Tom

>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 0a5b6f7e75bce6..ac81ef97df32f5 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -71,15 +71,17 @@ struct io_tlb_mem *io_tlb_default_mem;
>   */
>  static unsigned int max_segment;
>  
> -static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
> +static unsigned long swiotlb_cmdline_size;
>  
>  static int __init
>  setup_io_tlb_npages(char *str)
>  {
>  	if (isdigit(*str)) {
>  		/* avoid tail segment of size < IO_TLB_SEGSIZE */
> -		default_nslabs =
> -			ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
> +		unsigned long nslabs = simple_strtoul(str, &str, 0);
> +
> +		swiotlb_cmdline_size =
> +			ALIGN(nslabs, IO_TLB_SEGSIZE) << IO_TLB_SHIFT;
>  	}
>  	if (*str == ',')
>  		++str;
> @@ -108,7 +110,9 @@ void swiotlb_set_max_segment(unsigned int val)
>  
>  unsigned long swiotlb_size_or_default(void)
>  {
> -	return default_nslabs << IO_TLB_SHIFT;
> +	if (swiotlb_cmdline_size)
> +		return swiotlb_cmdline_size;
> +	return IO_TLB_DEFAULT_SIZE;
>  }
>  
>  void __init swiotlb_adjust_size(unsigned long size)
> @@ -118,9 +122,10 @@ void __init swiotlb_adjust_size(unsigned long size)
>  	 * architectures such as those supporting memory encryption to
>  	 * adjust/expand SWIOTLB size for their use.
>  	 */
> -	size = ALIGN(size, IO_TLB_SIZE);
> -	default_nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
> -	pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
> +	if (!swiotlb_cmdline_size)
> +		swiotlb_cmdline_size = ALIGN(size, IO_TLB_SIZE);
> +	pr_info("SWIOTLB bounce buffer size adjusted to %luMB",
> +		swiotlb_cmdline_size >> 20);
>  }
>  
>  void swiotlb_print_info(void)
> @@ -209,7 +214,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>  void __init
>  swiotlb_init(int verbose)
>  {
> -	size_t bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
> +	size_t bytes = PAGE_ALIGN(swiotlb_size_or_default());
>  	void *tlb;
>  
>  	if (swiotlb_force == SWIOTLB_NO_FORCE)
> @@ -219,7 +224,7 @@ swiotlb_init(int verbose)
>  	tlb = memblock_alloc_low(bytes, PAGE_SIZE);
>  	if (!tlb)
>  		goto fail;
> -	if (swiotlb_init_with_tbl(tlb, default_nslabs, verbose))
> +	if (swiotlb_init_with_tbl(tlb, bytes >> IO_TLB_SHIFT, verbose))
>  		goto fail_free_mem;
>  	return;
> 
Re: [PATCH v2] xen/pci: Refactor PCI MSI interrupts related code
Hi Jan,

> On 19 Apr 2021, at 1:33 pm, Jan Beulich wrote:
>
> On 19.04.2021 13:54, Julien Grall wrote:
>> For the time being, I think move this code in x86 is a lot better than
>> #ifdef or keep the code in common code.
>
> Well, I would perhaps agree if it ended up being #ifdef CONFIG_X86.
> I would perhaps not agree if there was a new CONFIG_* which other
> (future) arch-es could select if desired.

I agree with Julien on moving the code to the x86 file, as currently it is
referenced only in x86 code and as of now we are not sure how other
architectures will implement interrupt remapping (via IOMMU or any other
means).

Let me know if you are ok with moving the code to x86.

Regards,
Rahul

>
> Jan
Re: [PATCH 5/9] arm/mm: Get rid of READ/WRITE_SYSREG32
Hi Michal,

On 20/04/2021 08:08, Michal Orzel wrote:

AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit.
MSR/MRS are expecting 64bit values thus we should get rid of helpers
READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use
register_t type when reading sysregs which can correspond to uint64_t or
uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does
not mean that they can't be widen in the future.

Modify SCTLR_EL2 accesses to use READ/WRITE_SYSREG.

SCTLR_EL2 already has bits defined in the range [32:63]. So this change is
going to have a side effect as AFAICT head.S will not touch those bits. So
they are now going to be preserved. The Arm Arm defines them as unknown if
implemented. Therefore shouldn't we zero them somewhere else?

In any case, I think the commit message ought to contain an analysis for
system registers that happened to have bits defined in the range [32:63].

Signed-off-by: Michal Orzel
---
 xen/arch/arm/mm.c    | 2 +-
 xen/arch/arm/traps.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 59f8a3f15f..0e07335291 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -613,7 +613,7 @@ void __init remove_early_mappings(void)
  */
 static void xen_pt_enforce_wnx(void)
 {
-    WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
+    WRITE_SYSREG(READ_SYSREG(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
     /*
      * The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
      * before flushing the TLBs.
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index c7acdb2087..e7384381cc 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -915,7 +915,7 @@ static void _show_registers(const struct cpu_user_regs *regs,
     printk(" VTTBR_EL2: %016"PRIx64"\n", ctxt->vttbr_el2);
     printk("\n");
 
-    printk(" SCTLR_EL2: %08"PRIx32"\n", READ_SYSREG32(SCTLR_EL2));
+    printk(" SCTLR_EL2: %"PRIregister"\n", READ_SYSREG(SCTLR_EL2));
     printk(" HCR_EL2: %"PRIregister"\n", READ_SYSREG(HCR_EL2));
     printk(" TTBR0_EL2: %016"PRIx64"\n", READ_SYSREG64(TTBR0_EL2));
     printk("\n");

Cheers,

-- 
Julien Grall
Re: [PATCH 4/9] arm/p2m: Get rid of READ/WRITE_SYSREG32
Hi Michal,

On 20/04/2021 08:08, Michal Orzel wrote:

AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit.
MSR/MRS are expecting 64bit values thus we should get rid of helpers
READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use
register_t type when reading sysregs which can correspond to uint64_t or
uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does
not mean that they can't be widen in the future.

Modify type of vtcr to register_t.

Signed-off-by: Michal Orzel

Reviewed-by: Julien Grall

Cheers,

-- 
Julien Grall
Re: [PATCH 3/9] arm/gic: Get rid of READ/WRITE_SYSREG32
Hi Michal,

On 20/04/2021 08:08, Michal Orzel wrote:

AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit.
MSR/MRS are expecting 64bit values thus we should get rid of helpers
READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use
register_t type when reading sysregs which can correspond to uint64_t or
uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does
not mean that they can't be widen in the future.

Modify types of following members of struct gic_v3 to register_t:
 - hcr (not used at all in Xen)

It looks like we never used it (even in the patch introducing it). So I
would suggest to remove it in a patch before this one.

 - vmcr
 - sre_el1
 - apr0
 - apr1

Signed-off-by: Michal Orzel
---
 xen/arch/arm/gic-v3-lpi.c |  2 +-
 xen/arch/arm/gic-v3.c     | 96 ---
 xen/include/asm-arm/gic.h |  6 +--
 3 files changed, 54 insertions(+), 50 deletions(-)

diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index 869bc97fa1..e1594dd20e 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -178,7 +178,7 @@ void gicv3_do_LPI(unsigned int lpi)
     irq_enter();
 
     /* EOI the LPI already. */
-    WRITE_SYSREG32(lpi, ICC_EOIR1_EL1);
+    WRITE_SYSREG(lpi, ICC_EOIR1_EL1);
 
     /* Find out if a guest mapped something to this physical LPI. */
     hlpip = gic_get_host_lpi(lpi);
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index ac28013c19..0634013a67 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -246,12 +246,12 @@ static void gicv3_ich_write_lr(int lr, uint64_t val)
  */
 static void gicv3_enable_sre(void)
 {
-    uint32_t val;
+    register_t val;
 
-    val = READ_SYSREG32(ICC_SRE_EL2);
+    val = READ_SYSREG(ICC_SRE_EL2);
     val |= GICC_SRE_EL2_SRE;
 
-    WRITE_SYSREG32(val, ICC_SRE_EL2);
+    WRITE_SYSREG(val, ICC_SRE_EL2);
     isb();
 }
 
@@ -315,16 +315,16 @@ static void restore_aprn_regs(const union gic_state_data *d)
     switch ( gicv3.nr_priorities )
     {
     case 7:
-        WRITE_SYSREG32(d->v3.apr0[2], ICH_AP0R2_EL2);
-        WRITE_SYSREG32(d->v3.apr1[2], ICH_AP1R2_EL2);
+        WRITE_SYSREG(d->v3.apr0[2], ICH_AP0R2_EL2);
+        WRITE_SYSREG(d->v3.apr1[2], ICH_AP1R2_EL2);
         /* Fall through */
     case 6:
-        WRITE_SYSREG32(d->v3.apr0[1], ICH_AP0R1_EL2);
-        WRITE_SYSREG32(d->v3.apr1[1], ICH_AP1R1_EL2);
+        WRITE_SYSREG(d->v3.apr0[1], ICH_AP0R1_EL2);
+        WRITE_SYSREG(d->v3.apr1[1], ICH_AP1R1_EL2);
         /* Fall through */
     case 5:
-        WRITE_SYSREG32(d->v3.apr0[0], ICH_AP0R0_EL2);
-        WRITE_SYSREG32(d->v3.apr1[0], ICH_AP1R0_EL2);
+        WRITE_SYSREG(d->v3.apr0[0], ICH_AP0R0_EL2);
+        WRITE_SYSREG(d->v3.apr1[0], ICH_AP1R0_EL2);
         break;
     default:
         BUG();
@@ -338,16 +338,16 @@ static void save_aprn_regs(union gic_state_data *d)
     switch ( gicv3.nr_priorities )
     {
     case 7:
-        d->v3.apr0[2] = READ_SYSREG32(ICH_AP0R2_EL2);
-        d->v3.apr1[2] = READ_SYSREG32(ICH_AP1R2_EL2);
+        d->v3.apr0[2] = READ_SYSREG(ICH_AP0R2_EL2);
+        d->v3.apr1[2] = READ_SYSREG(ICH_AP1R2_EL2);
         /* Fall through */
     case 6:
-        d->v3.apr0[1] = READ_SYSREG32(ICH_AP0R1_EL2);
-        d->v3.apr1[1] = READ_SYSREG32(ICH_AP1R1_EL2);
+        d->v3.apr0[1] = READ_SYSREG(ICH_AP0R1_EL2);
+        d->v3.apr1[1] = READ_SYSREG(ICH_AP1R1_EL2);
         /* Fall through */
     case 5:
-        d->v3.apr0[0] = READ_SYSREG32(ICH_AP0R0_EL2);
-        d->v3.apr1[0] = READ_SYSREG32(ICH_AP1R0_EL2);
+        d->v3.apr0[0] = READ_SYSREG(ICH_AP0R0_EL2);
+        d->v3.apr1[0] = READ_SYSREG(ICH_AP1R0_EL2);
         break;
     default:
         BUG();
@@ -371,15 +371,15 @@ static void gicv3_save_state(struct vcpu *v)
     dsb(sy);
     gicv3_save_lrs(v);
     save_aprn_regs(&v->arch.gic);
-    v->arch.gic.v3.vmcr = READ_SYSREG32(ICH_VMCR_EL2);
-    v->arch.gic.v3.sre_el1 = READ_SYSREG32(ICC_SRE_EL1);
+    v->arch.gic.v3.vmcr = READ_SYSREG(ICH_VMCR_EL2);
+    v->arch.gic.v3.sre_el1 = READ_SYSREG(ICC_SRE_EL1);
 }
 
 static void gicv3_restore_state(const struct vcpu *v)
 {
-    uint32_t val;
+    register_t val;
 
-    val = READ_SYSREG32(ICC_SRE_EL2);
+    val = READ_SYSREG(ICC_SRE_EL2);
     /*
      * Don't give access to system registers when the guest is using
      * GICv2
@@ -388,7 +388,7 @@ static void gicv3_restore_state(const struct vcpu *v)
         val &= ~GICC_SRE_EL2_ENEL1;
     else
         val |= GICC_SRE_EL2_ENEL1;
-    WRITE_SYSREG32(val, ICC_SRE_EL2);
+    WRITE_SYSREG(val, ICC_SRE_EL2);
 
     /*
      * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a Group0
@@ -398,9 +398,9 @@ static void gicv3_restore_state(const struct vcpu *v)
      * want before starting to mess with the rest of the GIC, and
      * VMCR_EL1 in particular.
      */
-    WRITE_SYSREG32(v->arch.gic.v3.sre_el1,
Re: [PATCH 2/9] arm/domain: Get rid of READ/WRITE_SYSREG32
Hi Michal,

On 20/04/2021 08:08, Michal Orzel wrote:
> AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values, thus we should get rid of the helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use the register_t type when reading sysregs, which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have their upper 32 bits reserved, it does not mean that they can't be widened in the future.
>
> Modify type of registers: actlr, cntkctl to register_t.

ACTLR is implementation defined, so in theory there might already be bits defined in the range [32:63]. So I would consider splitting this change out of the patch so we can backport it.

> Signed-off-by: Michal Orzel
> ---
>  xen/arch/arm/domain.c        | 20 ++--
>  xen/include/asm-arm/domain.h |  4 ++--
>  2 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index bdd3d3e5b5..c021a03c61 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -113,13 +113,13 @@ static void ctxt_switch_from(struct vcpu *p)
>      p->arch.tpidr_el1 = READ_SYSREG(TPIDR_EL1);
>
>      /* Arch timer */
> -    p->arch.cntkctl = READ_SYSREG32(CNTKCTL_EL1);
> +    p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
>      virt_timer_save(p);
>
>      if ( is_32bit_domain(p->domain) && cpu_has_thumbee )
>      {
> -        p->arch.teecr = READ_SYSREG32(TEECR32_EL1);
> -        p->arch.teehbr = READ_SYSREG32(TEEHBR32_EL1);
> +        p->arch.teecr = READ_SYSREG(TEECR32_EL1);
> +        p->arch.teehbr = READ_SYSREG(TEEHBR32_EL1);

It feels strange that you converted cntkctl and actlr to use register_t but not teecr and teehbr. Can you explain why?

Cheers,

--
Julien Grall
Re: [PATCH 1/9] arm64/vfp: Get rid of READ/WRITE_SYSREG32
Hi Michal,

On 20/04/2021 08:08, Michal Orzel wrote:
> AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values, thus we should get rid of the helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use the register_t type when reading sysregs, which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have their upper 32 bits reserved, it does not mean that they can't be widened in the future.
>
> Modify type of FPCR, FPSR, FPEXC32_EL2 to register_t.
>
> Signed-off-by: Michal Orzel

Reviewed-by: Julien Grall

Cheers,

--
Julien Grall
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On 20/04/2021 13:50, Jan Beulich wrote:
>> Alternatively, you could be a bit more verbose in your request so the others understand the reasoning behind it.
>
> Well, yes, perhaps. But then there's the desire to not repeat oneself all the time.

Most likely, the time you try to save by not expanding your thought is going to be lost when the contributor comes back asking why you are requesting it. ;)

Cheers,

--
Julien Grall
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On 20.04.2021 14:39, Julien Grall wrote:
> On 20/04/2021 13:25, Jan Beulich wrote:
>> On 20.04.2021 14:14, Julien Grall wrote:
>>> It is not really my area of expertise but I wanted to jump on one comment below...
>>>
>>> On 20/04/2021 12:38, Jan Beulich wrote:
>>>> On 01.04.2020 22:06, Chao Gao wrote:
>>>>> ---
>>>>> Changes in v2:
>>>>>  - verify system suspension and resumption with this patch applied
>>>>>  - don't fall back to register-based interface if enabling qinval failed. see the change in init_vtd_hw().
>>>>>  - remove unnecessary "queued_inval_supported" variable
>>>>>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>>>>>  - coding-style changes
>>>>
>>>> ... while this suggests this is v2 of a recently sent patch, the submission is dated a little over a year ago. This is confusing. It is additionally confusing that there were two copies of it in my inbox, despite mails coming from a list normally getting de-duplicated somewhere at our end (I believe).
>>>>
>>>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>>>> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>>>>>
>>>>>     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>>>>>     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
>>>>> +   iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
>>>>> +
>>>>> +   if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
>>>>> +   {
>>>>> +       printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
>>>>> +              iommu->index);
>>>>
>>>> Here (and at least once more yet further down): We don't normally end log messages with a full stop. Easily addressable while committing, of course.
>>>
>>> I can find a large number of cases where log messages are ended with a full stop... Actually it looks more natural to me than your suggestion.
>>
>> Interesting. From purely a language perspective it indeed looks more natural, I agree. But when considering (serial) console bandwidth, we ought to try to eliminate _any_ output that's there without conveying information or making the conveyed information understandable. In fact I recall a number of cases (without having commit hashes to hand) where we deliberately dropped full stops. (The messages here aren't at risk of causing bandwidth issues, but as with any such generic item I think the goal ought to be consistency, and hence either full stops everywhere, or nowhere. If bandwidth was an issue here, I might also have suggested to shorten "Queued Invalidation" to just "QI".)
>
> I wasn't aware of such a requirement in Xen... Although, I can see how this can be a concern. If you really want to enforce it, then it should be written in CODING_STYLE.

Agreed, but since I've had no success with prior adjustments to that file (not even worth a reply to tell me why a change might be a bad one, in at least some of the cases), I'm afraid I've given up making attempts to get adjustments into there.

> Alternatively, you could be a bit more verbose in your request so the others understand the reasoning behind it.

Well, yes, perhaps. But then there's the desire to not repeat oneself all the time.

Jan
Re: [PATCH v2 2/2] x86/cpuid: support LFENCE always serializing CPUID bit
On 15/04/2021 15:47, Roger Pau Monne wrote:
> AMD Milan (Zen3) CPUs have an LFENCE Always Serializing CPUID bit in leaf 80000021.eax. Previous AMD versions used to have a user settable bit in the DE_CFG MSR to select whether LFENCE was dispatch serializing, which Xen always attempts to set. The forcefully always-on setting is due to the addition of SEV-SNP, so that a VMM cannot break the confidentiality of a guest.
>
> In order to support this new CPUID bit, move the LFENCE_DISPATCH synthetic CPUID bit to map the hardware bit (leaving a hole in the synthetic range) and either rely on the bit already being set by the native CPUID output, or attempt to fake it in Xen by modifying the DE_CFG MSR. This requires adding one more entry to the featureset to support leaf 80000021.eax.
>
> The bit is exposed to guests by default if the underlying hardware supports leaf 80000021, as a way to signal that LFENCE is always serializing. Hardware that doesn't have the leaf might also get the bit set because Xen has performed the necessary arrangements, but that's not yet exposed to guests. Note that Xen doesn't allow guests to change the DE_CFG value, so once set by Xen LFENCE will always be serializing.
>
> Note that the access to DE_CFG by guests is left as-is: reads will unconditionally return the LFENCE_SERIALISE bit set, while writes are silently dropped.
>
> Suggested-by: Andrew Cooper
> Signed-off-by: Roger Pau Monné
> ---
> Changes since v1:
>  - Rename to lfence+.
>  - Add feature to libxl_cpuid.c.
>  - Reword commit message.
> ---
> Note this doesn't yet expose the bit on hardware that doesn't support leaf 80000021. It's still TBD whether we want to hardcode this support manually, or instead rely on a more general approach like the one suggested by the shrink max CPUID leaf patch from Jan.

I'm going to insist on using the manual approach. Upping the max leaf is strictly opposite to the shrinking logic.

It's very rare that we'll want to extend the max leaf beyond what hardware supports, and doing so wants calling out clearly, along with identifying why it is safe in this specific case. It is not safe or sensible to blindly escalate to the compile-time max. The only cases where the two approaches differ are bugs needing fixing - the manual approach has the special case clearly called out, while the blindly-escalate case has the bug hidden in derivation logic somewhere else.

~Andrew
Re: [PATCH] xen/arm: smmuv1: Revert associating the group pointer with the S2CR
Hi,

On 19/04/2021 17:21, Stefano Stabellini wrote:
On Mon, 19 Apr 2021, Rahul Singh wrote:
On 18 Apr 2021, at 6:48 pm, Julien Grall wrote:
On 16/04/2021 17:41, Rahul Singh wrote:
On 16 Apr 2021, at 5:08 pm, Julien Grall wrote:
On 16/04/2021 17:05, Rahul Singh wrote:
On 16 Apr 2021, at 4:23 pm, Julien Grall wrote:
On 16/04/2021 16:01, Rahul Singh wrote:
On 16 Apr 2021, at 3:35 pm, Julien Grall wrote:
On 16/04/2021 12:25, Rahul Singh wrote:

Revert the code that associates the group pointer with the S2CR as this code is causing an issue when the SMMU device has more than one master device.

It is not clear to me why this change was first added. Are we missing any feature when reverting it?

This feature was added when we backported the code from Linux to fix the stream match conflict issue as part of commit "xen/arm: smmuv1: Intelligent SMR allocation". This is an extra feature added to allocate IOMMU groups based on stream-id. If two devices have the same stream-id then we assign those devices to the same group.

If we revert the patch, then it would not be possible to use the SMMU if two devices use the same stream-id. Is that correct?

No. If we revert the patch we can use the SMMU if two devices use the same stream-id without any issue, but each device will be in a separate group. This is the same behaviour as before the code was merged.

Ok. So there is no change in behavior. Good. Can you propose a commit message clarifying that?

Please have a look if it makes sense.

    xen/arm: smmuv1: Revert associating the group pointer with the S2CR

    Revert the code that associates the group pointer with the S2CR as this code is causing an issue when the SMMU device has more than one master device with the same stream-id. This issue was introduced by the below commit:

    "0435784cc75dcfef3b5f59c29deb1dbb84265ddb: xen/arm: smmuv1: Intelligent SMR allocation"

    Reverting the code will not impact the use of the SMMU if two devices use the same stream-id, but each device will be in a separate group. This is the same behaviour as before the code was merged.

Looks good to me. Is this patch to be applied on top of Stefano's series? If not, is there going to be more clash?

As per Stefano's mail he already tested his patch series on top of this patch. I think this patch has to be merged before Stefano's patch series. Let Stefano also confirm that. I think there will be no more clashes.

Yes, this patch is to be committed *before* my series and I have already tested this patch alone and with my series on top. Both cases work fine.

Cool. Thanks for the confirmation. I have committed the patch with the new commit message (although I tweaked it a little bit to use the abbreviated version of the commit ID).

Cheers,

--
Julien Grall
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
Hi,

On 20/04/2021 13:25, Jan Beulich wrote:
> On 20.04.2021 14:14, Julien Grall wrote:
>> Hi,
>>
>> It is not really my area of expertise but I wanted to jump on one comment below...
>>
>> On 20/04/2021 12:38, Jan Beulich wrote:
>>> On 01.04.2020 22:06, Chao Gao wrote:
>>>> ---
>>>> Changes in v2:
>>>>  - verify system suspension and resumption with this patch applied
>>>>  - don't fall back to register-based interface if enabling qinval failed. see the change in init_vtd_hw().
>>>>  - remove unnecessary "queued_inval_supported" variable
>>>>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>>>>  - coding-style changes
>>>
>>> ... while this suggests this is v2 of a recently sent patch, the submission is dated a little over a year ago. This is confusing. It is additionally confusing that there were two copies of it in my inbox, despite mails coming from a list normally getting de-duplicated somewhere at our end (I believe).
>>>
>>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>>> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>>>>
>>>>     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>>>>     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
>>>> +   iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
>>>> +
>>>> +   if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
>>>> +   {
>>>> +       printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
>>>> +              iommu->index);
>>>
>>> Here (and at least once more yet further down): We don't normally end log messages with a full stop. Easily addressable while committing, of course.
>>
>> I can find a large number of cases where log messages are ended with a full stop... Actually it looks more natural to me than your suggestion.
>
> Interesting. From purely a language perspective it indeed looks more natural, I agree. But when considering (serial) console bandwidth, we ought to try to eliminate _any_ output that's there without conveying information or making the conveyed information understandable. In fact I recall a number of cases (without having commit hashes to hand) where we deliberately dropped full stops. (The messages here aren't at risk of causing bandwidth issues, but as with any such generic item I think the goal ought to be consistency, and hence either full stops everywhere, or nowhere. If bandwidth was an issue here, I might also have suggested to shorten "Queued Invalidation" to just "QI".)

I wasn't aware of such a requirement in Xen... Although, I can see how this can be a concern. If you really want to enforce it, then it should be written in CODING_STYLE. Alternatively, you could be a bit more verbose in your request so the others understand the reasoning behind it.

> Also, when you say "large number" - did you compare to the number of cases where there are no full stops? (I sincerely hope we don't have that many full stops left.)

42sh> ack "printk\(\"" | ack "\.\\n\"" | wc -l
130
42sh> ack "printk\(\"" | wc -l
1708

So ~7.5% of the printks.

--
Julien Grall
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On Tue, Apr 20, 2021 at 01:38:26PM +0200, Jan Beulich wrote:
> On 01.04.2020 22:06, Chao Gao wrote:
>> According to Intel VT-d SPEC rev3.3 Section 6.5, Register-based Invalidation isn't supported by Intel VT-d version 6 and beyond.
>>
>> This hardware change impacts the following two scenarios: admin can disable queued invalidation via the 'qinval' cmdline and use the register-based interface; and VT-d switches to register-based invalidation when queued invalidation needs to be disabled, for example, during disabling x2apic, during system suspension, or after enabling queued invalidation fails.
>>
>> To deal with this hardware change, if register-based invalidation isn't supported, queued invalidation cannot be disabled through the Xen cmdline; and if queued invalidation has to be disabled temporarily in some scenarios, VT-d won't switch to the register-based interface but will use some dummy functions to catch errors in case any invalidation request is issued while queued invalidation is disabled.
>>
>> Signed-off-by: Chao Gao
>
> In principle (with a minor nit further down)
> Reviewed-by: Jan Beulich
>
> However, ...
>
>> ---
>> Changes in v2:
>>  - verify system suspension and resumption with this patch applied
>>  - don't fall back to register-based interface if enabling qinval failed. see the change in init_vtd_hw().
>>  - remove unnecessary "queued_inval_supported" variable
>>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>>  - coding-style changes
>
> ... while this suggests this is v2 of a recently sent patch, the submission is dated a little over a year ago. This is confusing. It is additionally confusing that there were two copies of it in my inbox, despite mails coming from a list normally getting de-duplicated somewhere at our end (I believe).

You are right. I messed up the system time of my server somehow. Sorry about that. If possible, please also update the date of this patch.

>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>>
>>     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>>     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
>> +   iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
>> +
>> +   if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
>> +   {
>> +       printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
>> +              iommu->index);
>
> Here (and at least once more yet further down): We don't normally end log messages with a full stop. Easily addressable while committing, of course.

Okay. Please go ahead.

Thanks
Chao
[ovmf test] 161312: all pass - PUSHED
flight 161312 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161312/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 0bbc20727598421c4e47d46b982246217df8c6bc
baseline version:
 ovmf                 64138c95db5a7a3e4768d8a01ba71dc3475e6524

Last test of basis   161301  2021-04-19 21:40:07 Z  0 days
Testing same since   161312  2021-04-20 04:56:06 Z  0 days  1 attempts

People who touched revisions under test:
  Jason
  Jason Lou
  Kun Qin

jobs:
 build-amd64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   64138c95db..0bbc207275  0bbc20727598421c4e47d46b982246217df8c6bc -> xen-tested-master
Re: [PATCH v9 10/13] x86/smpboot: switch clone_mapping() to new APIs
On 06.04.2021 13:05, Hongyan Xia wrote:
> @@ -742,51 +742,58 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
>          }
>      }
>
> +    UNMAP_DOMAIN_PAGE(pl1e);
> +    UNMAP_DOMAIN_PAGE(pl2e);
> +    UNMAP_DOMAIN_PAGE(pl3e);

Just one minor remark: A pedantic(?) compiler might warn about the setting to NULL of pl3e here, when

>      if ( !(root_get_flags(rpt[root_table_offset(linear)]) & _PAGE_PRESENT) )
>      {
> -        pl3e = alloc_xen_pagetable();
> +        mfn_t l3mfn;
> +
> +        pl3e = alloc_map_clear_xen_pt(&l3mfn);
>          rc = -ENOMEM;
>          if ( !pl3e )
>              goto out;
> -        clear_page(pl3e);
>          l4e_write(&rpt[root_table_offset(linear)],
> -                  l4e_from_paddr(__pa(pl3e), __PAGE_HYPERVISOR));
> +                  l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR));
>      }
>      else
> -        pl3e = l4e_to_l3e(rpt[root_table_offset(linear)]);
> +        pl3e = map_l3t_from_l4e(rpt[root_table_offset(linear)]);

... it is guaranteed to get initialized again before any further consumption. IOW strictly speaking the last of those three would want to be unmap_domain_page(), just like you have ...

> @@ -802,6 +809,9 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
>
>      rc = 0;
>  out:
> +    unmap_domain_page(pl1e);
> +    unmap_domain_page(pl2e);
> +    unmap_domain_page(pl3e);
>      return rc;
>  }

... here.

Jan
Re: [PATCH v9 09/13] x86/smpboot: add exit path for clone_mapping()
On 06.04.2021 13:05, Hongyan Xia wrote:
> From: Wei Liu
>
> We will soon need to clean up page table mappings in the exit path.
>
> No functional change.
>
> Signed-off-by: Wei Liu
> Signed-off-by: Hongyan Xia

Acked-by: Jan Beulich
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On 20.04.2021 14:14, Julien Grall wrote:
> Hi,
>
> It is not really my area of expertise but I wanted to jump on one comment below...
>
> On 20/04/2021 12:38, Jan Beulich wrote:
>> On 01.04.2020 22:06, Chao Gao wrote:
>>> ---
>>> Changes in v2:
>>>  - verify system suspension and resumption with this patch applied
>>>  - don't fall back to register-based interface if enabling qinval failed. see the change in init_vtd_hw().
>>>  - remove unnecessary "queued_inval_supported" variable
>>>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>>>  - coding-style changes
>>
>> ... while this suggests this is v2 of a recently sent patch, the submission is dated a little over a year ago. This is confusing. It is additionally confusing that there were two copies of it in my inbox, despite mails coming from a list normally getting de-duplicated somewhere at our end (I believe).
>>
>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>>>
>>>     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>>>     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
>>> +   iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
>>> +
>>> +   if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
>>> +   {
>>> +       printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
>>> +              iommu->index);
>>
>> Here (and at least once more yet further down): We don't normally end log messages with a full stop. Easily addressable while committing, of course.
>
> I can find a large number of cases where log messages are ended with a full stop... Actually it looks more natural to me than your suggestion.

Interesting. From purely a language perspective it indeed looks more natural, I agree. But when considering (serial) console bandwidth, we ought to try to eliminate _any_ output that's there without conveying information or making the conveyed information understandable. In fact I recall a number of cases (without having commit hashes to hand) where we deliberately dropped full stops. (The messages here aren't at risk of causing bandwidth issues, but as with any such generic item I think the goal ought to be consistency, and hence either full stops everywhere, or nowhere. If bandwidth was an issue here, I might also have suggested to shorten "Queued Invalidation" to just "QI".)

Also, when you say "large number" - did you compare to the number of cases where there are no full stops? (I sincerely hope we don't have that many full stops left.)

Jan
Re: [PATCH v9 01/13] x86/mm: rewrite virt_to_xen_l*e
On 06.04.2021 13:05, Hongyan Xia wrote:
> From: Wei Liu
>
> Rewrite those functions to use the new APIs. Modify their callers to unmap the pointer returned. Since alloc_xen_pagetable_new() is almost never useful unless accompanied by page clearing and a mapping, introduce a helper alloc_map_clear_xen_pt() for this sequence.
>
> Signed-off-by: Wei Liu
> Signed-off-by: Hongyan Xia
>
> ---
> Changed in v9:
>  - use domain_page_map_to_mfn() around the L3 table locking logic.
>  - remove vmap_to_mfn() changes since we now use xen_map_to_mfn().
>
> Changed in v8:
>  - s/virtual address/linear address/.
>  - BUG_ON() on NULL return in vmap_to_mfn().
>
> Changed in v7:
>  - remove a comment.
>  - use l1e_get_mfn() instead of converting things back and forth.
>  - add alloc_map_clear_xen_pt().

I realize this was in v7 already, but at v6 time the name I suggested was

    void *alloc_mapped_pagetable(mfn_t *mfn);

"alloc_map_clear_xen", for my taste at least, is too strange. It doesn't really matter whether it's a page table for Xen's own use (it typically will be), so "xen" could be dropped. Clearing of a page table ought to also be the rule rather than the exception, so I'd see "clear" also dropped. I'd be fine with alloc_map_pt() or about any intermediate variant between that and my originally suggested name.

> @@ -5108,7 +5140,8 @@ int map_pages_to_xen(
>      unsigned int flags)
>  {
>      bool locking = system_state > SYS_STATE_boot;
> -    l2_pgentry_t *pl2e, ol2e;
> +    l3_pgentry_t *pl3e = NULL, ol3e;
> +    l2_pgentry_t *pl2e = NULL, ol2e;
>      l1_pgentry_t *pl1e, ol1e;
>      unsigned int i;
>      int rc = -ENOMEM;
> @@ -5132,15 +5165,16 @@ int map_pages_to_xen(
>
>      while ( nr_mfns != 0 )
>      {
> -        l3_pgentry_t *pl3e, ol3e;
> -
> +        /* Clean up the previous iteration. */
>          L3T_UNLOCK(current_l3page);
> +        UNMAP_DOMAIN_PAGE(pl3e);
> +        UNMAP_DOMAIN_PAGE(pl2e);

Doing this here suggests the lower-case equivalent is needed at the out label, even without looking at the rest of the function (doing so confirms the suspicion, as there's at least one "goto out" with pl2e clearly still mapped).

> @@ -5305,6 +5339,8 @@ int map_pages_to_xen(
>              pl1e = virt_to_xen_l1e(virt);
>              if ( pl1e == NULL )
>                  goto out;
> +
> +            UNMAP_DOMAIN_PAGE(pl1e);
>          }

Unmapping the page right after mapping it looks suspicious. I see that further down we have

    pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(virt);

but don't you need to also change that? Actually, you do in patch 2, but at the latest then the double mapping should imo be avoided.

> @@ -5505,6 +5542,7 @@ int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
>  int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
>  {
>      bool locking = system_state > SYS_STATE_boot;
> +    l3_pgentry_t *pl3e = NULL;
>      l2_pgentry_t *pl2e;
>      l1_pgentry_t *pl1e;
>      unsigned int i;

And here we have the opposite situation: You don't set pl2e to NULL and the function only uses l3e_to_l2e() to initialize the variable, yet ...

> @@ -5761,6 +5799,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
>
>  out:
>      L3T_UNLOCK(current_l3page);
> +    unmap_domain_page(pl2e);
> +    unmap_domain_page(pl3e);

... here you try to unmap it. Did the two respective hunks somehow magically get swapped?

> --- a/xen/common/vmap.c
> +++ b/xen/common/vmap.c
> @@ -1,6 +1,7 @@
>  #ifdef VMAP_VIRT_START
>  #include
>  #include
> +#include

Why is this needed? (Looks like a now stale change.)

Jan
Re: [PATCH v4] xen/arm64: Place a speculation barrier following an ret instruction
Hi,

On 19/04/2021 19:24, Stefano Stabellini wrote:
> On Mon, 19 Apr 2021, Bertrand Marquis wrote:
>> Hi Julien,
>>
>> On 18 Apr 2021, at 19:03, Julien Grall wrote:
>>> From: Julien Grall
>>>
>>> Some CPUs can speculate past a RET instruction and potentially perform speculative accesses to memory before processing the return. There is no known gadget available after the RET instruction today. However some of the registers (such as in check_pending_guest_serror()) may contain a value provided by the guest.
>>>
>>> In order to harden the code, it would be better to add a speculation barrier after each RET instruction. The performance impact is meant to be negligible as the speculation barrier is not meant to be architecturally executed.
>>>
>>> Rather than manually inserting a speculation barrier, use a macro which overrides the mnemonic RET and replaces it with RET + SB. We need to use the opcode for RET to prevent any macro recursion.
>>>
>>> This patch only covers the assembly code. C code would need to be covered separately using compiler support.
>>>
>>> Note that the definition of the macro sb needs to be moved earlier in asm-arm/macros.h so it can be used by the new macro.
>>>
>>> This is part of the work to mitigate straight-line speculation.
>>>
>>> Signed-off-by: Julien Grall
>>
>> Reviewed-by: Bertrand Marquis
>
> Acked-by: Stefano Stabellini

Thanks both! I have committed the patch.

Cheers,

--
Julien Grall
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
Hi,

It is not really my area of expertise but I wanted to jump on one comment below...

On 20/04/2021 12:38, Jan Beulich wrote:
> On 01.04.2020 22:06, Chao Gao wrote:
>> ---
>> Changes in v2:
>>  - verify system suspension and resumption with this patch applied
>>  - don't fall back to register-based interface if enabling qinval failed. see the change in init_vtd_hw().
>>  - remove unnecessary "queued_inval_supported" variable
>>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>>  - coding-style changes
>
> ... while this suggests this is v2 of a recently sent patch, the submission is dated a little over a year ago. This is confusing. It is additionally confusing that there were two copies of it in my inbox, despite mails coming from a list normally getting de-duplicated somewhere at our end (I believe).
>
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>>
>>     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>>     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
>> +   iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
>> +
>> +   if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
>> +   {
>> +       printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
>> +              iommu->index);
>
> Here (and at least once more yet further down): We don't normally end log messages with a full stop. Easily addressable while committing, of course.

I can find a large number of cases where log messages are ended with a full stop... Actually it looks more natural to me than your suggestion.

Cheers,

--
Julien Grall
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On 01.04.2020 22:06, Chao Gao wrote:
> According to Intel VT-d SPEC rev3.3 Section 6.5, Register-based Invalidation isn't supported by Intel VT-d version 6 and beyond.
>
> This hardware change impacts the following two scenarios: admin can disable queued invalidation via the 'qinval' cmdline and use the register-based interface; and VT-d switches to register-based invalidation when queued invalidation needs to be disabled, for example, during disabling x2apic, during system suspension, or after enabling queued invalidation fails.
>
> To deal with this hardware change, if register-based invalidation isn't supported, queued invalidation cannot be disabled through the Xen cmdline; and if queued invalidation has to be disabled temporarily in some scenarios, VT-d won't switch to the register-based interface but will use some dummy functions to catch errors in case any invalidation request is issued while queued invalidation is disabled.
>
> Signed-off-by: Chao Gao

In principle (with a minor nit further down)
Reviewed-by: Jan Beulich

However, ...

> ---
> Changes in v2:
>  - verify system suspension and resumption with this patch applied
>  - don't fall back to register-based interface if enabling qinval failed. see the change in init_vtd_hw().
>  - remove unnecessary "queued_inval_supported" variable
>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>  - coding-style changes

... while this suggests this is v2 of a recently sent patch, the submission is dated a little over a year ago. This is confusing. It is additionally confusing that there were two copies of it in my inbox, despite mails coming from a list normally getting de-duplicated somewhere at our end (I believe).

> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>
>     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
> +   iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
> +
> +   if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
> +   {
> +       printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
> +              iommu->index);

Here (and at least once more yet further down): We don't normally end log messages with a full stop. Easily addressable while committing, of course.

Jan
Re: [PATCH v2 2/2] x86/cpuid: support LFENCE always serializing CPUID bit
On 20.04.2021 12:47, Roger Pau Monné wrote:
> On Tue, Apr 20, 2021 at 12:35:54PM +0200, Jan Beulich wrote:
>> I'd like to give Andrew a day or two more to respond there in case he
>> continues to see an issue, before I commit that with your R-b and this
>> one here. I'll assume you'll subsequently take care of that missing
>> piece then - if not, i.e. if e.g. I should, please let me know.
>
> I think it should be something like the above,

Right (assuming you meant "below").

> in fact I think it
> would be perfectly fine to merge that chunk into your patch?

I'd rather not, so that this change can have its own reasoning in its
description.

Jan

> diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> index 050cd5713e2..daf501779fe 100644
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -314,12 +314,9 @@ static void __init calculate_host_policy(void)
>
>      *p = raw_cpuid_policy;
>
> -    p->basic.max_leaf =
> -        min_t(uint32_t, p->basic.max_leaf, ARRAY_SIZE(p->basic.raw) - 1);
> -    p->feat.max_subleaf =
> -        min_t(uint32_t, p->feat.max_subleaf, ARRAY_SIZE(p->feat.raw) - 1);
> -    p->extd.max_leaf = 0x8000 | min_t(uint32_t, p->extd.max_leaf & 0x,
> -                                      ARRAY_SIZE(p->extd.raw) - 1);
> +    p->basic.max_leaf = ARRAY_SIZE(p->basic.raw) - 1;
> +    p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1;
> +    p->extd.max_leaf = 0x8000 | ARRAY_SIZE(p->extd.raw) - 1;
>
>      cpuid_featureset_to_policy(boot_cpu_data.x86_capability, p);
>      recalculate_xstate(p);
>
[PATCH v2] VT-d: Don't assume register-based invalidation is always supported
According to Intel VT-d SPEC rev3.3 Section 6.5, Register-based Invalidation
isn't supported by Intel VT-d version 6 and beyond.

This hardware change impacts following two scenarios: admin can disable
queued invalidation via 'qinval' cmdline and use register-based interface;
VT-d switches to register-based invalidation when queued invalidation needs
to be disabled, for example, during disabling x2apic or during system
suspension or after enabling queued invalidation fails.

To deal with this hardware change, if register-based invalidation isn't
supported, queued invalidation cannot be disabled through Xen cmdline; and
if queued invalidation has to be disabled temporarily in some scenarios,
VT-d won't switch to register-based interface but use some dummy functions
to catch errors in case there is any invalidation request issued when queued
invalidation is disabled.

Signed-off-by: Chao Gao
---
Changes in v2:
 - verify system suspension and resumption with this patch applied
 - don't fall back to register-based interface if enabling qinval failed.
   see the change in init_vtd_hw().
 - remove unnecessary "queued_inval_supported" variable
 - constify the "struct vtd_iommu *" of has_register_based_invalidation()
 - coding-style changes
---
 docs/misc/xen-command-line.pandoc    |  4 +++-
 xen/drivers/passthrough/vtd/iommu.c  | 27 +++++++++++++++++++++++++--
 xen/drivers/passthrough/vtd/iommu.h  |  7 +++++++
 xen/drivers/passthrough/vtd/qinval.c | 33 +++++++++++++++++++++++++++++++--
 4 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index deef6d0b4c..4ff4a87844 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1442,7 +1442,9 @@ The following options are specific to Intel VT-d hardware:
 *   The `qinval` boolean controls the Queued Invalidation sub-feature, and is
     active by default on compatible hardware.  Queued Invalidation is a
     feature in second-generation IOMMUs and is a functional prerequisite for
-    Interrupt Remapping.
+    Interrupt Remapping.  Note that Xen disregards this setting for Intel VT-d
+    version 6 and greater as Registered-Based Invalidation isn't supported
+    by them.
 
 *   The `igfx` boolean is active by default, and controls whether the IOMMU
     in front of an Intel Graphics Device is enabled or not.

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 6428c8fe3e..94d1372903 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
 
     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
+    iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
+
+    if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
+    {
+        printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
+               iommu->index);
+        iommu_qinval = true;
+    }
 
     if ( iommu_verbose )
     {
@@ -2141,6 +2149,10 @@ static int __must_check init_vtd_hw(bool resume)
      */
     if ( enable_qinval(iommu) != 0 )
     {
+        /* Ensure register-based invalidation is available */
+        if ( !has_register_based_invalidation(iommu) )
+            return -EIO;
+
         iommu->flush.context = vtd_flush_context_reg;
         iommu->flush.iotlb = vtd_flush_iotlb_reg;
     }
@@ -2231,6 +2243,7 @@ static int __init vtd_setup(void)
     struct acpi_drhd_unit *drhd;
     struct vtd_iommu *iommu;
     int ret;
+    bool reg_inval_supported = true;
 
     if ( list_empty(_drhd_units) )
     {
@@ -2252,8 +2265,8 @@ static int __init vtd_setup(void)
     }
 
     /* We enable the following features only if they are supported by all VT-d
-     * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
-     * Remapping, and Posted Interrupt
+     * engines: Snoop Control, DMA passthrough, Register-based Invalidation,
+     * Queued Invalidation, Interrupt Remapping, and Posted Interrupt.
      */
     for_each_drhd_unit ( drhd )
     {
@@ -2275,6 +2288,9 @@ static int __init vtd_setup(void)
         if ( iommu_qinval && !ecap_queued_inval(iommu->ecap) )
             iommu_qinval = 0;
 
+        if ( !has_register_based_invalidation(iommu) )
+            reg_inval_supported = false;
+
         if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
             iommu_intremap = iommu_intremap_off;
 
@@ -2301,6 +2317,13 @@ static int __init vtd_setup(void)
     softirq_tasklet_init(_fault_tasklet, do_iommu_page_fault, NULL);
 
+    if ( !iommu_qinval && !reg_inval_supported )
+    {
+        dprintk(XENLOG_ERR VTDPREFIX, "No available invalidation interface.\n");
+        ret = -ENODEV;
+        goto error;
+    }
+
     if ( !iommu_qinval && iommu_intremap )
     {
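The interface-selection policy this patch implements can be condensed into a small decision function. The sketch below is illustrative only: `struct unit`, its field names, and `pick_flush_interface` are hypothetical stand-ins, not actual Xen identifiers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Illustrative sketch of the invalidation-interface selection the patch
 * describes: prefer Queued Invalidation; fall back to the register-based
 * interface only if the hardware still has one (VT-d version < 6);
 * otherwise fail hard instead of silently using a broken fallback.
 */
struct unit {
    bool qinval_enabled;   /* Queued Invalidation successfully enabled */
    bool has_reg_inval;    /* register-based interface exists on this unit */
};

/* Returns 1 for queued, 0 for register-based fallback, -EIO if neither. */
static int pick_flush_interface(const struct unit *u)
{
    if (u->qinval_enabled)
        return 1;
    if (!u->has_reg_inval)
        return -EIO;       /* v6+ hardware: no register-based fallback */
    return 0;
}
```

The same shape appears twice in the patch: at boot (`vtd_setup()`) the `-EIO` arm becomes `-ENODEV`/`goto error`, and on re-enable paths (`init_vtd_hw()`) it aborts rather than installing the register-based flush callbacks.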
Re: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED
On Tue, Apr 20, 2021 at 10:51:47AM +0100, Igor Druzhinin wrote:
> On 20/04/2021 04:39, no-re...@patchew.org wrote:
> > === OUTPUT BEGIN ===
> > ERROR: Author email address is mangled by the mailing list
> > #2:
> > Author: Igor Druzhinin via
> >
> > total: 1 errors, 0 warnings, 21 lines checked
>
> Anthony,
>
> Is there any action required from me here? I don't completely understand
> how that happened. But I found this:

No action, I just need to remember to fix the patch's author before
submitting a pull request. Citrix's policy is just too strict and doesn't
even allow space changes in some headers.

--
Anthony PERARD
Re: [PATCH v2 2/2] x86/cpuid: support LFENCE always serializing CPUID bit
On Tue, Apr 20, 2021 at 12:35:54PM +0200, Jan Beulich wrote: > On 15.04.2021 16:47, Roger Pau Monne wrote: > > AMD Milan (Zen3) CPUs have an LFENCE Always Serializing CPUID bit in > > leaf 8021.eax. Previous AMD versions used to have a user settable > > bit in DE_CFG MSR to select whether LFENCE was dispatch serializing, > > which Xen always attempts to set. The forcefully always on setting is > > due to the addition of SEV-SNP so that a VMM cannot break the > > confidentiality of a guest. > > > > In order to support this new CPUID bit move the LFENCE_DISPATCH > > synthetic CPUID bit to map the hardware bit (leaving a hole in the > > synthetic range) and either rely on the bit already being set by the > > native CPUID output, or attempt to fake it in Xen by modifying the > > DE_CFG MSR. This requires adding one more entry to the featureset to > > support leaf 8021.eax. > > > > The bit is exposed to guests by default if the underlying hardware > > supports leaf 8021, as a way to signal that LFENCE is always > > serializing. Hardware that doesn't have the leaf might also get the > > bit set because Xen has performed the necessary arrangements, but > > that's not yet exposed to guests. Note that Xen doesn't allow guests > > to change the DE_CFG value, so once set by Xen LFENCE will always be > > serializing. > > > > Note that the access to DE_CFG by guests is left as-is: reads will > > unconditionally return LFENCE_SERIALISE bit set, while writes are > > silently dropped. > > > > Suggested-by: Andrew Cooper > > Signed-off-by: Roger Pau Monné > > Reviewed-by: Jan Beulich > > > --- > > Note this doesn't yet expose the bit on hardware that doesn't support > > leaf 8021. It's still TBD whether we want to hardcode this support > > manually, or instead rely on a more general approach like the one > > suggested by the shrink max CPUID leaf patch from Jan. 
> > I'd like to give Andrew a day or two more to respond there in case he > continues to see an issue, before I commit that with your R-b and this > one here. I'll assume you'll subsequently take care of that missing > piece then - if not, i.e. if e.g. I should, please let me know. I think it should be something like the above, in fact I think it would be perfectly fine to merge that chunk into your patch? Thanks, Roger. ---8<--- diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c index 050cd5713e2..daf501779fe 100644 --- a/xen/arch/x86/cpuid.c +++ b/xen/arch/x86/cpuid.c @@ -314,12 +314,9 @@ static void __init calculate_host_policy(void) *p = raw_cpuid_policy; -p->basic.max_leaf = -min_t(uint32_t, p->basic.max_leaf, ARRAY_SIZE(p->basic.raw) - 1); -p->feat.max_subleaf = -min_t(uint32_t, p->feat.max_subleaf, ARRAY_SIZE(p->feat.raw) - 1); -p->extd.max_leaf = 0x8000 | min_t(uint32_t, p->extd.max_leaf & 0x, - ARRAY_SIZE(p->extd.raw) - 1); +p->basic.max_leaf = ARRAY_SIZE(p->basic.raw) - 1; +p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1; +p->extd.max_leaf = 0x8000 | ARRAY_SIZE(p->extd.raw) - 1; cpuid_featureset_to_policy(boot_cpu_data.x86_capability, p); recalculate_xstate(p);
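The hunk above replaces clamping against the hardware-reported maximum with an unconditional assignment to the policy-array bound. The behavioral difference can be sketched as follows; this is illustrative only, with a hypothetical 8-entry leaf array standing in for Xen's actual policy structures.

```c
#include <assert.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static uint32_t leaves[8];    /* hypothetical stand-in for p->basic.raw */

/* Old behavior: never advertise more leaves than hardware reports
 * (the min_t(uint32_t, ...) form removed by the hunk). */
static uint32_t max_leaf_clamped(uint32_t reported)
{
    uint32_t lim = ARRAY_SIZE(leaves) - 1;

    return reported < lim ? reported : lim;
}

/* New behavior: the host policy always spans the full array,
 * regardless of what the raw CPUID output reported. */
static uint32_t max_leaf_full(uint32_t reported)
{
    (void)reported;
    return ARRAY_SIZE(leaves) - 1;
}
```

With an 8-entry array, a hardware-reported maximum of 3 stays 3 under the old scheme but becomes 7 under the new one; the actual trimming back down is then left to later policy recalculation.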
Re: [PATCH v2 2/2] x86/cpuid: support LFENCE always serializing CPUID bit
On 15.04.2021 16:47, Roger Pau Monne wrote:
> AMD Milan (Zen3) CPUs have an LFENCE Always Serializing CPUID bit in
> leaf 8021.eax. Previous AMD versions used to have a user settable
> bit in DE_CFG MSR to select whether LFENCE was dispatch serializing,
> which Xen always attempts to set. The forcefully always on setting is
> due to the addition of SEV-SNP so that a VMM cannot break the
> confidentiality of a guest.
>
> In order to support this new CPUID bit move the LFENCE_DISPATCH
> synthetic CPUID bit to map the hardware bit (leaving a hole in the
> synthetic range) and either rely on the bit already being set by the
> native CPUID output, or attempt to fake it in Xen by modifying the
> DE_CFG MSR. This requires adding one more entry to the featureset to
> support leaf 8021.eax.
>
> The bit is exposed to guests by default if the underlying hardware
> supports leaf 8021, as a way to signal that LFENCE is always
> serializing. Hardware that doesn't have the leaf might also get the
> bit set because Xen has performed the necessary arrangements, but
> that's not yet exposed to guests. Note that Xen doesn't allow guests
> to change the DE_CFG value, so once set by Xen LFENCE will always be
> serializing.
>
> Note that the access to DE_CFG by guests is left as-is: reads will
> unconditionally return LFENCE_SERIALISE bit set, while writes are
> silently dropped.
>
> Suggested-by: Andrew Cooper
> Signed-off-by: Roger Pau Monné

Reviewed-by: Jan Beulich

> ---
> Note this doesn't yet expose the bit on hardware that doesn't support
> leaf 8021. It's still TBD whether we want to hardcode this support
> manually, or instead rely on a more general approach like the one
> suggested by the shrink max CPUID leaf patch from Jan.

I'd like to give Andrew a day or two more to respond there in case he
continues to see an issue, before I commit that with your R-b and this
one here. I'll assume you'll subsequently take care of that missing
piece then - if not, i.e. if e.g. I should, please let me know.

Jan
Re: [PATCH v2 3/3] docs/doxygen: doxygen documentation for grant_table.h
On 20.04.2021 11:42, Luca Fancellu wrote:
>> On 20 Apr 2021, at 10:14, Jan Beulich wrote:
>> On 20.04.2021 10:46, Luca Fancellu wrote:
On 19 Apr 2021, at 12:05, Jan Beulich wrote:
On 19.04.2021 11:12, Luca Fancellu wrote:
> - */
> -
> -/*
> - * Reference to a grant entry in a specified domain's grant table.
> - */
> -typedef uint32_t grant_ref_t;

Why does this get moved ...

> -
> -/*
> + *
>  * A grant table comprises a packed array of grant entries in one or more
>  * page frames shared between Xen and a guest.
> + *
>  * [XEN]: This field is written by Xen and read by the sharing guest.
> + *
>  * [GST]: This field is written by the guest and read by Xen.
> + *
> + * @addtogroup grant_table Grant Tables
> + * @{
>  */
>
> -/*
> - * Version 1 of the grant table entry structure is maintained purely
> - * for backwards compatibility. New guests should use version 2.
> +/**
> + * Reference to a grant entry in a specified domain's grant table.
>  */
> +typedef uint32_t grant_ref_t;

... here, past a comment unrelated to it?
>>>
>>> The comment “Version 1 of the grant table entry […]” was moved on top of
>>> the struct grant_entry_v1, in this way Doxygen associate the comment to
>>> that structure.
>>>
>>> The comment “Reference to a grant entry in a specified domain's grant
>>> table.” Is moved on top of typedef uint32_t grant_ref_t for the same
>>> reason above
>>
>> But it's the other comment ("A grant table comprises ...") that I
>> was asking about.
>
> I thought it was part of the brief about grant tables, am I wrong? For that
> reason I moved it.

Well, the comment talks about grant table entries (the layout of which gets
defined immediately below, as visible in the original patch), not grant
references.

> @@ -243,23 +258,27 @@ union grant_entry_v2 {
>      * In that case, the frame field has the same semantics as the
>      * field of the same name in the V1 entry structure.
>      */
> +    /** @cond skip anonymous struct/union for doxygen */
>     struct {
>         grant_entry_header_t hdr;
>         uint32_t pad0;
>         uint64_t frame;
>     } full_page;
> +    /** @endcond */
>
>     /*
>      * If the grant type is GTF_grant_access and GTF_sub_page is set,
>      * @domid is allowed to access bytes [@page_off,@page_off+@length)
>      * in frame @frame.
>      */
> +    /** @cond skip anonymous struct/union for doxygen */
>     struct {
>         grant_entry_header_t hdr;
>         uint16_t page_off;
>         uint16_t length;
>         uint64_t frame;
>     } sub_page;
> +    /** @endcond */
>
>     /*
>      * If the grant is GTF_transitive, @domid is allowed to use the
> @@ -270,12 +289,14 @@ union grant_entry_v2 {
>      * The current version of Xen does not allow transitive grants
>      * to be mapped.
>      */
> +    /** @cond skip anonymous struct/union for doxygen */
>     struct {
>         grant_entry_header_t hdr;
>         domid_t trans_domid;
>         uint16_t pad0;
>         grant_ref_t gref;
>     } transitive;
> +    /** @endcond */

While already better than the introduction of strange struct tags, I'm
still not convinced we want this extra clutter (sorry). Plus - don't
these additions mean the sub-structures then won't be represented in
the generated doc, rendering it (partly) useless?
>>>
>>> Are you suggesting to remove the structure from docs?
>>
>> Just yet I'm not suggesting anything here - I was merely guessing at
>> the effect of "@cond" and the associated "skip ..." to mean that this
>> construct makes doxygen skip the enclosed section. If that's not the
>> effect, then maybe the comment wants rewording? (And then - what does
>> @cond mean?)
>
> Yes you were right, in the documentation the structure grant_entry_v2 won't
> display the anonymous union.

In which case I'm now completely unclear about your prior question.

Jan
Re: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED
On Tue, Apr 20, 2021 at 10:45:03AM +0100, Igor Druzhinin wrote:
> On 20/04/2021 09:53, Roger Pau Monné wrote:
> > On Tue, Apr 20, 2021 at 04:35:02AM +0100, Igor Druzhinin wrote:
> > > When we're replacing the existing mapping there is possibility of a race
> > > on memory map with other threads doing mmap operations - the address being
> > > unmapped/re-mapped could be occupied by another thread in between.
> > >
> > > Linux mmap man page recommends keeping the existing mappings in place to
> > > reserve the place and instead utilize the fact that the next mmap operation
> > > with MAP_FIXED flag passed will implicitly destroy the existing mappings
> > > behind the chosen address. This behavior is guaranteed by POSIX / BSD and
> > > therefore is portable.
> > >
> > > Note that it wouldn't make the replacement atomic for parallel accesses to
> > > the replaced region - those might still fail with SIGBUS due to
> > > xenforeignmemory_map not being atomic. So we're still not expecting those.
> > >
> > > Tested-by: Anthony PERARD
> > > Signed-off-by: Igor Druzhinin
> >
> > Should we add a 'Fixes: 759235653de ...' or similar tag here?
>
> I was thinking about it and decided - no. There wasn't a bug here until QEMU
> introduced usage of iothreads at the state loading phase. Originally this
> process was totally single-threaded. And it's hard to pinpoint the exact
> moment when it happened which is also not directly related to the change
> here.

Right, might be worth mentioning in the commit message then, that the code
was designed to be used without any parallelism, and that the addition of
iothreads is what actually triggered the bug here.

Thanks, Roger.
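The MAP_FIXED property the commit message relies on can be demonstrated in isolation: remapping over a live mapping in one step leaves no unmapped window for another thread's `mmap(NULL, ...)` to grab. A minimal Linux/POSIX sketch follows; `remap_fixed` and `demo` are hypothetical names for illustration, not QEMU code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Replace the mapping at addr in one step: MAP_FIXED implicitly discards
 * any previous mapping over [addr, addr+len), with no munmap()/mmap()
 * window during which the range is free for other threads to claim. */
static void *remap_fixed(void *addr, size_t len)
{
    return mmap(addr, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
}

static int demo(void)
{
    size_t len = 4096;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    p[0] = 0xaa;                      /* dirty the old mapping */

    unsigned char *q = remap_fixed(p, len);

    if (q == MAP_FAILED)
        return -1;

    /* Same address is guaranteed by MAP_FIXED; the contents are fresh
     * (anonymous mappings are zero-filled), proving the old mapping
     * was replaced rather than reused. */
    return (q == p && q[0] == 0) ? 0 : -1;
}
```

As the thread notes, this only serializes the memory-map bookkeeping; loads and stores racing with the replacement can still fault, which is why the commit message keeps the SIGBUS caveat.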
[linux-linus test] 161298: regressions - FAIL
flight 161298 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161298/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-ws16-amd64            7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-xsm                         7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel           7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install     fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64       7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-examine                        6 xen-install           fail REGR. vs. 152332
 test-amd64-i386-libvirt                        7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm    7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64       7 xen-install           fail REGR. vs. 152332
 test-amd64-coresched-i386-xl                   7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-pair                          10 xen-install/src_host  fail REGR. vs. 152332
 test-amd64-i386-pair                          11 xen-install/dst_host  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd             7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl                             7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64            7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd             7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel           7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim                      7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64                7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-raw                         7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm    7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-shadow                      7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386                 7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm                    7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64            7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64            7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair                  10 xen-install/src_host  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install     fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair                  11 xen-install/dst_host  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64            7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install        fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub                 20 guest-stop            fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub                  20 guest-stop            fail REGR. vs. 152332

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-examine                       4 memdisk-try-append    fail in 161284 pass in 161298
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  12 debian-hvm-install    fail pass in 161284

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64          19 guest-stop            fail like 152332
 test-armhf-armhf-libvirt                      16 saverestore-support-check fail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64          19 guest-stop            fail like 152332
 test-amd64-amd64-qemuu-nested-amd             20 debian-hvm-install/l1/l2 fail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64          19 guest-stop            fail like 152332
 test-armhf-armhf-libvirt-raw                  15 saverestore-support-check fail like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64          19 guest-stop            fail like 152332
 test-arm64-arm64-xl-seattle                   15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle                   16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt                      15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm                  15 migrate-support-check fail never pass
 test-arm64-arm64-xl                           15 migrate-support-check fail never pass
 test-arm64-arm64-xl                           16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1                   15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1                   16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2                   15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2                   16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm                       15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm                       16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx                  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx                  16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm                  15 migrate-support-check fail never pass
Re: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED
On 20/04/2021 04:39, no-re...@patchew.org wrote:

Patchew URL: https://patchew.org/QEMU/1618889702-13104-1-git-send-email-igor.druzhi...@citrix.com/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1618889702-13104-1-git-send-email-igor.druzhi...@citrix.com
Subject: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/1618889702-13104-1-git-send-email-igor.druzhi...@citrix.com -> patchew/1618889702-13104-1-git-send-email-igor.druzhi...@citrix.com
Switched to a new branch 'test'
3102519 xen-mapcache: avoid a race on memory map while using MAP_FIXED

=== OUTPUT BEGIN ===
ERROR: Author email address is mangled by the mailing list
#2:
Author: Igor Druzhinin via

total: 1 errors, 0 warnings, 21 lines checked

Anthony,

Is there any action required from me here? I don't completely understand how
that happened. But I found this:

"In some cases the Author: email address in patches submitted to the list
gets mangled such that it says

    John Doe via Qemu-devel

This change is a result of workarounds for DMARC policies.

Subsystem maintainers accepting patches need to catch these and fix
them before sending pull requests, so a checkpatch.pl test is highly
desirable."

Igor
Re: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED
On 20/04/2021 09:53, Roger Pau Monné wrote:

On Tue, Apr 20, 2021 at 04:35:02AM +0100, Igor Druzhinin wrote:

When we're replacing the existing mapping there is possibility of a race
on memory map with other threads doing mmap operations - the address being
unmapped/re-mapped could be occupied by another thread in between.

Linux mmap man page recommends keeping the existing mappings in place to
reserve the place and instead utilize the fact that the next mmap operation
with MAP_FIXED flag passed will implicitly destroy the existing mappings
behind the chosen address. This behavior is guaranteed by POSIX / BSD and
therefore is portable.

Note that it wouldn't make the replacement atomic for parallel accesses to
the replaced region - those might still fail with SIGBUS due to
xenforeignmemory_map not being atomic. So we're still not expecting those.

Tested-by: Anthony PERARD
Signed-off-by: Igor Druzhinin

Should we add a 'Fixes: 759235653de ...' or similar tag here?

I was thinking about it and decided - no. There wasn't a bug here until QEMU
introduced usage of iothreads at the state loading phase. Originally this
process was totally single-threaded. And it's hard to pinpoint the exact
moment when it happened which is also not directly related to the change
here.

Igor
Re: [PATCH v2 3/3] docs/doxygen: doxygen documentation for grant_table.h
> On 20 Apr 2021, at 10:14, Jan Beulich wrote:
>
> On 20.04.2021 10:46, Luca Fancellu wrote:
>>> On 19 Apr 2021, at 12:05, Jan Beulich wrote:
>>>
>>> On 19.04.2021 11:12, Luca Fancellu wrote:
Modification to include/public/grant_table.h:
1) Add doxygen tags to:
 - Create Grant tables section
 - include variables in the generated documentation
2) Add .rst file for grant table for Arm64
>>>
>>> I'm missing some reasoning about at least some of the changes done
>>> to grant_table.h. Looking at this and the earlier patches I also
>>> couldn't spot any general outline of what is acceptable or even
>>> necessary in such a header to be understood by doxygen. Without
>>> this written down somewhere (or, if documented elsewhere, a
>>> pointer provided to that doc) I'm afraid things might get screwed
>>> up again later on.
>>
>> Doxygen is a tool that generates documentation based on parsing the source
>> code comments, it recognises some commands in the comments and builds the
>> documentation sticking to what you wrote.
>>
>> Here the doxygen docs: https://www.doxygen.nl/manual/docblocks.html
>>
>> Basically it doesn't react to all comments, it parses only some well crafted
>> comments as explained in its documentation.
>
> Providing this link somewhere might be helpful. However, also seeing
> some of your further comments, it feels like to edit a public header
> enabled for doxygen one has to know fair parts of this documentation.
> While I'm certainly in favor of having a way to generate docs from
> the headers, I'm afraid I don't think such knowledge should become a
> prereq to e.g. adding a new sub-function of a hypercall. So far I was
> assuming that the formatting requirements would be quite limited, and
> that it would hence be possible to just glean them from existing code.
> But e.g. the "/**<" notation isn't going to be obvious to spot.

Yes I see, some knowledge of the documentation tool is required, but I'm sure
that anyone who is able to put his/her hands on the Xen code will be able to
use some of the doxygen tags too; as you can see from this patch the usage is
easy, and the less obvious things can be understood by looking at the source
code and the corresponding generated doc page.

> --- a/docs/hypercall-interfaces/arm64.rst
> +++ b/docs/hypercall-interfaces/arm64.rst
> @@ -8,6 +8,7 @@ Starting points
>  .. toctree::
>     :maxdepth: 2
>
> +   arm64/grant_tables
>
>  Functions
>
> diff --git a/docs/hypercall-interfaces/arm64/grant_tables.rst b/docs/hypercall-interfaces/arm64/grant_tables.rst
> new file mode 100644
> index 00..8955ec5812
> --- /dev/null
> +++ b/docs/hypercall-interfaces/arm64/grant_tables.rst
> @@ -0,0 +1,8 @@
> +.. SPDX-License-Identifier: CC-BY-4.0
> +
> +Grant Tables
> +
> +
> +.. doxygengroup:: grant_table
>>>
>>> Why is this Arm64-specific?
>>
>> This particular one is Arm64 specific, but it doesn't mean that grant tables
>> are arm specific, it is only that for now I'm introducing a partial
>> documentation in the arm side. Any other user can in the future add more
>> documentation for each platform.
>
> I'm of the pretty strong opinion that common hypercalls should be
> documented as common, and hence not live in an arch-specific
> section.

Yes sure, I agree, but this is the first step to enable a kind of
documentation, it's a work in progress. In a future series this can be
modified. For now I've tried to reproduce this page:
https://xenbits.xen.org/docs/unstable/ in the section "Hypercall Interfaces".

> @@ -73,20 +75,25 @@
>   * frame, or zero if none.
>   * 3. Write memory barrier (WMB).
>   * 4. Write ent->flags, inc. valid type.
> + * @endcode
>   *
>   * Invalidating an unused GTF_permit_access entry:
> + * @code
>   * 1. flags = ent->flags.
>   * 2. Observe that !(flags & (GTF_reading|GTF_writing)).
>   * 3. Check result of SMP-safe CMPXCHG(>flags, flags, 0).
>   * NB. No need for WMB as reuse of entry is control-dependent on success of
>   * step 3, and all architectures guarantee ordering of ctrl-dep writes.
> + * @endcode
>   *
>   * Invalidating an in-use GTF_permit_access entry:
> + *
>   * This cannot be done directly. Request assistance from the domain controller
>   * which can set a timeout on the use of a grant entry and take necessary
>   * action. (NB. This is not yet implemented!).
>   *
>   * Invalidating an unused GTF_accept_transfer entry:
> + * @code
>   * 1. flags = ent->flags.
>   * 2. Observe that !(flags & GTF_transfer_committed). [*]
>   * 3. Check result of SMP-safe CMPXCHG(>flags, flags, 0).
> @@ -97,47 +104,55 @@
>   * transferred frame is written. It is safe for the guest to spin waiting
>   * for this to occur (detect by observing GTF_transfer_completed in
>   * ent->flags).
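For readers unfamiliar with the doxygen markers debated in this thread (`/** ... */`, `@addtogroup`/`@{`/`@}`, `@code`/`@endcode`, `@cond`/`@endcond`), here is a minimal self-contained illustration of the pattern the patch applies. All `demo_*` names are hypothetical stand-ins, not the real grant-table header.

```c
#include <assert.h>
#include <stdint.h>

/**
 * @addtogroup demo_tables Demo Tables
 * @{
 */

/** Reference to an entry in a demo table (brief attached to the typedef). */
typedef uint32_t demo_ref_t;

/**
 * A demo entry.
 *
 * Validating an unused entry:
 * @code
 * 1. flags = ent->flags.
 * 2. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
 * @endcode
 */
struct demo_entry {
    uint16_t flags;
    /** @cond skip the anonymous union in the generated docs */
    union {
        uint64_t frame;
        demo_ref_t gref;
    };
    /** @endcond */
};

/** @} */
```

`@code`/`@endcode` renders the numbered steps verbatim instead of reflowing them; `@cond`/`@endcond` hides the enclosed members from the generated output, which is exactly the side effect Jan is questioning; the `@addtogroup ... @{ ... @}` pair is what a `doxygengroup::` directive in an .rst file pulls in.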
Re: [PATCH v3 5/6] x86/vpic: issue dpci EOI for cleared pins at ICW1
On 26.01.2021 17:57, Jan Beulich wrote:
> On 26.01.2021 14:45, Roger Pau Monne wrote:
>> When pins are cleared from either ISR or IRR as part of the
>> initialization sequence forward the clearing of those pins to the dpci
>> EOI handler, as it is equivalent to an EOI. Not doing so can bring the
>> interrupt controller state out of sync with the dpci handling logic,
>> that expects a notification when a pin has been EOI'ed.
>>
>> Fixes: 7b3cb5e5416 ('IRQ injection changes for HVM PCI passthru.')
>> Signed-off-by: Roger Pau Monné
>
> As said before, under the assumption that the clearing of IRR
> and ISR that we do is correct, I agree with the change. I'd
> like to give it some time though before giving my R-b here, for
> the inquiry to hopefully get answered. After all there's still
> the possibility of us needing to instead squash that clearing
> (which then would seem to result in getting things in sync the
> other way around).

Still haven't heard anything, so
Reviewed-by: Jan Beulich

In the worst case we'd need to consider reverting later on.

Jan
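The idea under review - treating the clearing of set ISR/IRR pins at initialization (ICW1) as an implicit EOI for each such pin, so that external bookkeeping stays in sync - can be sketched generically. Types and names below are hypothetical, not Xen's actual vpic/dpci code.

```c
#include <assert.h>
#include <stdint.h>

/* Callback invoked once per pin that was still set when the register
 * is wiped; in the real patch this would be the dpci EOI handler. */
typedef void (*eoi_fn)(unsigned int pin, void *ctx);

/* Clear an 8-pin ISR/IRR-style register, notifying for each set pin
 * as if it had been EOI'ed normally. */
static void clear_pins(uint8_t *reg, eoi_fn eoi, void *ctx)
{
    for (unsigned int pin = 0; pin < 8; pin++)
        if (*reg & (1u << pin))
            eoi(pin, ctx);
    *reg = 0;
}

static void count_eoi(unsigned int pin, void *ctx)
{
    (void)pin;
    ++*(unsigned int *)ctx;
}

static int demo(void)
{
    uint8_t isr = 0x05;        /* pins 0 and 2 outstanding */
    unsigned int eois = 0;

    clear_pins(&isr, count_eoi, &eois);
    /* register fully cleared, and exactly one EOI per set pin */
    return (isr == 0 && eois == 2) ? 0 : -1;
}
```

Without the per-pin notification, the consumer would wait forever for EOIs on pins that were silently wiped - the desynchronization the commit message describes.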
[qemu-mainline bisection] complete test-arm64-arm64-libvirt-xsm
branch xen-unstable
xenbranch xen-unstable
job test-arm64-arm64-libvirt-xsm
testid guest-start

Tree: libvirt git://xenbits.xen.org/libvirt.git
Tree: libvirt_keycodemapdb https://gitlab.com/keycodemap/keycodemapdb.git
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemuu git://git.qemu.org/qemu.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

Bug is in tree: qemuu git://git.qemu.org/qemu.git
Bug introduced: 8d17adf34f501ded65a106572740760f0a75577c
Bug not present: e67d8e2928200e24ecb47c7be3ea8270077f2996
Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/161318/

commit 8d17adf34f501ded65a106572740760f0a75577c
Author: Daniel P. Berrangé
Date: Mon Feb 22 11:16:32 2021 +

    block: remove support for using "file" driver with block/char devices

    The 'host_device' and 'host_cdrom' drivers must be used instead.

    Reviewed-by: Eric Blake
    Signed-off-by: Daniel P. Berrangé

For bisection revision-tuple graph see:
http://logs.test-lab.xenproject.org/osstest/results/bisect/qemu-mainline/test-arm64-arm64-libvirt-xsm.guest-start.html
Revision IDs in each graph node refer, respectively, to the Trees above.

Running cs-bisection-step --graph-out=/home/logs/results/bisect/qemu-mainline/test-arm64-arm64-libvirt-xsm.guest-start --summary-out=tmp/161318.bisection-summary --basis-template=152631 --blessings=real,real-bisect,real-retry qemu-mainline test-arm64-arm64-libvirt-xsm guest-start

Searching for failure / basis pass:
 161290 fail [host=rochester1] / 160125 [host=laxton0] 160119 [host=rochester0] 160113 [host=laxton1] 160104 ok.

Failure / basis pass flights: 161290 / 160104

Tree: libvirt git://xenbits.xen.org/libvirt.git
Tree: libvirt_keycodemapdb https://gitlab.com/keycodemap/keycodemapdb.git
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemuu git://git.qemu.org/qemu.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git

Latest 2c846fa6bcc11929c9fb857a22430fb9945654ad 27acf0ef828bf719b2053ba398b195829413dbdd a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 c530a75c1e6a472b0eb9558310b518f0dfcd8860 99e7e48cc7117c37fc1c08a741872d0875595796 8fe9f1f891eff4e37f82622b7480ee748bf4af74 b0d61ecef66eb05bd7a4eb7ada88ec5dab06dfee dd22a64de7e02b48312839a15179528c8f7db5c6
Basis pass 2c846fa6bcc11929c9fb857a22430fb9945654ad 27acf0ef828bf719b2053ba398b195829413dbdd a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 c530a75c1e6a472b0eb9558310b518f0dfcd8860 f3bdfc41866edf7c256e689deb9d091a950c8fca 5b7f5586d182b0cafb1f8d558992a14763e2953e b0d61ecef66eb05bd7a4eb7ada88ec5dab06dfee b4011741e6b39a8fd0ed5aded96c16c45ead5888

Generating revisions with ./adhoc-revtuple-generator
 git://xenbits.xen.org/libvirt.git#2c846fa6bcc11929c9fb857a22430fb9945654ad-2c846fa6bcc11929c9fb857a22430fb9945654ad
 https://gitlab.com/keycodemap/keycodemapdb.git#27acf0ef828bf719b2053ba398b195829413dbdd-27acf0ef828bf719b2053ba398b195829413dbdd
 git://xenbits.xen.org/linux-pvops.git#a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9-a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9
 git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 git://xenbits.xen.org/osstest/ovmf.git#f3bdfc41866edf7c256e689deb9d091a950c8fca-99e7e48cc7117c37fc1c08a741872d0875595796
 git://git.qemu.org/qemu.git#5b7f5586d182b0cafb1f8d558992a14763e2953e-8fe9f1f891eff4e37f82622b7480ee748bf4af74
 git://xenbits.xen.org/osstest/seabios.git#b0d61ecef66eb05bd7a4eb7ada88ec5dab06dfee-b0d61ecef66eb05bd7a4eb7ada88ec5dab06dfee
 git://xenbits.xen.org/xen.git#b4011741e6b39a8fd0ed5aded96c16c45ead5888-dd22a64de7e02b48312839a15179528c8f7db5c6

Loaded 44247 nodes in revision graph

Searching for test results:
 160104 pass 2c846fa6bcc11929c9fb857a22430fb9945654ad 27acf0ef828bf719b2053ba398b195829413dbdd a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 c530a75c1e6a472b0eb9558310b518f0dfcd8860 f3bdfc41866edf7c256e689deb9d091a950c8fca 5b7f5586d182b0cafb1f8d558992a14763e2953e b0d61ecef66eb05bd7a4eb7ada88ec5dab06dfee b4011741e6b39a8fd0ed5aded96c16c45ead5888
 160113 [host=laxton1]
 160119 [host=rochester0]
 160125 [host=laxton0]
 160134 fail irrelevant
 160147 fail 2c846fa6bcc11929c9fb857a22430fb9945654ad 27acf0ef828bf719b2053ba398b195829413dbdd a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 c530a75c1e6a472b0eb9558310b518f0dfcd8860 eb07bfb09ef5483ad58ed0eba713f32fb0c909f9 2e1293cbaac75e84f541f9acfa8e26749f4c3562 b0d61ecef66eb05bd7a4eb7ada88ec5dab06dfee dae3c3e8b257cd27d6b35a467a34bf79a6650340
 160167 fail
Re: swiotlb cleanups v3
On Sat, Apr 17, 2021 at 11:39:22AM -0500, Tom Lendacky wrote:
> Somewhere between the 1st and 2nd patch, specifying a specific swiotlb
> for an SEV guest is no longer honored. For example, if I start an SEV
> guest with 16GB of memory and specify swiotlb=131072 I used to get a
> 256MB SWIOTLB. However, after the 2nd patch, the swiotlb=131072 is no
> longer honored and I get a 982MB SWIOTLB (as set via sev_setup_arch() in
> arch/x86/mm/mem_encrypt.c).
>
> I can't be sure which patch caused the issue since an SEV guest fails to
> boot with the 1st patch but can boot with the 2nd patch, at which point
> the SWIOTLB comes in at 982MB (I haven't had a chance to debug it and so
> I'm hoping you might be able to quickly spot what's going on).

Can you try this patch?

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0a5b6f7e75bce6..ac81ef97df32f5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -71,15 +71,17 @@ struct io_tlb_mem *io_tlb_default_mem;
  */
 static unsigned int max_segment;
 
-static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long swiotlb_cmdline_size;
 
 static int __init
 setup_io_tlb_npages(char *str)
 {
 	if (isdigit(*str)) {
 		/* avoid tail segment of size < IO_TLB_SEGSIZE */
-		default_nslabs =
-			ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
+		unsigned long nslabs = simple_strtoul(str, &str, 0);
+
+		swiotlb_cmdline_size =
+			ALIGN(nslabs, IO_TLB_SEGSIZE) << IO_TLB_SHIFT;
 	}
 	if (*str == ',')
 		++str;
@@ -108,7 +110,9 @@ void swiotlb_set_max_segment(unsigned int val)
 
 unsigned long swiotlb_size_or_default(void)
 {
-	return default_nslabs << IO_TLB_SHIFT;
+	if (swiotlb_cmdline_size)
+		return swiotlb_cmdline_size;
+	return IO_TLB_DEFAULT_SIZE;
 }
 
 void __init swiotlb_adjust_size(unsigned long size)
@@ -118,9 +122,10 @@ void __init swiotlb_adjust_size(unsigned long size)
 	 * architectures such as those supporting memory encryption to
 	 * adjust/expand SWIOTLB size for their use.
 	 */
-	size = ALIGN(size, IO_TLB_SIZE);
-	default_nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
-	pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
+	if (!swiotlb_cmdline_size)
+		swiotlb_cmdline_size = ALIGN(size, IO_TLB_SIZE);
+	pr_info("SWIOTLB bounce buffer size adjusted to %luMB",
+		swiotlb_cmdline_size >> 20);
 }
 
 void swiotlb_print_info(void)
@@ -209,7 +214,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 void __init
 swiotlb_init(int verbose)
 {
-	size_t bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
+	size_t bytes = PAGE_ALIGN(swiotlb_size_or_default());
 	void *tlb;
 
 	if (swiotlb_force == SWIOTLB_NO_FORCE)
@@ -219,7 +224,7 @@ swiotlb_init(int verbose)
 	tlb = memblock_alloc_low(bytes, PAGE_SIZE);
 	if (!tlb)
 		goto fail;
-	if (swiotlb_init_with_tbl(tlb, default_nslabs, verbose))
+	if (swiotlb_init_with_tbl(tlb, bytes >> IO_TLB_SHIFT, verbose))
 		goto fail_free_mem;
 	return;
[libvirt test] 161311: regressions - FAIL
flight 161311 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/161311/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-armhf-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-arm64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a version targeted for testing: libvirt 1c34211c22de28127a509edbf2cf2f44cb0d891e baseline version: libvirt 2c846fa6bcc11929c9fb857a22430fb9945654ad Last test of basis 151777 2020-07-10 04:19:19 Z 284 days Failing since151818 2020-07-11 04:18:52 Z 283 days 276 attempts Testing same since 161311 2021-04-20 04:18:55 Z0 days1 attempts People who touched revisions under test: Adolfo Jayme Barrientos Aleksandr Alekseev Aleksei Zakharov Andika Triwidada Andrea Bolognani Balázs Meskó Barrett Schonefeld Bastien Orivel BiaoXiang Ye Bihong Yu Binfeng Wu Boris Fiuczynski Brian Turek Bruno Haible Chris Mayo Christian Ehrhardt Christian Schoenebeck Cole Robinson Collin Walling Cornelia Huck 
Cédric Bosdonnat Côme Borsoi Daniel Henrique Barboza Daniel Letai Daniel P. Berrange Daniel P. Berrangé Dmytro Linkin Eiichi Tsukata Eric Farman Erik Skultety Fabian Affolter Fabian Freyer Fangge Jin Farhan Ali Fedora Weblate Translation gongwei Guoyi Tu Göran Uddeborg Halil Pasic Han Han Hao Wang Hela Basa Helmut Grohne Ian Wienand Jakob Meng Jamie Strandboge Jamie Strandboge Jan Kuparinen Jean-Baptiste Holcroft Jianan Gao Jim Fehlig Jin Yan Jiri Denemark John Ferlan Jonathan Watt Jonathon Jongsma Julio Faracco Ján Tomko Kashyap Chamarthy Kevin Locke Kristina Hanicova Laine Stump Laszlo Ersek Liao Pingfang Lin Ma Lin Ma Lin Ma Luke Yue Luyao Zhong Marc Hartmayer Marc-André Lureau Marek Marczykowski-Górecki Markus Schade Martin Kletzander Masayoshi Mizuma Matt Coleman Matt Coleman Mauro Matteo Cascella Meina Li Michal Privoznik Michał Smyk Milo Casagrande Moshe Levi Muha Aliss Neal Gompa Nick Shyrokovskiy Nickys Music Group Nico Pache Nikolay Shirokovskiy Olaf Hering Olesya Gerasimenko Orion Poplawski Pany Patrick Magauran Paulo de Rezende Pinatti Pavel Hrdina Peng Liang Peter Krempa Pino Toscano Pino Toscano Piotr Drąg Prathamesh Chavan Ricky Tigg Roman Bogorodskiy Roman Bolshakov Ryan Gahagan Ryan Schmidt Sam Hartman Scott Shambarger Sebastian Mitterle SeongHyun Jo Shalini Chellathurai Saroja Shaojun Yang Shi Lei simmon Simon Gaiser Stefan Bader Stefan Berger Stefan Berger Szymon Scholz Thomas Huth Tim Wiederhake Tomáš Golembiovský Tomáš Janoušek Tuguoyi Ville Skyttä Wang Xin WangJian Weblate Yalei Li <274268...@qq.com> Yalei Li Yang Hang Yanqiu Zhang Yaroslav Kargin Yi Li Yi Wang Yuri Chornoivan Zheng Chuan zhenwei pi Zhenyu Zheng jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386
Re: [PATCH v2 3/3] docs/doxygen: doxygen documentation for grant_table.h
On 20.04.2021 10:46, Luca Fancellu wrote:
>> On 19 Apr 2021, at 12:05, Jan Beulich wrote:
>>
>> On 19.04.2021 11:12, Luca Fancellu wrote:
>>> Modification to include/public/grant_table.h:
>>>
>>> 1) Add doxygen tags to:
>>>    - Create Grant tables section
>>>    - include variables in the generated documentation
>>> 2) Add .rst file for grant table for Arm64
>>
>> I'm missing some reasoning about at least some of the changes done
>> to grant_table.h. Looking at this and the earlier patches I also
>> couldn't spot any general outline of what is acceptable or even
>> necessary in such a header to be understood by doxygen. Without
>> this written down somewhere (or, if documented elsewhere, a
>> pointer provided to that doc) I'm afraid things might get screwed
>> up again later on.
>
> Doxygen is a tool that generates documentation based on parsing the
> source code comments, it recognises some commands in the comments and
> builds the documentation sticking to what you wrote.
>
> Here the doxygen docs: https://www.doxygen.nl/manual/docblocks.html
>
> Basically it doesn’t react to all comments, it parses only some well
> crafted comments as explained in its documentation.

Providing this link somewhere might be helpful. However, also seeing
some of your further comments, it feels like to edit a public header
enabled for doxygen one has to know fair parts of this documentation.
While I'm certainly in favor of having a way to generate docs from the
headers, I'm afraid I don't think such knowledge should become a prereq
to e.g. adding a new sub-function of a hypercall. So far I was assuming
that the formatting requirements would be quite limited, and that it
would hence be possible to just glean them from existing code. But e.g.
the "/**<" notation isn't going to be obvious to spot.

>>> --- a/docs/hypercall-interfaces/arm64.rst
>>> +++ b/docs/hypercall-interfaces/arm64.rst
>>> @@ -8,6 +8,7 @@ Starting points
>>>  .. toctree::
>>>     :maxdepth: 2
>>>
>>> +   arm64/grant_tables
>>>
>>>
>>>  Functions
>>> diff --git a/docs/hypercall-interfaces/arm64/grant_tables.rst b/docs/hypercall-interfaces/arm64/grant_tables.rst
>>> new file mode 100644
>>> index 00..8955ec5812
>>> --- /dev/null
>>> +++ b/docs/hypercall-interfaces/arm64/grant_tables.rst
>>> @@ -0,0 +1,8 @@
>>> +.. SPDX-License-Identifier: CC-BY-4.0
>>> +
>>> +Grant Tables
>>> +============
>>> +
>>> +.. doxygengroup:: grant_table
>>
>> Why is this Arm64-specific?
>
> This particular one is Arm64 specific, but it doesn’t mean that grant
> tables are arm specific, it is only that for now I’m introducing a
> partial documentation in the arm side. Any other user can in the future
> add more documentation for each platform.

I'm of the pretty strong opinion that common hypercalls should be
documented as common, and hence not live in an arch-specific section.

>>> @@ -73,20 +75,25 @@
>>>  * frame, or zero if none.
>>>  * 3. Write memory barrier (WMB).
>>>  * 4. Write ent->flags, inc. valid type.
>>> + * @endcode
>>>  *
>>>  * Invalidating an unused GTF_permit_access entry:
>>> + * @code
>>>  * 1. flags = ent->flags.
>>>  * 2. Observe that !(flags & (GTF_reading|GTF_writing)).
>>>  * 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
>>>  * NB. No need for WMB as reuse of entry is control-dependent on success of
>>>  * step 3, and all architectures guarantee ordering of ctrl-dep writes.
>>> + * @endcode
>>>  *
>>>  * Invalidating an in-use GTF_permit_access entry:
>>> + *
>>>  * This cannot be done directly. Request assistance from the domain controller
>>>  * which can set a timeout on the use of a grant entry and take necessary
>>>  * action. (NB. This is not yet implemented!).
>>>  *
>>>  * Invalidating an unused GTF_accept_transfer entry:
>>> + * @code
>>>  * 1. flags = ent->flags.
>>>  * 2. Observe that !(flags & GTF_transfer_committed). [*]
>>>  * 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
>>> @@ -97,47 +104,55 @@
>>>  * transferred frame is written. It is safe for the guest to spin waiting
>>>  * for this to occur (detect by observing GTF_transfer_completed in
>>>  * ent->flags).
>>> + * @endcode
>>>  *
>>>  * Invalidating a committed GTF_accept_transfer entry:
>>>  * 1. Wait for (ent->flags & GTF_transfer_completed).
>>>  *
>>>  * Changing a GTF_permit_access from writable to read-only:
>>> + *
>>>  * Use SMP-safe CMPXCHG to set GTF_readonly, while checking !GTF_writing.
>>>  *
>>>  * Changing a GTF_permit_access from read-only to writable:
>>> + *
>>>  * Use SMP-safe bit-setting instruction.
>>
>> For example - are the blank lines you add necessary or merely nice
>> to have in your personal opinion?
>
> The blank lines makes the docs output more good looking

I'm inclined to suggest to split beautification from basic enabling.

>>> - */
>>> -
>>> -/*
>>> - * Reference to a grant entry in a specified domain's grant table.
>>> - */
>>> -typedef uint32_t
Re: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED
On Tue, Apr 20, 2021 at 04:35:02AM +0100, Igor Druzhinin wrote:
> When we're replacing the existing mapping there is possibility of a race
> on memory map with other threads doing mmap operations - the address being
> unmapped/re-mapped could be occupied by another thread in between.
>
> Linux mmap man page recommends keeping the existing mappings in place to
> reserve the place and instead utilize the fact that the next mmap operation
> with MAP_FIXED flag passed will implicitly destroy the existing mappings
> behind the chosen address. This behavior is guaranteed by POSIX / BSD and
> therefore is portable.
>
> Note that it wouldn't make the replacement atomic for parallel accesses to
> the replaced region - those might still fail with SIGBUS due to
> xenforeignmemory_map not being atomic. So we're still not expecting those.
>
> Tested-by: Anthony PERARD
> Signed-off-by: Igor Druzhinin

Should we add a 'Fixes: 759235653de ...' or similar tag here?

> ---
>  hw/i386/xen/xen-mapcache.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
> index 5b120ed..e82b7dc 100644
> --- a/hw/i386/xen/xen-mapcache.c
> +++ b/hw/i386/xen/xen-mapcache.c
> @@ -171,7 +171,20 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>      if (!(entry->flags & XEN_MAPCACHE_ENTRY_DUMMY)) {
>          ram_block_notify_remove(entry->vaddr_base, entry->size);
>      }
> -    if (munmap(entry->vaddr_base, entry->size) != 0) {
> +
> +    /*
> +     * If an entry is being replaced by another mapping and we're using
> +     * MAP_FIXED flag for it - there is possibility of a race for vaddr
> +     * address with another thread doing an mmap call itself
> +     * (see man 2 mmap). To avoid that we skip explicit unmapping here
> +     * and allow the kernel to destroy the previous mappings by replacing
> +     * them in mmap call later.
> +     *
> +     * Non-identical replacements are not allowed therefore.
> +     */
> +    assert(!vaddr || (entry->vaddr_base == vaddr && entry->size == size));
> +
> +    if (!vaddr && munmap(entry->vaddr_base, entry->size) != 0) {

Oh, so remappings of foreign addresses must always use the same virtual
address space, I somehow assumed that you could remap an existing foreign
map (or dummy mapping) into a different virtual address if you so wanted.

Thanks, Roger.
Re: [PATCH v2 3/3] docs/doxygen: doxygen documentation for grant_table.h
> On 19 Apr 2021, at 12:05, Jan Beulich wrote:
>
> On 19.04.2021 11:12, Luca Fancellu wrote:
>> Modification to include/public/grant_table.h:
>>
>> 1) Add doxygen tags to:
>>    - Create Grant tables section
>>    - include variables in the generated documentation
>> 2) Add .rst file for grant table for Arm64
>
> I'm missing some reasoning about at least some of the changes done
> to grant_table.h. Looking at this and the earlier patches I also
> couldn't spot any general outline of what is acceptable or even
> necessary in such a header to be understood by doxygen. Without
> this written down somewhere (or, if documented elsewhere, a
> pointer provided to that doc) I'm afraid things might get screwed
> up again later on.

Hi Jan,

Doxygen is a tool that generates documentation based on parsing the
source code comments, it recognises some commands in the comments and
builds the documentation sticking to what you wrote.

Here the doxygen docs: https://www.doxygen.nl/manual/docblocks.html

Basically it doesn’t react to all comments, it parses only some well
crafted comments as explained in its documentation.

>
>> --- a/docs/hypercall-interfaces/arm64.rst
>> +++ b/docs/hypercall-interfaces/arm64.rst
>> @@ -8,6 +8,7 @@ Starting points
>>  .. toctree::
>>     :maxdepth: 2
>>
>> +   arm64/grant_tables
>>
>>
>>  Functions
>> diff --git a/docs/hypercall-interfaces/arm64/grant_tables.rst b/docs/hypercall-interfaces/arm64/grant_tables.rst
>> new file mode 100644
>> index 00..8955ec5812
>> --- /dev/null
>> +++ b/docs/hypercall-interfaces/arm64/grant_tables.rst
>> @@ -0,0 +1,8 @@
>> +.. SPDX-License-Identifier: CC-BY-4.0
>> +
>> +Grant Tables
>> +============
>> +
>> +.. doxygengroup:: grant_table
>
> Why is this Arm64-specific?

This particular one is Arm64 specific, but it doesn’t mean that grant
tables are arm specific, it is only that for now I’m introducing a
partial documentation in the arm side. Any other user can in the future
add more documentation for each platform.

>
>> @@ -73,20 +75,25 @@
>>  * frame, or zero if none.
>>  * 3. Write memory barrier (WMB).
>>  * 4. Write ent->flags, inc. valid type.
>> + * @endcode
>>  *
>>  * Invalidating an unused GTF_permit_access entry:
>> + * @code
>>  * 1. flags = ent->flags.
>>  * 2. Observe that !(flags & (GTF_reading|GTF_writing)).
>>  * 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
>>  * NB. No need for WMB as reuse of entry is control-dependent on success of
>>  * step 3, and all architectures guarantee ordering of ctrl-dep writes.
>> + * @endcode
>>  *
>>  * Invalidating an in-use GTF_permit_access entry:
>> + *
>>  * This cannot be done directly. Request assistance from the domain controller
>>  * which can set a timeout on the use of a grant entry and take necessary
>>  * action. (NB. This is not yet implemented!).
>>  *
>>  * Invalidating an unused GTF_accept_transfer entry:
>> + * @code
>>  * 1. flags = ent->flags.
>>  * 2. Observe that !(flags & GTF_transfer_committed). [*]
>>  * 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
>> @@ -97,47 +104,55 @@
>>  * transferred frame is written. It is safe for the guest to spin waiting
>>  * for this to occur (detect by observing GTF_transfer_completed in
>>  * ent->flags).
>> + * @endcode
>>  *
>>  * Invalidating a committed GTF_accept_transfer entry:
>>  * 1. Wait for (ent->flags & GTF_transfer_completed).
>>  *
>>  * Changing a GTF_permit_access from writable to read-only:
>> + *
>>  * Use SMP-safe CMPXCHG to set GTF_readonly, while checking !GTF_writing.
>>  *
>>  * Changing a GTF_permit_access from read-only to writable:
>> + *
>>  * Use SMP-safe bit-setting instruction.
>
> For example - are the blank lines you add necessary or merely nice
> to have in your personal opinion?

The blank lines makes the docs output more good looking

>
>> - */
>> -
>> -/*
>> - * Reference to a grant entry in a specified domain's grant table.
>> - */
>> -typedef uint32_t grant_ref_t;
>
> Why does this get moved ...

>
>> -
>> -/*
>> + *
>>  * A grant table comprises a packed array of grant entries in one or more
>>  * page frames shared between Xen and a guest.
>> + *
>>  * [XEN]: This field is written by Xen and read by the sharing guest.
>> + *
>>  * [GST]: This field is written by the guest and read by Xen.
>> + *
>> + * @addtogroup grant_table Grant Tables
>> + * @{
>>  */
>>
>> -/*
>> - * Version 1 of the grant table entry structure is maintained purely
>> - * for backwards compatibility. New guests should use version 2.
>> +/**
>> + * Reference to a grant entry in a specified domain's grant table.
>>  */
>> +typedef uint32_t grant_ref_t;
>
> ... here, past a comment unrelated to it?

The comment “Version 1 of the grant table entry […]” was moved on top of
the struct grant_entry_v1, in this way Doxygen associate the comment to
that structure. The comment “Reference to a grant entry in a specified
domain's grant
Re: [PATCH v3] x86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents
On Mon, Apr 19, 2021 at 02:29:38PM +0200, Jan Beulich wrote:
> On 19.04.2021 14:09, Roger Pau Monné wrote:
> > On Mon, Apr 19, 2021 at 01:46:02PM +0200, Jan Beulich wrote:
> >> On 19.04.2021 11:16, Roger Pau Monné wrote:
> >>> Adding Paul also for the Viridian part.
> >>>
> >>> On Fri, Apr 16, 2021 at 03:16:41PM +0200, Jan Beulich wrote:
> >>>> Zapping leaf data for out of range leaves is just one half of it: To
> >>>> avoid guests (bogusly or worse) inferring information from mere leaf
> >>>> presence, also shrink maximum indicators such that the respective
> >>>> trailing entry is not all blank (unless of course it's the initial
> >>>> subleaf of a leaf that's not the final one).
> >>>>
> >>>> This is also in preparation of bumping the maximum basic leaf we
> >>>> support, to ensure guests not getting exposed related features won't
> >>>> observe a change in behavior.
> >>>>
> >>>> Signed-off-by: Jan Beulich
> >>>> ---
> >>>> v3: Record the actual non-empty subleaf in p->basic.raw[0x7], rather
> >>>>     than subleaf 0. Re-base over Viridian leaf 4005 addition.
> >>>> v2: New.
> >>>>
> >>>> --- a/tools/tests/cpu-policy/test-cpu-policy.c
> >>>> +++ b/tools/tests/cpu-policy/test-cpu-policy.c
> >>>> @@ -8,10 +8,13 @@
> >>>>  #include
> >>>>
> >>>>  #include
> >>>> +#include
> >>>>  #include
> >>>>  #include
> >>>>  #include
> >>>>
> >>>> +#define XSTATE_FP_SSE (X86_XCR0_FP | X86_XCR0_SSE)
> >>>> +
> >>>>  static unsigned int nr_failures;
> >>>>  #define fail(fmt, ...) \
> >>>>  ({ \
> >>>> @@ -553,6 +556,103 @@ static void test_cpuid_out_of_range_clea
> >>>>      }
> >>>>  }
> >>>>
> >>>> +static void test_cpuid_maximum_leaf_shrinking(void)
> >>>> +{
> >>>> +    static const struct test {
> >>>> +        const char *name;
> >>>> +        struct cpuid_policy p;
> >>>> +    } tests[] = {
> >>>> +        {
> >>>> +            .name = "basic",
> >>>> +            .p = {
> >>>> +                /* Very basic information only. */
> >>>> +                .basic.max_leaf = 1,
> >>>> +                .basic.raw_fms = 0xc2,
> >>>> +            },
> >>>> +        },
> >>>> +        {
> >>>> +            .name = "cache",
> >>>> +            .p = {
> >>>> +                /* Cache subleaves present. */
> >>>> +                .basic.max_leaf = 4,
> >>>> +                .cache.subleaf[0].type = 1,
> >>>
> >>> On a private conversation with Andrew he raised the issue that the
> >>> shrinking might be overly simplistic. For example if the x2APIC
> >>> feature bit in leaf 1 is set then the max leaf should be at least 0xb
> >>> in order to be able to fetch the x2APIC ID, even if it's 0.
> >>
> >> But in such a case the "type" field of leaf 0xb's first sub-leaf is
> >> going to be non-zero, isn't it?
> >
> > Right, as type 0 is invalid according to Intel SDM, so you will never
> > be able to shrink below 0xb while having x2APIC set.
> >
> > I still wonder however if there's any other such dependency, where
> > shrinking the max cpuid leaf could force us to drop features exposed
> > in inferior leaves.
>
> My take is that, just like for the x2APIC case, such leaves won't be
> all blank if the qualifying bit is set. Or if one ends up being all
> blank, a bug likely sits elsewhere.

Reviewed-by: Roger Pau Monné

I'm not able to spot any more dependencies ATM, so worse case we
discover some of those along the way and add the missing logic if
required, in any case seems like a fine starting point.

Thanks, Roger.
[xen-unstable test] 161296: tolerable FAIL - PUSHED
flight 161296 xen-unstable real [real] flight 161315 xen-unstable real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/161296/ http://logs.test-lab.xenproject.org/osstest/logs/161315/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 161315-retest Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 161281 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 161281 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 161281 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 161281 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 161281 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 161281 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 161281 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 161281 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 161281 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 161281 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 161281 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass 
test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass version targeted for 
testing: xen b5b93627dd1c398a90b832af765b4720fc71814e baseline version: xen 05031fa87357fad155f659cfc2dcce6614834684 Last test of basis 161281 2021-04-19 04:14:43 Z1 days Testing same since 161296 2021-04-19 17:07:45 Z0 days1 attempts People who touched revisions under test: Andrew Cooper jobs:
[PATCH 7/9] arm/time,vtimer: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit.
MSR/MRS are expecting 64bit values thus we should get rid of helpers
READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG.
We should also use register_t type when reading sysregs which can correspond
to uint64_t or uint32_t. Even though many AArch64 sysregs have upper 32bit
reserved it does not mean that they can't be widened in the future.

Modify type of vtimer structure's member: ctl to register_t.

Signed-off-by: Michal Orzel
---
 xen/arch/arm/time.c          | 28 ++++++++++++++--------------
 xen/arch/arm/vtimer.c        | 10 +++++-----
 xen/include/asm-arm/domain.h |  2 +-
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index b0021c2c69..e6c96eb90c 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -145,7 +145,7 @@ void __init preinit_xen_time(void)
         preinit_acpi_xen_time();
 
     if ( !cpu_khz )
-        cpu_khz = READ_SYSREG32(CNTFRQ_EL0) / 1000;
+        cpu_khz = (unsigned long)(READ_SYSREG(CNTFRQ_EL0) / 1000);
 
     res = platform_init_time();
     if ( res )
@@ -205,13 +205,13 @@ int reprogram_timer(s_time_t timeout)
 
     if ( timeout == 0 )
     {
-        WRITE_SYSREG32(0, CNTHP_CTL_EL2);
+        WRITE_SYSREG(0, CNTHP_CTL_EL2);
         return 1;
     }
 
     deadline = ns_to_ticks(timeout) + boot_count;
     WRITE_SYSREG64(deadline, CNTHP_CVAL_EL2);
-    WRITE_SYSREG32(CNTx_CTL_ENABLE, CNTHP_CTL_EL2);
+    WRITE_SYSREG(CNTx_CTL_ENABLE, CNTHP_CTL_EL2);
     isb();
 
     /* No need to check for timers in the past; the Generic Timer fires
@@ -223,23 +223,23 @@ int reprogram_timer(s_time_t timeout)
 static void timer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 {
     if ( irq == (timer_irq[TIMER_HYP_PPI]) &&
-         READ_SYSREG32(CNTHP_CTL_EL2) & CNTx_CTL_PENDING )
+         READ_SYSREG(CNTHP_CTL_EL2) & CNTx_CTL_PENDING )
     {
         perfc_incr(hyp_timer_irqs);
         /* Signal the generic timer code to do its work */
         raise_softirq(TIMER_SOFTIRQ);
         /* Disable the timer to avoid more interrupts */
-        WRITE_SYSREG32(0, CNTHP_CTL_EL2);
+        WRITE_SYSREG(0, CNTHP_CTL_EL2);
     }
 
     if ( irq == (timer_irq[TIMER_PHYS_NONSECURE_PPI]) &&
-         READ_SYSREG32(CNTP_CTL_EL0) & CNTx_CTL_PENDING )
+         READ_SYSREG(CNTP_CTL_EL0) & CNTx_CTL_PENDING )
     {
         perfc_incr(phys_timer_irqs);
         /* Signal the generic timer code to do its work */
         raise_softirq(TIMER_SOFTIRQ);
         /* Disable the timer to avoid more interrupts */
-        WRITE_SYSREG32(0, CNTP_CTL_EL0);
+        WRITE_SYSREG(0, CNTP_CTL_EL0);
     }
 }
 
@@ -260,8 +260,8 @@ static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 
     perfc_incr(virt_timer_irqs);
 
-    current->arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0);
-    WRITE_SYSREG32(current->arch.virt_timer.ctl | CNTx_CTL_MASK, CNTV_CTL_EL0);
+    current->arch.virt_timer.ctl = READ_SYSREG(CNTV_CTL_EL0);
+    WRITE_SYSREG(current->arch.virt_timer.ctl | CNTx_CTL_MASK, CNTV_CTL_EL0);
     vgic_inject_irq(current->domain, current, current->arch.virt_timer.irq, true);
 }
 
@@ -297,9 +297,9 @@ void init_timer_interrupt(void)
     /* Sensible defaults */
     WRITE_SYSREG64(0, CNTVOFF_EL2);     /* No VM-specific offset */
     /* Do not let the VMs program the physical timer, only read the physical counter */
-    WRITE_SYSREG32(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
-    WRITE_SYSREG32(0, CNTP_CTL_EL0);    /* Physical timer disabled */
-    WRITE_SYSREG32(0, CNTHP_CTL_EL2);   /* Hypervisor's timer disabled */
+    WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
+    WRITE_SYSREG(0, CNTP_CTL_EL0);      /* Physical timer disabled */
+    WRITE_SYSREG(0, CNTHP_CTL_EL2);     /* Hypervisor's timer disabled */
     isb();
 
     request_irq(timer_irq[TIMER_HYP_PPI], 0, timer_interrupt,
@@ -320,8 +320,8 @@ void init_timer_interrupt(void)
  */
 static void deinit_timer_interrupt(void)
 {
-    WRITE_SYSREG32(0, CNTP_CTL_EL0);    /* Disable physical timer */
-    WRITE_SYSREG32(0, CNTHP_CTL_EL2);   /* Disable hypervisor's timer */
+    WRITE_SYSREG(0, CNTP_CTL_EL0);      /* Disable physical timer */
+    WRITE_SYSREG(0, CNTHP_CTL_EL2);     /* Disable hypervisor's timer */
     isb();
 
     release_irq(timer_irq[TIMER_HYP_PPI], NULL);
diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
index c2b27915c6..167fc6127a 100644
--- a/xen/arch/arm/vtimer.c
+++ b/xen/arch/arm/vtimer.c
@@ -138,8 +138,8 @@ void virt_timer_save(struct vcpu *v)
 {
     ASSERT(!is_idle_vcpu(v));
 
-    v->arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0);
-    WRITE_SYSREG32(v->arch.virt_timer.ctl & ~CNTx_CTL_ENABLE, CNTV_CTL_EL0);
+    v->arch.virt_timer.ctl = READ_SYSREG(CNTV_CTL_EL0);
+    WRITE_SYSREG(v->arch.virt_timer.ctl & ~CNTx_CTL_ENABLE, CNTV_CTL_EL0);
     v->arch.virt_timer.cval = READ_SYSREG64(CNTV_CVAL_EL0);
     if (
[PATCH 6/9] arm/page: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64-bit, whereas AArch32 ones are 32-bit or 64-bit. MSR/MRS expect 64-bit values, so we should get rid of the READ/WRITE_SYSREG32 helpers in favour of READ/WRITE_SYSREG. We should also use the register_t type when reading sysregs; it corresponds to uint64_t or uint32_t as appropriate. Even though many AArch64 sysregs have their upper 32 bits reserved, that does not mean they can't be widened in the future.

Modify accesses to CTR_EL0 to use READ/WRITE_SYSREG.

Signed-off-by: Michal Orzel
---
 xen/include/asm-arm/page.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index 131507a517..c6f9fb0d4e 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -137,10 +137,10 @@ extern size_t dcache_line_bytes;
 static inline size_t read_dcache_line_bytes(void)
 {
-    uint32_t ctr;
+    register_t ctr;
 
     /* Read CTR */
-    ctr = READ_SYSREG32(CTR_EL0);
+    ctr = READ_SYSREG(CTR_EL0);
 
     /* Bits 16-19 are the log2 number of words in the cacheline. */
     return (size_t) (4 << ((ctr >> 16) & 0xf));
 }
--
2.29.0
[PATCH 5/9] arm/mm: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64-bit, whereas AArch32 ones are 32-bit or 64-bit. MSR/MRS expect 64-bit values, so we should get rid of the READ/WRITE_SYSREG32 helpers in favour of READ/WRITE_SYSREG. We should also use the register_t type when reading sysregs; it corresponds to uint64_t or uint32_t as appropriate. Even though many AArch64 sysregs have their upper 32 bits reserved, that does not mean they can't be widened in the future.

Modify SCTLR_EL2 accesses to use READ/WRITE_SYSREG.

Signed-off-by: Michal Orzel
---
 xen/arch/arm/mm.c    | 2 +-
 xen/arch/arm/traps.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 59f8a3f15f..0e07335291 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -613,7 +613,7 @@ void __init remove_early_mappings(void)
  */
 static void xen_pt_enforce_wnx(void)
 {
-    WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
+    WRITE_SYSREG(READ_SYSREG(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
     /*
      * The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
      * before flushing the TLBs.
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index c7acdb2087..e7384381cc 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -915,7 +915,7 @@ static void _show_registers(const struct cpu_user_regs *regs,
     printk(" VTTBR_EL2: %016"PRIx64"\n", ctxt->vttbr_el2);
     printk("\n");
 
-    printk(" SCTLR_EL2: %08"PRIx32"\n", READ_SYSREG32(SCTLR_EL2));
+    printk(" SCTLR_EL2: %"PRIregister"\n", READ_SYSREG(SCTLR_EL2));
     printk(" HCR_EL2: %"PRIregister"\n", READ_SYSREG(HCR_EL2));
     printk(" TTBR0_EL2: %016"PRIx64"\n", READ_SYSREG64(TTBR0_EL2));
     printk("\n");
--
2.29.0
[PATCH 4/9] arm/p2m: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values thus we should get rid of helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use register_t type when reading sysregs which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does not mean that they can't be widen in the future. Modify type of vtcr to register_t. Signed-off-by: Michal Orzel --- xen/arch/arm/p2m.c | 8 xen/arch/arm/traps.c | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index ac50312620..d414c4feb9 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -1973,11 +1973,11 @@ void __init p2m_restrict_ipa_bits(unsigned int ipa_bits) } /* VTCR value to be configured by all CPUs. Set only once by the boot CPU */ -static uint32_t __read_mostly vtcr; +static register_t __read_mostly vtcr; static void setup_virt_paging_one(void *data) { -WRITE_SYSREG32(vtcr, VTCR_EL2); +WRITE_SYSREG(vtcr, VTCR_EL2); /* * ARM64_WORKAROUND_AT_SPECULATE: We want to keep the TLBs free from @@ -2000,7 +2000,7 @@ static void setup_virt_paging_one(void *data) void __init setup_virt_paging(void) { /* Setup Stage 2 address translation */ -unsigned long val = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA; +register_t val = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA; #ifdef CONFIG_ARM_32 if ( p2m_ipa_bits < 40 ) @@ -2089,7 +2089,7 @@ void __init setup_virt_paging(void) pa_range_info[pa_range].pabits, ( MAX_VMID == MAX_VMID_16_BIT ) ? 
16 : 8); #endif -printk("P2M: %d levels with order-%d root, VTCR 0x%lx\n", +printk("P2M: %d levels with order-%d root, VTCR 0x%"PRIregister"\n", 4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, val); p2m_vmid_allocator_init(); diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index ccc0827107..c7acdb2087 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -911,7 +911,7 @@ static void _show_registers(const struct cpu_user_regs *regs, show_registers_32(regs, ctxt, guest_mode, v); #endif } -printk(" VTCR_EL2: %08"PRIx32"\n", READ_SYSREG32(VTCR_EL2)); +printk(" VTCR_EL2: %"PRIregister"\n", READ_SYSREG(VTCR_EL2)); printk(" VTTBR_EL2: %016"PRIx64"\n", ctxt->vttbr_el2); printk("\n"); -- 2.29.0
[PATCH 9/9] xen/arm64: Remove READ/WRITE_SYSREG32 helper macros
AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values thus we should get rid of helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use register_t type when reading sysregs which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does not mean that they can't be widen in the future. As there are no other places in the code using READ/WRITE_SYSREG32, remove the helper macros. Signed-off-by: Michal Orzel --- xen/arch/arm/vcpreg.c | 16 xen/include/asm-arm/arm64/sysregs.h | 5 - 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c index c7f516ee0a..6cb7f3108c 100644 --- a/xen/arch/arm/vcpreg.c +++ b/xen/arch/arm/vcpreg.c @@ -48,6 +48,7 @@ */ /* The name is passed from the upper macro to workaround macro expansion. */ +#ifdef CONFIG_ARM_32 #define TVM_REG(sz, func, reg...) \ static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)\ { \ @@ -61,6 +62,21 @@ static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)\ \ return true;\ } +#else /* CONFIG_ARM_64 */ +#define TVM_REG(sz, func, reg...) 
\ +static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)\ +{ \ +struct vcpu *v = current; \ +bool cache_enabled = vcpu_has_cache_enabled(v); \ +\ +GUEST_BUG_ON(read); \ +WRITE_SYSREG(*r, reg); \ +\ +p2m_toggle_cache(v, cache_enabled); \ +\ +return true;\ +} +#endif #define TVM_REG32(regname, xreg) TVM_REG(32, vreg_emulate_##regname, xreg) #define TVM_REG64(regname, xreg) TVM_REG(64, vreg_emulate_##regname, xreg) diff --git a/xen/include/asm-arm/arm64/sysregs.h b/xen/include/asm-arm/arm64/sysregs.h index 077fd95fb7..83a1157ac4 100644 --- a/xen/include/asm-arm/arm64/sysregs.h +++ b/xen/include/asm-arm/arm64/sysregs.h @@ -86,11 +86,6 @@ #endif /* Access to system registers */ - -#define READ_SYSREG32(name) ((uint32_t)READ_SYSREG64(name)) - -#define WRITE_SYSREG32(v, name) WRITE_SYSREG64((uint64_t)v, name) - #define WRITE_SYSREG64(v, name) do {\ uint64_t _r = v;\ asm volatile("msr "__stringify(name)", %0" : : "r" (_r)); \ -- 2.29.0
[PATCH 3/9] arm/gic: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values thus we should get rid of helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use register_t type when reading sysregs which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does not mean that they can't be widen in the future. Modify types of following members of struct gic_v3 to register_t: -hcr(not used at all in Xen) -vmcr -sre_el1 -apr0 -apr1 Signed-off-by: Michal Orzel --- xen/arch/arm/gic-v3-lpi.c | 2 +- xen/arch/arm/gic-v3.c | 96 --- xen/include/asm-arm/gic.h | 6 +-- 3 files changed, 54 insertions(+), 50 deletions(-) diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c index 869bc97fa1..e1594dd20e 100644 --- a/xen/arch/arm/gic-v3-lpi.c +++ b/xen/arch/arm/gic-v3-lpi.c @@ -178,7 +178,7 @@ void gicv3_do_LPI(unsigned int lpi) irq_enter(); /* EOI the LPI already. */ -WRITE_SYSREG32(lpi, ICC_EOIR1_EL1); +WRITE_SYSREG(lpi, ICC_EOIR1_EL1); /* Find out if a guest mapped something to this physical LPI. 
*/ hlpip = gic_get_host_lpi(lpi); diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c index ac28013c19..0634013a67 100644 --- a/xen/arch/arm/gic-v3.c +++ b/xen/arch/arm/gic-v3.c @@ -246,12 +246,12 @@ static void gicv3_ich_write_lr(int lr, uint64_t val) */ static void gicv3_enable_sre(void) { -uint32_t val; +register_t val; -val = READ_SYSREG32(ICC_SRE_EL2); +val = READ_SYSREG(ICC_SRE_EL2); val |= GICC_SRE_EL2_SRE; -WRITE_SYSREG32(val, ICC_SRE_EL2); +WRITE_SYSREG(val, ICC_SRE_EL2); isb(); } @@ -315,16 +315,16 @@ static void restore_aprn_regs(const union gic_state_data *d) switch ( gicv3.nr_priorities ) { case 7: -WRITE_SYSREG32(d->v3.apr0[2], ICH_AP0R2_EL2); -WRITE_SYSREG32(d->v3.apr1[2], ICH_AP1R2_EL2); +WRITE_SYSREG(d->v3.apr0[2], ICH_AP0R2_EL2); +WRITE_SYSREG(d->v3.apr1[2], ICH_AP1R2_EL2); /* Fall through */ case 6: -WRITE_SYSREG32(d->v3.apr0[1], ICH_AP0R1_EL2); -WRITE_SYSREG32(d->v3.apr1[1], ICH_AP1R1_EL2); +WRITE_SYSREG(d->v3.apr0[1], ICH_AP0R1_EL2); +WRITE_SYSREG(d->v3.apr1[1], ICH_AP1R1_EL2); /* Fall through */ case 5: -WRITE_SYSREG32(d->v3.apr0[0], ICH_AP0R0_EL2); -WRITE_SYSREG32(d->v3.apr1[0], ICH_AP1R0_EL2); +WRITE_SYSREG(d->v3.apr0[0], ICH_AP0R0_EL2); +WRITE_SYSREG(d->v3.apr1[0], ICH_AP1R0_EL2); break; default: BUG(); @@ -338,16 +338,16 @@ static void save_aprn_regs(union gic_state_data *d) switch ( gicv3.nr_priorities ) { case 7: -d->v3.apr0[2] = READ_SYSREG32(ICH_AP0R2_EL2); -d->v3.apr1[2] = READ_SYSREG32(ICH_AP1R2_EL2); +d->v3.apr0[2] = READ_SYSREG(ICH_AP0R2_EL2); +d->v3.apr1[2] = READ_SYSREG(ICH_AP1R2_EL2); /* Fall through */ case 6: -d->v3.apr0[1] = READ_SYSREG32(ICH_AP0R1_EL2); -d->v3.apr1[1] = READ_SYSREG32(ICH_AP1R1_EL2); +d->v3.apr0[1] = READ_SYSREG(ICH_AP0R1_EL2); +d->v3.apr1[1] = READ_SYSREG(ICH_AP1R1_EL2); /* Fall through */ case 5: -d->v3.apr0[0] = READ_SYSREG32(ICH_AP0R0_EL2); -d->v3.apr1[0] = READ_SYSREG32(ICH_AP1R0_EL2); +d->v3.apr0[0] = READ_SYSREG(ICH_AP0R0_EL2); +d->v3.apr1[0] = READ_SYSREG(ICH_AP1R0_EL2); break; default: BUG(); 
@@ -371,15 +371,15 @@ static void gicv3_save_state(struct vcpu *v) dsb(sy); gicv3_save_lrs(v); save_aprn_regs(>arch.gic); -v->arch.gic.v3.vmcr = READ_SYSREG32(ICH_VMCR_EL2); -v->arch.gic.v3.sre_el1 = READ_SYSREG32(ICC_SRE_EL1); +v->arch.gic.v3.vmcr = READ_SYSREG(ICH_VMCR_EL2); +v->arch.gic.v3.sre_el1 = READ_SYSREG(ICC_SRE_EL1); } static void gicv3_restore_state(const struct vcpu *v) { -uint32_t val; +register_t val; -val = READ_SYSREG32(ICC_SRE_EL2); +val = READ_SYSREG(ICC_SRE_EL2); /* * Don't give access to system registers when the guest is using * GICv2 @@ -388,7 +388,7 @@ static void gicv3_restore_state(const struct vcpu *v) val &= ~GICC_SRE_EL2_ENEL1; else val |= GICC_SRE_EL2_ENEL1; -WRITE_SYSREG32(val, ICC_SRE_EL2); +WRITE_SYSREG(val, ICC_SRE_EL2); /* * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a Group0 @@ -398,9 +398,9 @@ static void gicv3_restore_state(const struct vcpu *v) * want before starting to mess with the rest of the GIC, and * VMCR_EL1 in particular. */ -WRITE_SYSREG32(v->arch.gic.v3.sre_el1, ICC_SRE_EL1); +WRITE_SYSREG(v->arch.gic.v3.sre_el1, ICC_SRE_EL1); isb(); -WRITE_SYSREG32(v->arch.gic.v3.vmcr, ICH_VMCR_EL2); +WRITE_SYSREG(v->arch.gic.v3.vmcr, ICH_VMCR_EL2); restore_aprn_regs(>arch.gic); gicv3_restore_lrs(v);
[PATCH 8/9] arm: Change type of hsr to register_t
AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values thus we should get rid of helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use register_t type when reading sysregs which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does not mean that they can't be widen in the future. Modify type of hsr to register_t. When on AArch64 add 32bit RES0 members to structures inside hsr union. Signed-off-by: Michal Orzel --- xen/arch/arm/arm64/entry.S| 2 +- xen/arch/arm/arm64/traps.c| 2 +- xen/arch/arm/arm64/vsysreg.c | 3 ++- xen/arch/arm/traps.c | 20 +--- xen/arch/arm/vcpreg.c | 13 +- xen/include/asm-arm/arm32/processor.h | 2 +- xen/include/asm-arm/arm64/processor.h | 5 ++-- xen/include/asm-arm/hsr.h | 34 ++- 8 files changed, 59 insertions(+), 22 deletions(-) diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S index ab9a65fc14..218b063c97 100644 --- a/xen/arch/arm/arm64/entry.S +++ b/xen/arch/arm/arm64/entry.S @@ -155,7 +155,7 @@ add x21, sp, #UREGS_CPSR mrs x22, spsr_el2 mrs x23, esr_el2 -stp w22, w23, [x21] +stp x22, x23, [x21] .endm diff --git a/xen/arch/arm/arm64/traps.c b/xen/arch/arm/arm64/traps.c index babfc1d884..9113a15c7a 100644 --- a/xen/arch/arm/arm64/traps.c +++ b/xen/arch/arm/arm64/traps.c @@ -36,7 +36,7 @@ void do_bad_mode(struct cpu_user_regs *regs, int reason) union hsr hsr = { .bits = regs->hsr }; printk("Bad mode in %s handler detected\n", handler[reason]); -printk("ESR=0x%08"PRIx32": EC=%"PRIx32", IL=%"PRIx32", ISS=%"PRIx32"\n", +printk("ESR=%#"PRIregister": EC=%"PRIx32", IL=%"PRIx32", ISS=%"PRIx32"\n", hsr.bits, hsr.ec, hsr.len, hsr.iss); local_irq_disable(); diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c index 41f18612c6..3c10d464e7 100644 --- a/xen/arch/arm/arm64/vsysreg.c +++ b/xen/arch/arm/arm64/vsysreg.c @@ -368,7 +368,8 @@ void do_sysreg(struct cpu_user_regs *regs, 
sysreg.op2, sysreg.read ? "=>" : "<=", sysreg.reg, regs->pc); -gdprintk(XENLOG_ERR, "unhandled 64-bit sysreg access %#x\n", +gdprintk(XENLOG_ERR, "unhandled 64-bit sysreg access" + " %#"PRIregister"\n", hsr.bits & HSR_SYSREG_REGS_MASK); inject_undef_exception(regs, hsr); return; diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index e7384381cc..db15a2c647 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -546,7 +546,7 @@ void inject_undef64_exception(struct cpu_user_regs *regs, int instr_len) PSR_IRQ_MASK | PSR_DBG_MASK; regs->pc = handler; -WRITE_SYSREG32(esr.bits, ESR_EL1); +WRITE_SYSREG(esr.bits, ESR_EL1); } /* Inject an abort exception into a 64 bit guest */ @@ -580,7 +580,7 @@ static void inject_abt64_exception(struct cpu_user_regs *regs, regs->pc = handler; WRITE_SYSREG(addr, FAR_EL1); -WRITE_SYSREG32(esr.bits, ESR_EL1); +WRITE_SYSREG(esr.bits, ESR_EL1); } static void inject_dabt64_exception(struct cpu_user_regs *regs, @@ -919,7 +919,7 @@ static void _show_registers(const struct cpu_user_regs *regs, printk(" HCR_EL2: %"PRIregister"\n", READ_SYSREG(HCR_EL2)); printk(" TTBR0_EL2: %016"PRIx64"\n", READ_SYSREG64(TTBR0_EL2)); printk("\n"); -printk(" ESR_EL2: %08"PRIx32"\n", regs->hsr); +printk(" ESR_EL2: %"PRIregister"\n", regs->hsr); printk(" HPFAR_EL2: %"PRIregister"\n", READ_SYSREG(HPFAR_EL2)); #ifdef CONFIG_ARM_32 @@ -2004,13 +2004,15 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs, break; default: -gprintk(XENLOG_WARNING, "Unsupported FSC: HSR=%#x DFSC=%#x\n", +gprintk(XENLOG_WARNING, "Unsupported FSC:" +" HSR=%#"PRIregister" DFSC=%#x\n", hsr.bits, xabt.fsc); } inject_abt: -gdprintk(XENLOG_DEBUG, "HSR=0x%x pc=%#"PRIregister" gva=%#"PRIvaddr - " gpa=%#"PRIpaddr"\n", hsr.bits, regs->pc, gva, gpa); +gdprintk(XENLOG_DEBUG, "HSR=%#"PRIregister" pc=%#"PRIregister"" + " gva=%#"PRIvaddr" gpa=%#"PRIpaddr"\n", + hsr.bits, regs->pc, gva, gpa); if ( is_data ) inject_dabt_exception(regs, gva, hsr.len); else @@ -2204,7 +2206,8 @@ 
void do_trap_guest_sync(struct cpu_user_regs *regs) default: gprintk(XENLOG_WARNING, -"Unknown Guest Trap. HSR=0x%x EC=0x%x IL=%x Syndrome=0x%"PRIx32"\n", +"Unknown Guest Trap. HSR=%#"PRIregister" EC=0x%x IL=%x" +" Syndrome=0x%"PRIx32"\n", hsr.bits, hsr.ec, hsr.len, hsr.iss);
[PATCH 2/9] arm/domain: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values thus we should get rid of helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use register_t type when reading sysregs which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does not mean that they can't be widen in the future. Modify type of registers: actlr, cntkctl to register_t. Signed-off-by: Michal Orzel --- xen/arch/arm/domain.c| 20 ++-- xen/include/asm-arm/domain.h | 4 ++-- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index bdd3d3e5b5..c021a03c61 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -113,13 +113,13 @@ static void ctxt_switch_from(struct vcpu *p) p->arch.tpidr_el1 = READ_SYSREG(TPIDR_EL1); /* Arch timer */ -p->arch.cntkctl = READ_SYSREG32(CNTKCTL_EL1); +p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1); virt_timer_save(p); if ( is_32bit_domain(p->domain) && cpu_has_thumbee ) { -p->arch.teecr = READ_SYSREG32(TEECR32_EL1); -p->arch.teehbr = READ_SYSREG32(TEEHBR32_EL1); +p->arch.teecr = READ_SYSREG(TEECR32_EL1); +p->arch.teehbr = READ_SYSREG(TEEHBR32_EL1); } #ifdef CONFIG_ARM_32 @@ -175,7 +175,7 @@ static void ctxt_switch_from(struct vcpu *p) static void ctxt_switch_to(struct vcpu *n) { -uint32_t vpidr; +register_t vpidr; /* When the idle VCPU is running, Xen will always stay in hypervisor * mode. Therefore we don't need to restore the context of an idle VCPU. 
@@ -183,8 +183,8 @@ static void ctxt_switch_to(struct vcpu *n) if ( is_idle_vcpu(n) ) return; -vpidr = READ_SYSREG32(MIDR_EL1); -WRITE_SYSREG32(vpidr, VPIDR_EL2); +vpidr = READ_SYSREG(MIDR_EL1); +WRITE_SYSREG(vpidr, VPIDR_EL2); WRITE_SYSREG(n->arch.vmpidr, VMPIDR_EL2); /* VGIC */ @@ -257,8 +257,8 @@ static void ctxt_switch_to(struct vcpu *n) if ( is_32bit_domain(n->domain) && cpu_has_thumbee ) { -WRITE_SYSREG32(n->arch.teecr, TEECR32_EL1); -WRITE_SYSREG32(n->arch.teehbr, TEEHBR32_EL1); +WRITE_SYSREG(n->arch.teecr, TEECR32_EL1); +WRITE_SYSREG(n->arch.teehbr, TEEHBR32_EL1); } #ifdef CONFIG_ARM_32 @@ -274,7 +274,7 @@ static void ctxt_switch_to(struct vcpu *n) /* This is could trigger an hardware interrupt from the virtual * timer. The interrupt needs to be injected into the guest. */ -WRITE_SYSREG32(n->arch.cntkctl, CNTKCTL_EL1); +WRITE_SYSREG(n->arch.cntkctl, CNTKCTL_EL1); virt_timer_restore(n); } @@ -330,7 +330,7 @@ static void schedule_tail(struct vcpu *prev) static void continue_new_vcpu(struct vcpu *prev) { -current->arch.actlr = READ_SYSREG32(ACTLR_EL1); +current->arch.actlr = READ_SYSREG(ACTLR_EL1); processor_vcpu_initialise(current); schedule_tail(prev); diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 0a74df9931..2d4f38c669 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -156,7 +156,7 @@ struct arch_vcpu /* Control Registers */ register_t sctlr; -uint32_t actlr; +register_t actlr; uint32_t cpacr; uint32_t contextidr; @@ -190,7 +190,7 @@ struct arch_vcpu struct vgic_cpu vgic; /* Timer registers */ -uint32_t cntkctl; +register_t cntkctl; struct vtimer phys_timer; struct vtimer virt_timer; -- 2.29.0
[PATCH 0/9] xen/arm64: Get rid of READ/WRITE_SYSREG32
The purpose of this patch series is to remove the 32-bit helper macros READ/WRITE_SYSREG32 on arm64, as they no longer match the ARMv8 specification: the mrs/msr instructions expect 64-bit values. According to ARM DDI 0487G.a, all AArch64 system registers are 64 bits wide, even though many of them currently have the upper 32 bits reserved. That does not rule out some of these sysregs being widened in newer versions of ARMv8 or in a future architecture. Also, when dealing with system registers we should use the register_t type.

This series replaces all READ/WRITE_SYSREG32 calls with READ/WRITE_SYSREG. The change was split into several small patches to make the review process easier. The series focuses only on removing READ/WRITE_SYSREG32: some AArch64 system registers are still defined as 32-bit (e.g. cpsr) and should be changed later on; they were left alone because they did not appear inside READ/WRITE_SYSREG32 calls. The next step is to also get rid of vreg_emulate_sysreg32.
Michal Orzel (9):
  arm64/vfp: Get rid of READ/WRITE_SYSREG32
  arm/domain: Get rid of READ/WRITE_SYSREG32
  arm/gic: Get rid of READ/WRITE_SYSREG32
  arm/p2m: Get rid of READ/WRITE_SYSREG32
  arm/mm: Get rid of READ/WRITE_SYSREG32
  arm/page: Get rid of READ/WRITE_SYSREG32
  arm/time,vtimer: Get rid of READ/WRITE_SYSREG32
  arm: Change type of hsr to register_t
  xen/arm64: Remove READ/WRITE_SYSREG32 helper macros

 xen/arch/arm/arm64/entry.S            |  2 +-
 xen/arch/arm/arm64/traps.c            |  2 +-
 xen/arch/arm/arm64/vfp.c              | 12 ++--
 xen/arch/arm/arm64/vsysreg.c          |  3 +-
 xen/arch/arm/domain.c                 | 20 +++---
 xen/arch/arm/gic-v3-lpi.c             |  2 +-
 xen/arch/arm/gic-v3.c                 | 96 ++-
 xen/arch/arm/mm.c                     |  2 +-
 xen/arch/arm/p2m.c                    |  8 +--
 xen/arch/arm/time.c                   | 28
 xen/arch/arm/traps.c                  | 24 ---
 xen/arch/arm/vcpreg.c                 | 29 ++--
 xen/arch/arm/vtimer.c                 | 10 +--
 xen/include/asm-arm/arm32/processor.h |  2 +-
 xen/include/asm-arm/arm64/processor.h |  5 +-
 xen/include/asm-arm/arm64/sysregs.h   |  5 --
 xen/include/asm-arm/arm64/vfp.h       |  6 +-
 xen/include/asm-arm/domain.h          |  6 +-
 xen/include/asm-arm/gic.h             |  6 +-
 xen/include/asm-arm/hsr.h             | 34 +-
 xen/include/asm-arm/page.h            |  4 +-
 21 files changed, 179 insertions(+), 127 deletions(-)
--
2.29.0
[PATCH 1/9] arm64/vfp: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64-bit, whereas AArch32 ones are 32-bit or 64-bit. MSR/MRS expect 64-bit values, so we should get rid of the READ/WRITE_SYSREG32 helpers in favour of READ/WRITE_SYSREG. We should also use the register_t type when reading sysregs; it corresponds to uint64_t or uint32_t as appropriate. Even though many AArch64 sysregs have their upper 32 bits reserved, that does not mean they can't be widened in the future.

Modify the type of FPCR, FPSR and FPEXC32_EL2 to register_t.

Signed-off-by: Michal Orzel
---
 xen/arch/arm/arm64/vfp.c        | 12 ++--
 xen/include/asm-arm/arm64/vfp.h |  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/arm64/vfp.c b/xen/arch/arm/arm64/vfp.c
index 999a0d58a5..47885e76ba 100644
--- a/xen/arch/arm/arm64/vfp.c
+++ b/xen/arch/arm/arm64/vfp.c
@@ -26,10 +26,10 @@ void vfp_save_state(struct vcpu *v)
          "stp q30, q31, [%1, #16 * 30]\n\t"
          : "=Q" (*v->arch.vfp.fpregs) : "r" (v->arch.vfp.fpregs));
 
-    v->arch.vfp.fpsr = READ_SYSREG32(FPSR);
-    v->arch.vfp.fpcr = READ_SYSREG32(FPCR);
+    v->arch.vfp.fpsr = READ_SYSREG(FPSR);
+    v->arch.vfp.fpcr = READ_SYSREG(FPCR);
     if ( is_32bit_domain(v->domain) )
-        v->arch.vfp.fpexc32_el2 = READ_SYSREG32(FPEXC32_EL2);
+        v->arch.vfp.fpexc32_el2 = READ_SYSREG(FPEXC32_EL2);
 }
 
 void vfp_restore_state(struct vcpu *v)
@@ -55,8 +55,8 @@ void vfp_restore_state(struct vcpu *v)
          "ldp q30, q31, [%1, #16 * 30]\n\t"
          : : "Q" (*v->arch.vfp.fpregs), "r" (v->arch.vfp.fpregs));
 
-    WRITE_SYSREG32(v->arch.vfp.fpsr, FPSR);
-    WRITE_SYSREG32(v->arch.vfp.fpcr, FPCR);
+    WRITE_SYSREG(v->arch.vfp.fpsr, FPSR);
+    WRITE_SYSREG(v->arch.vfp.fpcr, FPCR);
     if ( is_32bit_domain(v->domain) )
-        WRITE_SYSREG32(v->arch.vfp.fpexc32_el2, FPEXC32_EL2);
+        WRITE_SYSREG(v->arch.vfp.fpexc32_el2, FPEXC32_EL2);
 }
diff --git a/xen/include/asm-arm/arm64/vfp.h b/xen/include/asm-arm/arm64/vfp.h
index 6ab5d36c6c..e6e8c363bc 100644
--- a/xen/include/asm-arm/arm64/vfp.h
+++ b/xen/include/asm-arm/arm64/vfp.h
@@ -7,9 +7,9 @@ struct vfp_state
 {
     uint64_t fpregs[64] __vfp_aligned;
 
-    uint32_t fpcr;
-    uint32_t fpexc32_el2;
-    uint32_t fpsr;
+    register_t fpcr;
+    register_t fpexc32_el2;
+    register_t fpsr;
 };
 
 #endif /* _ARM_ARM64_VFP_H */
--
2.29.0
Re: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED
On 20/04/2021 04:35, Igor Druzhinin wrote:

When we're replacing the existing mapping there is a possibility of a race
on the memory map with other threads doing mmap operations - the address
being unmapped/re-mapped could be occupied by another thread in between.

The Linux mmap man page recommends keeping the existing mappings in place
to reserve the address range, and instead utilizing the fact that the next
mmap operation with the MAP_FIXED flag passed will implicitly destroy the
existing mappings behind the chosen address. This behavior is guaranteed
by POSIX / BSD and is therefore portable.

Note that it wouldn't make the replacement atomic for parallel accesses to
the replaced region - those might still fail with SIGBUS due to
xenforeignmemory_map not being atomic. So we're still not expecting those.

Tested-by: Anthony PERARD
Signed-off-by: Igor Druzhinin

Reviewed-by: Paul Durrant

---
 hw/i386/xen/xen-mapcache.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 5b120ed..e82b7dc 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -171,7 +171,20 @@ static void xen_remap_bucket(MapCacheEntry *entry,
     if (!(entry->flags & XEN_MAPCACHE_ENTRY_DUMMY)) {
         ram_block_notify_remove(entry->vaddr_base, entry->size);
     }
-    if (munmap(entry->vaddr_base, entry->size) != 0) {
+
+    /*
+     * If an entry is being replaced by another mapping and we're using
+     * MAP_FIXED flag for it - there is possibility of a race for vaddr
+     * address with another thread doing an mmap call itself
+     * (see man 2 mmap). To avoid that we skip explicit unmapping here
+     * and allow the kernel to destroy the previous mappings by replacing
+     * them in mmap call later.
+     *
+     * Non-identical replacements are not allowed therefore.
+     */
+    assert(!vaddr || (entry->vaddr_base == vaddr && entry->size == size));
+
+    if (!vaddr && munmap(entry->vaddr_base, entry->size) != 0) {
         perror("unmap fails");
         exit(-1);
     }