[xen-4.15-testing test] 161322: tolerable FAIL - PUSHED
flight 161322 xen-4.15-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161322/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                 fail like 161049
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop                 fail like 161049
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2      fail like 161049
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                 fail like 161049
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop                 fail like 161049
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop                  fail like 161049
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check          fail like 161049
 test-armhf-armhf-libvirt 16 saverestore-support-check              fail like 161049
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop                  fail like 161049
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop                  fail like 161049
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop                  fail like 161049
 test-amd64-i386-libvirt-xsm 15 migrate-support-check               fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check                  fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check           fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check          fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check                   fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start                           fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check                   fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check               fail never pass
 test-arm64-arm64-xl 15 migrate-support-check                       fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check                   fail never pass
 test-armhf-armhf-xl 15 migrate-support-check                       fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check                   fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check           fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check          fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check           fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check             fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check         fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check              fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check                  fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check              fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check           fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check            fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check        fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check                   fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check               fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check              fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check                  fail never pass

version targeted for testing:
 xen                  eb1f325186be9e02c3e89619cd154eb57f202eba
baseline version:
 xen                  c86d8ec3b816cc1317d9cc2fb0817e59b5bc4cfa

Last test of basis   161049  2021-04-12 10:36:30 Z    8 days
Testing same since   161322  2021-04-20 10:06:36 Z    0 days    1 attempts

People who touched revisions under test:
 Andrew Cooper
 Boris Ostrovsky
 Frédéric Pierret
 Jan Beulich

jobs:
 build-amd64-xsm      pass
 build-arm64-xsm      pass
 build-i386-xsm       pass
[linux-linus test] 161320: regressions - FAIL
flight 161320 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161320/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-i386-xl-qemuu-ws16-amd64 7 xen-install                  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install      fail REGR. vs. 152332
 test-amd64-i386-xl-xsm 7 xen-install                               fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel 7 xen-install                 fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64 7 xen-install             fail REGR. vs. 152332
 test-amd64-i386-examine 6 xen-install                              fail REGR. vs. 152332
 test-amd64-i386-libvirt 7 xen-install                              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-install             fail REGR. vs. 152332
 test-amd64-coresched-i386-xl 7 xen-install                         fail REGR. vs. 152332
 test-amd64-i386-pair 10 xen-install/src_host                       fail REGR. vs. 152332
 test-amd64-i386-pair 11 xen-install/dst_host                       fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd 7 xen-install                   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64 7 xen-install                  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd 7 xen-install                   fail REGR. vs. 152332
 test-amd64-i386-xl 7 xen-install                                   fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel 7 xen-install                 fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim 7 xen-install                            fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64 7 xen-install                      fail REGR. vs. 152332
 test-amd64-i386-xl-raw 7 xen-install                               fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-shadow 7 xen-install                            fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386 7 xen-install                       fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm 7 xen-install                          fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64 7 xen-install                  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64 7 xen-install                  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64 7 xen-install                  fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 10 xen-install/src_host               fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 11 xen-install/dst_host               fail REGR. vs. 152332
 test-amd64-amd64-xl-pvshim 8 xen-boot                              fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install    fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub 20 guest-stop                        fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub 20 guest-stop                         fail REGR. vs. 152332

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop                 fail like 152332
 test-armhf-armhf-libvirt 16 saverestore-support-check              fail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop                 fail like 152332
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2      fail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                 fail like 152332
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check          fail like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                 fail like 152332
 test-arm64-arm64-xl-seattle 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check           fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check                  fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-arm64-arm64-xl 15 migrate-support-check                       fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check                   fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check                   fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check               fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check          fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check          fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13
Re: [PATCH] xen/arm: Ensure the vCPU context is seen before clearing the _VPF_down
On Tue, 20 Apr 2021, Julien Grall wrote:
> (+ Andrew and Jan)
>
> Hi Stefano,
>
> On 20/04/2021 20:38, Stefano Stabellini wrote:
> > On Fri, 16 Apr 2021, Julien Grall wrote:
> > > > I think your explanation is correct. However, don't we need a smp_rmb()
> > > > barrier after reading v->is_initialised in xen/common/domctl.c:do_domctl?
> > > > That would be the barrier that pairs with smp_wmb in regards to
> > > > v->is_initialised.
> > >
> > > There is already a smp_mb() in sync_vcpu_exec_state() which is called from
> > > vcpu_pause() -> vcpu_sleep_sync().
> >
> > But that's too late, isn't it? The v->is_initialized check is done before
> > the vcpu_pause() call. We might end up taking the wrong code path:
> >
> > https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/domctl.c#L587
> > https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/domctl.c#L598
>
> I am a bit confused what you mean by "wrong path" here. There is no timing
> guarantee with a memory barrier. What the barrier will guarantee you is that
> v->is_initialized will be read *before* anything after the barrier.
>
> Are you worried that some variables in vcpu_pause() may be read before
> v->is_initialized?
>
> > > I don't think we can ever remove the memory barrier in
> > > sync_vcpu_exec_state() because the vCPU paused may have run (or
> > > initialized) on a different pCPU. So I would like to rely on the barrier
> > > rather than adding an extra one (even though it is not a fast path).
> > >
> > > I am thinking to add a comment on top of vcpu_pause() to clarify that
> > > after the call, the vCPU context will be observed without extra
> > > synchronization required.
> >
> > Given that an if.. goto is involved, even if both code paths lead to
> > good results, I think it would be best to have the smp_rmb() barrier
> > above after the first v->is_initialised read to make sure we take the
> > right path.
>
> I don't understand what you are referring to by "wrong" and "right" path. The
> processor will always execute/commit the path based on the value of
> v->is_initialized. What may happen is that the processor may start to
> speculate the wrong path and then backtrack if it discovers the value is not
> the expected one. But, AFAIK, smp_rmb() is not going to protect you against
> speculation... smp_rmb() is only going to enforce a memory ordering between
> reads.
>
> > Instead of having to make sure that even the "wrong" path
> > behaves correctly.
>
> Just for clarification, I think you meant writing the following code:
>
>     if ( !v->is_initialized )
>         goto getvcpucontext_out;
>
>     smp_rmb();

No, sorry, I'll try to be clearer, see below.

>     ...
>
>     vcpu_pause();
>
> AFAICT, there is nothing in the implementation of XEN_DOMCTL_getvcpucontext
> that justifies the extra barrier (assuming we consider vcpu_pause() as a full
> memory barrier).
>
> From your e-mail, I also could not infer what your exact concern regarding
> the memory ordering is. If you have something in mind, then would you mind
> providing a sketch showing what could go wrong?

Let me start with what I had in mind:

    initialized = v->is_initialized;
    smp_rmb();
    if ( !initialized )
        goto getvcpucontext_out;

Without smp_rmb there are no guarantees that we see an up-to-date value of
v->is_initialized, right? This READ of v->is_initialized is supposed to pair
with the WRITE in arch_set_info_guest.

Relying on the barrier in vcpu_pause is OK only if is_initialised was already
read as "1". I think it might be OK in this case, because probably nothing
wrong happens if we read the old value of v->is_initialized as "0", but it
doesn't seem to be a good idea: theoretically a very long time could pass
before we see v->is_initialized as "1" without the smp_rmb.

Please let me know if I am missing something.
[xen-unstable-smoke test] 161338: tolerable all pass - PUSHED
flight 161338 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161338/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check                  fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check                   fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check               fail never pass
 test-armhf-armhf-xl 15 migrate-support-check                       fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check                   fail never pass

version targeted for testing:
 xen                  aaa3eafb3ba8b544d19ca41cda1477640b22b8fc
baseline version:
 xen                  9f6cd4983715cb31f0ea540e6bbb63f799a35d8a

Last test of basis   161332  2021-04-20 15:00:28 Z    0 days
Testing same since   161338  2021-04-20 20:01:32 Z    0 days    1 attempts

People who touched revisions under test:
 Andrew Cooper
 Stefano Stabellini
 Stefano Stabellini
 Wei Liu

jobs:
 build-arm64-xsm                              pass
 build-amd64                                  pass
 build-armhf                                  pass
 build-amd64-libvirt                          pass
 test-armhf-armhf-xl                          pass
 test-arm64-arm64-xl-xsm                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-amd64-libvirt                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision:
 To xenbits.xen.org:/home/xen/git/xen.git
   9f6cd49837..aaa3eafb3b  aaa3eafb3ba8b544d19ca41cda1477640b22b8fc -> smoke
[xen-unstable test] 161317: tolerable FAIL - PUSHED
flight 161317 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161317/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat             fail REGR. vs. 161296

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop                 fail like 161296
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2      fail like 161296
 test-armhf-armhf-libvirt 16 saverestore-support-check              fail like 161296
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                 fail like 161296
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop                  fail like 161296
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                 fail like 161296
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop                  fail like 161296
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check          fail like 161296
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop                 fail like 161296
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop                  fail like 161296
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop                  fail like 161296
 test-amd64-i386-xl-pvshim 14 guest-start                           fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check           fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check                  fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check               fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check                   fail never pass
 test-arm64-arm64-xl 15 migrate-support-check                       fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check                   fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check          fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check                   fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check               fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check          fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check           fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check              fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check             fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check         fail never pass
 test-armhf-armhf-xl 15 migrate-support-check                       fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check                  fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check                   fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check              fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check            fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check        fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check                  fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check           fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check              fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check                   fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check               fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check           fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check           fail never pass

version targeted for testing:
 xen                  a8c532be6a44c7faa54ac777a717f4aa65e3a806
baseline version:
 xen                  b5b93627dd1c398a90b832af765b4720fc71814e

Last test of basis   161296  2021-04-19 17:07:45 Z    1 days
Testing same since   161317  2021-04-20 08:13:00 Z    0 days    1 attempts

People who touched revisions under test:
 Jan Beulich
 Tamas K Lengyel
 Tim Deegan

jobs:
 build-amd64-xsm      pass
 build-arm64-xsm
Re: Writing to arbitrary canonical addresses
Thanks again Andrew,

...

My initial idea was to allocate a frame in kernel space and change
update_va_mapping to "forcibly" write the desired MFN into the l1 page table
and return the va. You can see what I did here:
https://github.com/charlesfg/xen/blob/5755f0752bd50891942768bf99898bbc8f7eac82/xen/arch/x86/mm.c#L4539

Basically, I assume the fast path, and for UPDATE_ENTRY:
https://github.com/charlesfg/xen/blob/5755f0752bd50891942768bf99898bbc8f7eac82/xen/arch/x86/mm.c#L2184

Any help on this would be greatly appreciated :)

Best regards,
*Charles Ferreira Gonçalves*

On Tue, Apr 20, 2021 at 7:05 PM Andrew Cooper wrote:
> On 20/04/2021 17:13, Charles Gonçalves wrote:
> > Hello Guys,
> >
> > I'm trying to reproduce old exploit behaviors in a simplistic way:
> > create a hypercall to write a buffer to a specific MFN.
> >
> > At first, I thought that updating an l1 page in a valid VA in guest
> > kernel space would do the trick.
> > But for addresses outside the Guest-defined use (0x - 0x7fff )
> > it is a no go! I get a page fault with a 'reserved bit in page table'
> > warning message.
> >
> > Now I'm trying to write to the address inside the hypervisor code, but
> > I am not sure how to do it.
> >
> > Any comments or tips on this?
>
> So you're looking to add a hypercall to make arbitrary unaudited changes
> to arbitrary memory, to simulate past security issues?
>
> If you're seeing "Reserved bit in page table" then you've managed to
> corrupt a pagetable entry somehow. Xen doesn't write any reserved bits
> (which it doesn't know how to interpret).
>
> I'm afraid that if you want any further help with this specific failure,
> you're going to have to post your changes to Xen somewhere: pastebin,
> or a pointer to a git branch, or whatever, but my divination skills
> aren't great...
>
> ~Andrew
Re: [PATCH] xen/arm: Ensure the vCPU context is seen before clearing the _VPF_down
(+ Andrew and Jan)

Hi Stefano,

On 20/04/2021 20:38, Stefano Stabellini wrote:
> On Fri, 16 Apr 2021, Julien Grall wrote:
> > > I think your explanation is correct. However, don't we need a smp_rmb()
> > > barrier after reading v->is_initialised in xen/common/domctl.c:do_domctl?
> > > That would be the barrier that pairs with smp_wmb in regards to
> > > v->is_initialised.
> >
> > There is already a smp_mb() in sync_vcpu_exec_state() which is called from
> > vcpu_pause() -> vcpu_sleep_sync().
>
> But that's too late, isn't it? The v->is_initialized check is done before
> the vcpu_pause() call. We might end up taking the wrong code path:
>
> https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/domctl.c#L587
> https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/domctl.c#L598

I am a bit confused what you mean by "wrong path" here. There is no timing
guarantee with a memory barrier. What the barrier will guarantee you is that
v->is_initialized will be read *before* anything after the barrier.

Are you worried that some variables in vcpu_pause() may be read before
v->is_initialized?

> > I don't think we can ever remove the memory barrier in
> > sync_vcpu_exec_state() because the vCPU paused may have run (or
> > initialized) on a different pCPU. So I would like to rely on the barrier
> > rather than adding an extra one (even though it is not a fast path).
> >
> > I am thinking to add a comment on top of vcpu_pause() to clarify that
> > after the call, the vCPU context will be observed without extra
> > synchronization required.
>
> Given that an if.. goto is involved, even if both code paths lead to
> good results, I think it would be best to have the smp_rmb() barrier
> above after the first v->is_initialised read to make sure we take the
> right path.

I don't understand what you are referring to by "wrong" and "right" path. The
processor will always execute/commit the path based on the value of
v->is_initialized. What may happen is that the processor may start to
speculate the wrong path and then backtrack if it discovers the value is not
the expected one. But, AFAIK, smp_rmb() is not going to protect you against
speculation... smp_rmb() is only going to enforce a memory ordering between
reads.

> Instead of having to make sure that even the "wrong" path
> behaves correctly.

Just for clarification, I think you meant writing the following code:

    if ( !v->is_initialized )
        goto getvcpucontext_out;

    smp_rmb();

    ...

    vcpu_pause();

AFAICT, there is nothing in the implementation of XEN_DOMCTL_getvcpucontext
that justifies the extra barrier (assuming we consider vcpu_pause() as a full
memory barrier).

From your e-mail, I also could not infer what your exact concern regarding the
memory ordering is. If you have something in mind, then would you mind
providing a sketch showing what could go wrong?

Cheers,

-- 
Julien Grall
Re: [PATCH] xen/arm: Ensure the vCPU context is seen before clearing the _VPF_down
On Fri, 16 Apr 2021, Julien Grall wrote:
> Hi Stefano,
>
> On 13/04/2021 23:43, Stefano Stabellini wrote:
> > On Sat, 20 Mar 2021, Julien Grall wrote:
> > > On 20/03/2021 00:01, Stefano Stabellini wrote:
> > > > On Sat, 27 Feb 2021, Julien Grall wrote:
> > > > > (+ Dario and George)
> > > > >
> > > > > Hi Stefano,
> > > > >
> > > > > I have added Dario and George to get some inputs from the scheduling
> > > > > part.
> > > > >
> > > > > On 27/02/2021 01:58, Stefano Stabellini wrote:
> > > > > > On Fri, 26 Feb 2021, Julien Grall wrote:
> > > > > > > From: Julien Grall
> > > > > > >
> > > > > > > A vCPU can get scheduled as soon as _VPF_down is cleared. As there
> > > > > > > is currently no ordering guarantee in arch_set_info_guest(), it
> > > > > > > may be possible that the flag can be observed cleared before the
> > > > > > > new values of the vCPU registers are observed.
> > > > > > >
> > > > > > > Add an smp_mb() before the flag is cleared to prevent re-ordering.
> > > > > > >
> > > > > > > Signed-off-by: Julien Grall
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > Barriers should work in pairs. However, I am not entirely sure
> > > > > > > where to put the other half. Maybe at the beginning of
> > > > > > > context_switch_to()?
> > > > > >
> > > > > > It should be right after VGCF_online is set or cleared, right?
> > > > >
> > > > > vcpu_guest_context_t is a variable allocated on the heap just for the
> > > > > purpose of this call. So an ordering with VGCF_online is not going to
> > > > > do anything.
> > > > >
> > > > > > So it would be:
> > > > > >
> > > > > > xen/arch/arm/domctl.c:arch_get_info_guest
> > > > > > xen/arch/arm/vpsci.c:do_common_cpu_on
> > > > > >
> > > > > > But I think it is impossible that either of them get called at the
> > > > > > same time as arch_set_info_guest, which makes me wonder if we
> > > > > > actually need the barrier...
> > > > >
> > > > > arch_get_info_guest() is called without the domain lock held and I
> > > > > can't see any other lock that could prevent it from being called in //
> > > > > of arch_set_info_guest().
> > > > >
> > > > > So you could technically get corrupted information from
> > > > > XEN_DOMCTL_getvcpucontext. For this case, we would want a smp_wmb()
> > > > > before writing to v->is_initialised. The corresponding read barrier
> > > > > would be in vcpu_pause() -> vcpu_sleep_sync() -> sync_vcpu_execstate().
> > > > >
> > > > > But this is not the issue I was originally trying to solve. Currently,
> > > > > do_common_cpu_on() will roughly do:
> > > > >
> > > > >    1) domain_lock(d)
> > > > >
> > > > >    2) v->arch.sctlr = ...
> > > > >       v->arch.ttbr0 = ...
> > > > >
> > > > >    3) clear_bit(_VPF_down, &v->pause_flags);
> > > > >
> > > > >    4) domain_unlock(d)
> > > > >
> > > > >    5) vcpu_wake(v);
> > > > >
> > > > > If we had only one pCPU on the system, then we would only wake the
> > > > > vCPU in step 5. We would be fine in this situation. But that's not the
> > > > > interesting case.
> > > > >
> > > > > If you add a second pCPU to the story, it may be possible to have
> > > > > vcpu_wake() happening in // (see more below). As there is no memory
> > > > > barrier, step 3 may be observed before 2. So, assuming the vCPU is
> > > > > runnable, we could start to schedule a vCPU before any update to the
> > > > > registers (step 2) is observed.
> > > > >
> > > > > This means that when context_switch_to() is called, we may end up
> > > > > restoring some old values.
> > > > >
> > > > > Now the question is: can vcpu_wake() be called in // from another
> > > > > pCPU? AFAICT, it would only be called if a given flag in
> > > > > v->pause_flags is cleared (e.g. _VPF_blocked). But can we rely on
> > > > > that?
> > > > >
> > > > > Even if we can rely on it, v->pause_flags has other flags in it. I
> > > > > couldn't rule out that _VPF_down cannot be set at the same time as the
> > > > > other _VPF_*.
> > > > >
> > > > > Therefore, I think a barrier is necessary to ensure the ordering.
> > > > >
> > > > > Do you agree with this analysis?
> > > >
> > > > Yes, I think this makes sense. The corresponding barrier in the
> > > > scheduling code would have to be after reading _VPF_down and before
> > > > reading v->arch.sctlr, etc.
> > > >
> > > > > > > The issue described here is also quite theoretical because there
> > > > > > > are hundreds of instructions executed between the time a vCPU is
> > > > > > > seen runnable and scheduled. But better be safe than sorry :).
> > > > > > > ---
> > > > > > >  xen/arch/arm/domain.c | 7 +++++++
> > > > > > >  1 file changed, 7 insertions(+)
> > > > > > >
> > > > > > > diff --git a/xen/arch/arm/domain.c
Re: Writing to arbitrary canonical addresses
On 20/04/2021 17:13, Charles Gonçalves wrote:
> Hello Guys,
>
> I'm trying to reproduce old exploit behaviors in a simplistic way:
> create a hypercall to write a buffer to a specific MFN.
>
> At first, I thought that updating an l1 page in a valid VA in guest
> kernel space would do the trick.
> But for addresses outside the Guest-defined use (0x - 0x7fff )
> it is a no go! I get a page fault with a 'reserved bit in page table'
> warning message.
>
> Now I'm trying to write to the address inside the hypervisor code, but
> I am not sure how to do it.
>
> Any comments or tips on this?

So you're looking to add a hypercall to make arbitrary unaudited changes to
arbitrary memory, to simulate past security issues?

If you're seeing "Reserved bit in page table" then you've managed to corrupt
a pagetable entry somehow. Xen doesn't write any reserved bits (which it
doesn't know how to interpret).

I'm afraid that if you want any further help with this specific failure,
you're going to have to post your changes to Xen somewhere: pastebin, or a
pointer to a git branch, or whatever, but my divination skills aren't
great...

~Andrew
[PATCH v2] x86/shim: Simplify compat handling in write_start_info()
Factor out a compat boolean to remove the lfence overhead from multiple
is_pv_32bit_domain() calls.

Signed-off-by: Andrew Cooper
---
CC: Jan Beulich
CC: Roger Pau Monné
CC: Wei Liu

v2:
 * Reinstate the conditional for the start info pointer, in case Intel/AMD
   can't be persuaded to adjust the architectural guarantees for upper bits
   in compatibility mode transitions.
---
 xen/arch/x86/pv/shim.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index d16c0048c0..4c6f442274 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -280,12 +280,12 @@ void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
 static void write_start_info(struct domain *d)
 {
     struct cpu_user_regs *regs = guest_cpu_user_regs();
-    start_info_t *si = map_domain_page(_mfn(is_pv_32bit_domain(d) ? regs->edx
-                                                                  : regs->rdx));
+    bool compat = is_pv_32bit_domain(d);
+    start_info_t *si = map_domain_page(_mfn(compat ? regs->edx : regs->rdx));
     uint64_t param;
 
     snprintf(si->magic, sizeof(si->magic), "xen-3.0-x86_%s",
-             is_pv_32bit_domain(d) ? "32p" : "64");
+             compat ? "32p" : "64");
 
     si->nr_pages = domain_tot_pages(d);
     si->shared_info = virt_to_maddr(d->shared_info);
     si->flags = 0;
@@ -300,7 +300,7 @@ static void write_start_info(struct domain *d)
                                  >console.domU.mfn) )
         BUG();
 
-    if ( is_pv_32bit_domain(d) )
+    if ( compat )
         xlat_start_info(si, XLAT_start_info_console_domU);
 
     unmap_domain_page(si);
-- 
2.11.0
Re: [PATCH] x86/shim: Simplify compat handling in write_start_info()
On 19/04/2021 17:00, Jan Beulich wrote:
> On 19.04.2021 17:57, Andrew Cooper wrote:
>> On 19/04/2021 16:55, Jan Beulich wrote:
>>> On 19.04.2021 16:45, Andrew Cooper wrote:
>>>> Factor out a compat boolean to remove the lfence overhead from multiple
>>>> is_pv_32bit_domain() calls.
>>>>
>>>> For a compat guest, the upper 32 bits of rdx are zero, so there is no
>>>> need to have any conditional logic at all when mapping the start info
>>>> page.
>>> Iirc the upper halves hold unspecified contents after a switch from
>>> compat to 64-bit mode. Therefore only with this part of the change
>>> dropped ...
>> But we're shim, so will never ever mix compat and non-compat guests.
> That's not the point: A compat guest will still cause the CPU to
> transition back and forth between 64-bit and compat modes. It is
> this transitioning which leaves the upper halves of all GPRs in an
> undefined state (even if in reality a CPU would likely need to go
> through extra hoops to prevent them from being zero if they were
> written to in compat mode).

Hmm. That's awkward.

So the real behaviour (I've checked with some contacts) is that the upper
bits are preserved until the next write to the register, after which the
upper bits are zeroed.

I wonder whether I'll have any luck formally asking AMD and Intel for a
tweak to this effect in the manuals.

~Andrew
[xen-unstable-smoke test] 161332: tolerable all pass - PUSHED
flight 161332 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161332/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail never pass

version targeted for testing:
 xen                  9f6cd4983715cb31f0ea540e6bbb63f799a35d8a
baseline version:
 xen                  a8c532be6a44c7faa54ac777a717f4aa65e3a806

Last test of basis   161293  2021-04-19 14:00:26 Z   1 days
Failing since        161321  2021-04-20 10:02:00 Z   0 days   2 attempts
Testing same since   161332  2021-04-20 15:00:28 Z   0 days   1 attempts

People who touched revisions under test:
  Jan Beulich
  Julien Grall
  Rahul Singh
  Roger Pau Monné
  Stefano Stabellini

jobs:
 build-arm64-xsm                              pass
 build-amd64                                  pass
 build-armhf                                  pass
 build-amd64-libvirt                          pass
 test-armhf-armhf-xl                          pass
 test-arm64-arm64-xl-xsm                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-amd64-libvirt                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision:

To xenbits.xen.org:/home/xen/git/xen.git
   a8c532be6a..9f6cd49837  9f6cd4983715cb31f0ea540e6bbb63f799a35d8a -> smoke
[qemu-mainline test] 161308: regressions - FAIL
flight 161308 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161308/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-qemuu-freebsd11-amd64              16 guest-saverestore         fail REGR. vs. 152631
 test-amd64-i386-libvirt                             14 guest-start               fail REGR. vs. 152631
 test-amd64-amd64-libvirt                            14 guest-start               fail REGR. vs. 152631
 test-amd64-i386-libvirt-xsm                         14 guest-start               fail REGR. vs. 152631
 test-amd64-amd64-qemuu-freebsd12-amd64              16 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-libvirt-xsm                        14 guest-start               fail REGR. vs. 152631
 test-amd64-i386-freebsd10-amd64                     16 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-libvirt-pair                       25 guest-start/debian        fail REGR. vs. 152631
 test-amd64-i386-freebsd10-i386                      16 guest-saverestore         fail REGR. vs. 152631
 test-amd64-i386-libvirt-pair                        25 guest-start/debian        fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-debianhvm-amd64            15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow     15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm        15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-ovmf-amd64                 15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-debianhvm-amd64           15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow    15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-win7-amd64                15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-ovmf-amd64                15 guest-saverestore         fail REGR. vs. 152631
 test-arm64-arm64-libvirt-xsm                        14 guest-start               fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm         15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-qemuu-nested-intel                 20 debian-hvm-install/l1/l2  fail REGR. vs. 152631
 test-armhf-armhf-libvirt                            14 guest-start               fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-win7-amd64                 15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  12 debian-hvm-install        fail REGR. vs. 152631
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   12 debian-hvm-install        fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-ws16-amd64                 15 guest-saverestore         fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64                15 guest-saverestore         fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-qemuu-nested-amd    20 debian-hvm-install/l1/l2   fail like 152631
 test-armhf-armhf-libvirt-raw         15 saverestore-support-check  fail like 152631
 test-amd64-i386-xl-pvshim            14 guest-start                fail never pass
 test-arm64-arm64-xl-credit1          15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit1          16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl                  15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-thunderx         15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx         16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2          15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit2          16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale          15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale          16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd         14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2          15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2          16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu        15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu        16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1          15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1          16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-rtds             16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck       15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-cubietruck       16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm              15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-xsm              16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-seattle          15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-seattle          16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-vhd              14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd              15 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-raw         14 migrate-support-check      fail never
Re: [PATCH] build: centralize / unify asm-offsets generation
On Tue, Apr 20, 2021 at 05:47:49PM +0200, Jan Beulich wrote:
> On 20.04.2021 17:29, Roger Pau Monné wrote:
> > On Thu, Apr 01, 2021 at 10:33:47AM +0200, Jan Beulich wrote:
> >> @@ -399,7 +399,11 @@ include/xen/compile.h: include/xen/compi
> >> 	@sed -rf tools/process-banner.sed < .banner >> $@.new
> >> 	@mv -f $@.new $@
> >>
> >> -include/asm-$(TARGET_ARCH)/asm-offsets.h: arch/$(TARGET_ARCH)/asm-offsets.s
> >> +asm-offsets.s: arch/$(TARGET_ARCH)/$(TARGET_SUBARCH)/asm-offsets.c
> >> +	$(CC) $(filter-out -Wa$(comma)% -flto,$(c_flags)) -S -g0 -o $@.new -MQ $@ $<
> >> +	$(call move-if-changed,$@.new,$@)
> >
> > Won't it be more natural to keep the .s file in arch/$(TARGET_ARCH)?
>
> Yes and no: Yes as far as the actual file location is concerned.
> No when considering where it gets generated: I generally consider
> it risky to generate files outside of the directory where make
> currently runs. There may be good reasons for certain exceptions,
> but personally I don't see this case as good enough a reason.
>
> Somewhat related - if doing as you suggest, which Makefile's
> clean: rule should clean up that file in your opinion?

The clean rule should be in the makefile where the file is generated IMO,
the same as the asm-offsets.h clean rule currently in xen/Makefile.

> Nevertheless, if there's general agreement that keeping the file
> there is better, I'd make the change and simply ignore my unhappy
> feelings about it.

I don't have a strong opinion; it just feels weird to have this IMO stray
asm-offsets.s outside of its arch directory, taking into account that we
already have asm-offsets.h generated from xen/Makefile into an arch
specific directory as a precedent in that makefile.

Thanks, Roger.
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On Tue, Apr 20, 2021 at 05:38:51PM +0200, Jan Beulich wrote:
> On 20.04.2021 17:08, Roger Pau Monné wrote:
> > On Thu, Apr 02, 2020 at 04:06:06AM +0800, Chao Gao wrote:
> >> --- a/xen/drivers/passthrough/vtd/qinval.c
> >> +++ b/xen/drivers/passthrough/vtd/qinval.c
> >> @@ -442,6 +442,23 @@ int enable_qinval(struct vtd_iommu *iommu)
> >>      return 0;
> >>  }
> >>
> >> +static int vtd_flush_context_noop(struct vtd_iommu *iommu, uint16_t did,
> >> +                                  uint16_t source_id, uint8_t function_mask,
> >> +                                  uint64_t type, bool flush_non_present_entry)
> >> +{
> >> +    dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush CONTEXT.\n");
> >> +    return -EIO;
> >> +}
> >> +
> >> +static int vtd_flush_iotlb_noop(struct vtd_iommu *iommu, uint16_t did,
> >> +                                uint64_t addr, unsigned int size_order,
> >> +                                uint64_t type, bool flush_non_present_entry,
> >> +                                bool flush_dev_iotlb)
> >> +{
> >> +    dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush IOTLB.\n");
> >> +    return -EIO;
> >> +}
> >
> > I think I would add an ASSERT_UNREACHABLE() to both noop handlers
> > above, as I would expect trying to use them without the proper mode
> > being configured would point to an error elsewhere?
>
> If such an assertion triggered e.g. during S3 suspend/resume, it may
> lead to the box simply not doing anything useful, without there being
> any way to know what went wrong. If instead the system at least
> managed to resume, the log message could be observed.

Oh, OK then. I'm simply worried that people might ignore such one-line
messages; maybe add a WARN? Would it make sense to mark as tainted, which
could help identify the issue on production builds? Maybe that's too much.

Thanks, Roger.
Writing to arbitrary canonical addresses
Hello Guys,

I'm trying to reproduce old exploit behaviours in a simplistic way: create
a hypercall to write a buffer to a specific MFN. At first, I thought that
updating an l1 page for a valid VA in guest kernel space would do the
trick. But for addresses outside the guest-defined use (0x - 0x7fff ) it
is a no go: I get a page fault with a 'reserved bit in page table' warning
message.

Now I'm trying to write to an address inside the hypervisor code, but I'm
not sure how to do it. Any comments or tips on this?

Best regards,
Charles Ferreira Gonçalves
Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote:
> Reading the platform timer isn't cheap, so we'd better avoid it when the
> resulting value is of no interest to anyone.
>
> The consumer of master_stime, obtained by
> time_calibration_{std,tsc}_rendezvous() and propagated through
> this_cpu(cpu_calibration), is local_time_calibration(). With
> CONSTANT_TSC the latter function uses an early exit path, which doesn't
> explicitly use the field. While this_cpu(cpu_calibration) (including the
> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that
> path, both structures' fields get consumed only by the !CONSTANT_TSC
> logic of the function.
>
> Signed-off-by: Jan Beulich
> ---
> v4: New.
> ---
> I realize there's some risk associated with potential new uses of the
> field down the road. What would people think about compiling time.c a
> 2nd time into a dummy object file, with a conditional enabled to force
> assuming CONSTANT_TSC, and with that conditional used to suppress
> presence of the field as well as all audited uses of it (i.e. in
> particular that large part of local_time_calibration())? Unexpected new
> users of the field would then cause build time errors.

Wouldn't that add quite a lot of churn to the file itself in the form of
pre-processor conditionals? Could we instead set master_stime to an
invalid value that would make the consumers explode somehow? I know there
might be new consumers, but those should be able to figure out whether
the value is sane by looking at the existing ones.

Also, since this is only done on the BSP on the last iteration, I wonder
if it really makes such a difference performance-wise to warrant all this
trouble.

Thanks, Roger.
Re: [PATCH v4 2/3] x86/time: yield to hyperthreads after updating TSC during rendezvous
On Thu, Apr 01, 2021 at 11:54:27AM +0200, Jan Beulich wrote:
> Since we'd like the updates to be done as synchronously as possible,
> make an attempt at yielding immediately after the TSC write.
>
> Signed-off-by: Jan Beulich

Reviewed-by: Roger Pau Monné

Did you observe any difference with the pause inserted? I wonder whether
that's enough to give the hyperthread a chance to also perform the TSC
write. In any case there's certainly no harm from it.

Thanks, Roger.
Re: [PATCH] build: centralize / unify asm-offsets generation
On 20.04.2021 17:29, Roger Pau Monné wrote:
> On Thu, Apr 01, 2021 at 10:33:47AM +0200, Jan Beulich wrote:
>> @@ -399,7 +399,11 @@ include/xen/compile.h: include/xen/compi
>> 	@sed -rf tools/process-banner.sed < .banner >> $@.new
>> 	@mv -f $@.new $@
>>
>> -include/asm-$(TARGET_ARCH)/asm-offsets.h: arch/$(TARGET_ARCH)/asm-offsets.s
>> +asm-offsets.s: arch/$(TARGET_ARCH)/$(TARGET_SUBARCH)/asm-offsets.c
>> +	$(CC) $(filter-out -Wa$(comma)% -flto,$(c_flags)) -S -g0 -o $@.new -MQ $@ $<
>> +	$(call move-if-changed,$@.new,$@)
>
> Won't it be more natural to keep the .s file in arch/$(TARGET_ARCH)?

Yes and no: Yes as far as the actual file location is concerned.
No when considering where it gets generated: I generally consider
it risky to generate files outside of the directory where make
currently runs. There may be good reasons for certain exceptions,
but personally I don't see this case as good enough a reason.

Somewhat related - if doing as you suggest, which Makefile's
clean: rule should clean up that file in your opinion?

Nevertheless, if there's general agreement that keeping the file
there is better, I'd make the change and simply ignore my unhappy
feelings about it.

Jan
Re: [PATCH v4 1/3] x86/time: latch to-be-written TSC value early in rendezvous loop
On Thu, Apr 01, 2021 at 11:54:05AM +0200, Jan Beulich wrote:
> To reduce latency on time_calibration_tsc_rendezvous()'s last loop
> iteration, read the value to be written on the last iteration at the end
> of the loop body (i.e. in particular at the end of the second to last
> iteration).
>
> On my single-socket 18-core Skylake system this reduces the average loop
> exit time on CPU0 (from the TSC write on the last iteration until after
> the main loop) from around 32k cycles to around 29k (albeit the values
> measured on separate runs vary quite significantly).
>
> Signed-off-by: Jan Beulich

Reviewed-by: Roger Pau Monné

Thanks, Roger.
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On 20.04.2021 17:08, Roger Pau Monné wrote:
> On Thu, Apr 02, 2020 at 04:06:06AM +0800, Chao Gao wrote:
>> --- a/xen/drivers/passthrough/vtd/qinval.c
>> +++ b/xen/drivers/passthrough/vtd/qinval.c
>> @@ -442,6 +442,23 @@ int enable_qinval(struct vtd_iommu *iommu)
>>      return 0;
>>  }
>>
>> +static int vtd_flush_context_noop(struct vtd_iommu *iommu, uint16_t did,
>> +                                  uint16_t source_id, uint8_t function_mask,
>> +                                  uint64_t type, bool flush_non_present_entry)
>> +{
>> +    dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush CONTEXT.\n");
>> +    return -EIO;
>> +}
>> +
>> +static int vtd_flush_iotlb_noop(struct vtd_iommu *iommu, uint16_t did,
>> +                                uint64_t addr, unsigned int size_order,
>> +                                uint64_t type, bool flush_non_present_entry,
>> +                                bool flush_dev_iotlb)
>> +{
>> +    dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush IOTLB.\n");
>> +    return -EIO;
>> +}
>
> I think I would add an ASSERT_UNREACHABLE() to both noop handlers
> above, as I would expect trying to use them without the proper mode
> being configured would point to an error elsewhere?

If such an assertion triggered e.g. during S3 suspend/resume, it may
lead to the box simply not doing anything useful, without there being
any way to know what went wrong. If instead the system at least
managed to resume, the log message could be observed.

Jan
Re: [PATCH v2] xen/pci: Refactor PCI MSI interrupts related code
On 20.04.2021 15:45, Rahul Singh wrote:
>> On 19 Apr 2021, at 1:33 pm, Jan Beulich wrote:
>> On 19.04.2021 13:54, Julien Grall wrote:
>>> For the time being, I think moving this code to x86 is a lot better
>>> than #ifdef-ary or keeping the code in common code.
>>
>> Well, I would perhaps agree if it ended up being #ifdef CONFIG_X86.
>> I would perhaps not agree if there was a new CONFIG_* which other
>> (future) arch-es could select if desired.
>
> I agree with Julien on moving the code to the x86 file, as currently it
> is referenced only in x86 code, and as of now we are not sure how other
> architectures will implement interrupt remapping (via the IOMMU or any
> other means).
>
> Let me know if you are ok with moving the code to x86.

I can't answer this with "yes" or "no" without knowing what the
alternative would be. As said, if the alternative is CONFIG_X86
#ifdef-ary, then yes. If a separate CONFIG_* gets introduced (and
selected by X86), then a separate file (getting built only when that
new setting is y) would seem better to me.

Jan
Re: [PATCH] build: centralize / unify asm-offsets generation
On Thu, Apr 01, 2021 at 10:33:47AM +0200, Jan Beulich wrote:
> Except for an additional prereq Arm and x86 have the same needs here,
> and Arm can also benefit from the recent x86 side improvement. Recurse
> into arch/*/ only for a phony include target (doing nothing on Arm),
> and handle asm-offsets itself entirely locally to xen/Makefile.
>
> Signed-off-by: Jan Beulich
>
> --- a/.gitignore
> +++ b/.gitignore
> @@ -318,7 +318,6 @@
>  xen/arch/x86/efi/check.efi
>  xen/arch/x86/efi/mkreloc
>  xen/arch/*/xen.lds
> -xen/arch/*/asm-offsets.s
>  xen/arch/*/efi/boot.c
>  xen/arch/*/efi/compat.c
>  xen/arch/*/efi/ebmalloc.c
> @@ -325,6 +324,7 @@
>  xen/arch/*/efi/efi.h
>  xen/arch/*/efi/pe.c
>  xen/arch/*/efi/runtime.c
> +xen/asm-offsets.s
>  xen/common/config_data.S
>  xen/common/config.gz
>  xen/include/headers*.chk
> --- a/xen/Makefile
> +++ b/xen/Makefile
> @@ -341,7 +341,7 @@ _clean: delete-unfresh-files
>  	find . \( -name "*.o" -o -name ".*.d" -o -name ".*.d2" \
>  		-o -name "*.gcno" -o -name ".*.cmd" \) -exec rm -f {} \;
>  	rm -f include/asm $(TARGET) $(TARGET).gz $(TARGET).efi $(TARGET).efi.map $(TARGET)-syms $(TARGET)-syms.map *~ core
> -	rm -f include/asm-*/asm-offsets.h
> +	rm -f asm-offsets.s include/asm-*/asm-offsets.h
>  	rm -f .banner
>  
>  .PHONY: _distclean
> @@ -362,7 +362,7 @@ $(TARGET): delete-unfresh-files
>  	done; \
>  	true
>  	$(MAKE) -f $(BASEDIR)/Rules.mk -C include
> -	$(MAKE) -f $(BASEDIR)/Rules.mk -C arch/$(TARGET_ARCH) asm-offsets.s
> +	$(MAKE) -f $(BASEDIR)/Rules.mk -C arch/$(TARGET_ARCH) include
>  	$(MAKE) -f $(BASEDIR)/Rules.mk include/asm-$(TARGET_ARCH)/asm-offsets.h
>  	$(MAKE) -f $(BASEDIR)/Rules.mk -C arch/$(TARGET_ARCH) $@
>  
> @@ -399,7 +399,11 @@ include/xen/compile.h: include/xen/compi
>  	@sed -rf tools/process-banner.sed < .banner >> $@.new
>  	@mv -f $@.new $@
>  
> -include/asm-$(TARGET_ARCH)/asm-offsets.h: arch/$(TARGET_ARCH)/asm-offsets.s
> +asm-offsets.s: arch/$(TARGET_ARCH)/$(TARGET_SUBARCH)/asm-offsets.c
> +	$(CC) $(filter-out -Wa$(comma)% -flto,$(c_flags)) -S -g0 -o $@.new -MQ $@ $<
> +	$(call move-if-changed,$@.new,$@)

Won't it be more natural to keep the .s file in arch/$(TARGET_ARCH)?

Thanks, Roger.
Re: [PATCH] tools/xenstored: Remove unused prototype
Hi Julien,

On 20 Apr 2021, at 14:49, Julien Grall <jul...@xen.org> wrote:

> From: Julien Grall <jgr...@amazon.com>
>
> A prototype for dump_conn() has been present for quite a long time, but
> there is no implementation, even, AFAICT, in the patch that introduced
> it. So drop it.
>
> Signed-off-by: Julien Grall <jgr...@amazon.com>

Agree, no implementation of this exists, so:

Reviewed-by: Bertrand Marquis <bertrand.marq...@arm.com>

Cheers
Bertrand

> ---
>  tools/xenstore/xenstored_core.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index d54a6042a9f7..591b28e4552f 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -2085,7 +2085,6 @@ static struct option options[] = {
>  #endif
>  	{ NULL, 0, NULL, 0 }
>  };
> -extern void dump_conn(struct connection *conn);
>  int dom0_domid = 0;
>  int dom0_event = 0;
>  int priv_domid = 0;
> --
> 2.17.1
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On Thu, Apr 02, 2020 at 04:06:06AM +0800, Chao Gao wrote:
> --- a/xen/drivers/passthrough/vtd/qinval.c
> +++ b/xen/drivers/passthrough/vtd/qinval.c
> @@ -442,6 +442,23 @@ int enable_qinval(struct vtd_iommu *iommu)
>      return 0;
>  }
>
> +static int vtd_flush_context_noop(struct vtd_iommu *iommu, uint16_t did,
> +                                  uint16_t source_id, uint8_t function_mask,
> +                                  uint64_t type, bool flush_non_present_entry)
> +{
> +    dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush CONTEXT.\n");
> +    return -EIO;
> +}
> +
> +static int vtd_flush_iotlb_noop(struct vtd_iommu *iommu, uint16_t did,
> +                                uint64_t addr, unsigned int size_order,
> +                                uint64_t type, bool flush_non_present_entry,
> +                                bool flush_dev_iotlb)
> +{
> +    dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush IOTLB.\n");
> +    return -EIO;
> +}

I think I would add an ASSERT_UNREACHABLE() to both noop handlers above,
as I would expect trying to use them without the proper mode being
configured would point to an error elsewhere?

Thanks, Roger.
[xen-unstable-smoke test] 161321: regressions - FAIL
flight 161321 xen-unstable-smoke real [real]
flight 161328 xen-unstable-smoke real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/161321/
http://logs.test-lab.xenproject.org/osstest/logs/161328/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  18 guest-localmigrate/x10  fail REGR. vs. 161293

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail never pass

version targeted for testing:
 xen                  730d0f6082e66eefae64f35bc62e51fc54d02d55
baseline version:
 xen                  a8c532be6a44c7faa54ac777a717f4aa65e3a806

Last test of basis   161293  2021-04-19 14:00:26 Z   1 days
Testing same since   161321  2021-04-20 10:02:00 Z   0 days   1 attempts

People who touched revisions under test:
  Jan Beulich
  Roger Pau Monné

jobs:
 build-arm64-xsm                              pass
 build-amd64                                  pass
 build-armhf                                  pass
 build-amd64-libvirt                          pass
 test-armhf-armhf-xl                          pass
 test-arm64-arm64-xl-xsm                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    fail
 test-amd64-amd64-libvirt                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Not pushing.
commit 730d0f6082e66eefae64f35bc62e51fc54d02d55
Author: Roger Pau Monné
Date:   Tue Apr 20 11:36:54 2021 +0200

    x86/dpci: remove the dpci EOI timer

    The current interrupt passthrough code will set up a timer for each
    interrupt injected to the guest that requires an EOI from the guest.
    Such a timer performs two actions if the guest doesn't EOI the
    interrupt before a given period of time. The first one is deasserting
    the virtual line, the second is performing an EOI of the physical
    interrupt source if it requires such.

    The deasserting of the guest virtual line is wrong, since it messes
    with the interrupt status of the guest. This seems to have been done
    in order to compensate for missing deasserts when certain interrupt
    controller actions are performed.

    The original motivation for the introduction of the timer was to fix
    issues when a GSI was shared between different guests. We believe that
    other changes in the interrupt handling code (i.e. proper propagation
    of EOI related actions to dpci) will have fixed such errors by now.

    Performing an EOI of the physical interrupt source is redundant, since
    there's already a timer that takes care of this for all interrupts,
    not just the HVM dpci ones; see the irq_guest_action_t struct
    eoi_timer field.

    Since both of the actions performed by the dpci timer are not
    required, remove it altogether.

    Signed-off-by: Roger Pau Monné
    Reviewed-by: Jan Beulich

commit 2d494f2198d7909a394085d079475bb099d7afe7
Author: Roger Pau Monné
Date:   Tue Apr 20 11:36:09 2021 +0200

    x86/vpic: issue dpci EOI for cleared pins at ICW1

    When pins are cleared from either ISR or IRR as part of the
    initialization sequence, forward the clearing of those pins to the
    dpci EOI handler, as it is equivalent to an EOI. Not doing so can
    bring the interrupt controller state out of sync with the dpci
    handling logic, which expects a notification when a pin has been
    EOI'ed.

    Fixes: 7b3cb5e5416 ('IRQ injection changes for HVM PCI passthru.')
    Signed-off-by: Roger Pau Monné
    Reviewed-by: Jan Beulich

commit 192f7479f21ef63dad8d8acbbda93cce0971fe66
Author: Roger Pau Monné
Date:   Tue Apr 20 11:35:29 2021 +0200

    x86/vpic: don't trigger unmask event until end of init

    Wait until the end of the init sequence to trigger the unmask event.
    Note that it will be unconditionally
[PATCH v4 10/12] x86/irq: drop return value from hvm_ioapic_assert
There's no caller anymore that cares about the injected vector, so drop
the returned vector from the function.

No functional change intended.

Signed-off-by: Roger Pau Monné
---
Changes since v3:
 - New in this version.
---
 xen/arch/x86/hvm/irq.c        | 8 ++------
 xen/include/asm-x86/hvm/irq.h | 2 +-
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index c3d8f2a786a..1c588e212f9 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -47,24 +47,20 @@ static void assert_gsi(struct domain *d, unsigned ioapic_gsi)
     vioapic_irq_positive_edge(d, ioapic_gsi);
 }
 
-int hvm_ioapic_assert(struct domain *d, unsigned int gsi, bool level)
+void hvm_ioapic_assert(struct domain *d, unsigned int gsi, bool level)
 {
     struct hvm_irq *hvm_irq = hvm_domain_irq(d);
-    int vector;
 
     if ( gsi >= hvm_irq->nr_gsis )
     {
         ASSERT_UNREACHABLE();
-        return -1;
+        return;
     }
 
     spin_lock(&d->arch.hvm.irq_lock);
     if ( !level || hvm_irq->gsi_assert_count[gsi]++ == 0 )
         assert_gsi(d, gsi);
-    vector = vioapic_get_vector(d, gsi);
     spin_unlock(&d->arch.hvm.irq_lock);
-
-    return vector;
 }
 
 void hvm_ioapic_deassert(struct domain *d, unsigned int gsi)
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index 4e3534d4eb4..fda2f8e8ebf 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -226,7 +226,7 @@ int hvm_set_pci_link_route(struct domain *d, u8 link, u8 isa_irq);
 int hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data);
 
 /* Assert/deassert an IO APIC pin. */
-int hvm_ioapic_assert(struct domain *d, unsigned int gsi, bool level);
+void hvm_ioapic_assert(struct domain *d, unsigned int gsi, bool level);
 void hvm_ioapic_deassert(struct domain *d, unsigned int gsi);
 
 void hvm_maybe_deassert_evtchn_irq(void);
-- 
2.30.1
[PATCH v4 12/12] x86/vpt: introduce a per-vPT lock
Introduce a per virtual timer lock that replaces the existing per-vCPU
and per-domain vPT locks. Since virtual timers are no longer assigned or
migrated between vCPUs, the locking can be simplified to an in-structure
spinlock that protects all the fields.

This requires introducing a helper to initialize the spinlock, and that
could be used to initialize other virtual timer fields in the future.

Signed-off-by: Roger Pau Monné
Reviewed-by: Jan Beulich
---
Changes since v1:
 - New in this version.
---
 xen/arch/x86/emul-i8254.c      |  1 +
 xen/arch/x86/hvm/hpet.c        |  8 +-
 xen/arch/x86/hvm/hvm.c         |  4 ---
 xen/arch/x86/hvm/rtc.c         |  1 +
 xen/arch/x86/hvm/vlapic.c      |  1 +
 xen/arch/x86/hvm/vpt.c         | 52 ++
 xen/include/asm-x86/hvm/vcpu.h |  3 --
 xen/include/asm-x86/hvm/vpt.h  | 15 ++
 8 files changed, 33 insertions(+), 52 deletions(-)

diff --git a/xen/arch/x86/emul-i8254.c b/xen/arch/x86/emul-i8254.c
index 73be4188ad4..a47138cbab7 100644
--- a/xen/arch/x86/emul-i8254.c
+++ b/xen/arch/x86/emul-i8254.c
@@ -484,6 +484,7 @@ void pit_init(struct domain *d, unsigned long cpu_khz)
     {
         register_portio_handler(d, PIT_BASE, 4, handle_pit_io);
         register_portio_handler(d, 0x61, 1, handle_speaker_io);
+        init_periodic_timer(&pit->pt0);
     }
 
     pit_reset(d);
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index ca94e8b4538..20593c3862d 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -739,12 +739,18 @@ static void hpet_set(HPETState *h)
 
 void hpet_init(struct domain *d)
 {
+    HPETState *h = domain_vhpet(d);
+    unsigned int i;
+
     if ( !has_vhpet(d) )
         return;
 
-    hpet_set(domain_vhpet(d));
+    hpet_set(h);
     register_mmio_handler(d, &hpet_mmio_ops);
     d->arch.hvm.params[HVM_PARAM_HPET_ENABLED] = 1;
+
+    for ( i = 0; i < HPET_TIMER_NUM; i++ )
+        init_periodic_timer(&h->pt[i]);
 }
 
 void hpet_deinit(struct domain *d)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ec4ab1f5199..0498c3b02e2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -666,8 +666,6 @@ int hvm_domain_initialise(struct domain *d)
     /* need link to containing domain */
     d->arch.hvm.pl_time->domain = d;
 
-    rwlock_init(&d->arch.hvm.pl_time->pt_migrate);
-
     /* Set the default IO Bitmap. */
     if ( is_hardware_domain(d) )
     {
@@ -1557,8 +1555,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
     hvm_asid_flush_vcpu(v);
 
-    spin_lock_init(&v->arch.hvm.tm_lock);
-
     rc = hvm_vcpu_cacheattr_init(v); /* teardown: vcpu_cacheattr_destroy */
     if ( rc != 0 )
         goto fail1;
diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index b66ca6f64f1..8c8b4ed4018 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -821,6 +821,7 @@ void rtc_init(struct domain *d)
     init_timer(&s->update_timer, rtc_update_timer, s, smp_processor_id());
     init_timer(&s->update_timer2, rtc_update_timer2, s, smp_processor_id());
     init_timer(&s->alarm_timer, rtc_alarm_cb, s, smp_processor_id());
+    init_periodic_timer(&s->pt);
 
     register_portio_handler(d, RTC_PORT(0), 2, handle_rtc_io);
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 2af24989dd5..5c60bf66584 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1654,6 +1654,7 @@ int vlapic_init(struct vcpu *v)
         return 0;
     }
 
+    init_periodic_timer(&vlapic->pt);
     vlapic->pt.source = PTSRC_lapic;
 
     vlapic->regs_page = alloc_domheap_page(v->domain, MEMF_no_owner);
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 6a8d216c7b5..e4ecb98d3a4 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -125,27 +125,6 @@ static int pt_irq_masked(struct periodic_time *pt)
     return 1;
 }
 
-/*
- * Functions which want to modify a particular periodic_time object
- * use pt_{un}lock locking helpers.
- *
- * These users lock whichever vCPU the periodic_time is attached to,
- * but since the vCPU could be modified without holding any lock, they
- * need to take an additional lock that protects against pt->vcpu
- * changing.
- */
-static void pt_lock(struct periodic_time *pt)
-{
-    read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
-    spin_lock(&pt->vcpu->arch.hvm.tm_lock);
-}
-
-static void pt_unlock(struct periodic_time *pt)
-{
-    spin_unlock(&pt->vcpu->arch.hvm.tm_lock);
-    read_unlock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
-}
-
 static void pt_process_missed_ticks(struct periodic_time *pt)
 {
     s_time_t missed_ticks, now = NOW();
@@ -220,7 +199,7 @@ static void pt_timer_fn(void *data)
     void *cb_priv;
     unsigned int irq;
 
-    pt_lock(pt);
+    spin_lock(&pt->lock);
 
     v = pt->vcpu;
     irq = pt->irq;
@@ -235,7 +214,7 @@ static void pt_timer_fn(void *data)
         cb_priv = pt->priv;
     }
 
-    pt_unlock(pt);
+    spin_unlock(&pt->lock);
 
     if ( cb )
         cb(v, cb_priv);
@@ -247,7
[PATCH v4 09/12] x86/irq: remove unused parameter from hvm_isa_irq_assert
There are no callers anymore passing a get_vector function pointer to hvm_isa_irq_assert, so drop the parameter. No functional change expected. Signed-off-by: Roger Pau Monné --- Changes since v3: - New in this version. --- xen/arch/x86/hvm/dm.c | 2 +- xen/arch/x86/hvm/irq.c| 10 +- xen/arch/x86/hvm/pmtimer.c| 2 +- xen/arch/x86/hvm/rtc.c| 2 +- xen/arch/x86/hvm/vpt.c| 2 +- xen/include/asm-x86/hvm/irq.h | 4 +--- 6 files changed, 6 insertions(+), 16 deletions(-) diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index b60b9f3364a..c62a259b7fc 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -110,7 +110,7 @@ static int set_isa_irq_level(struct domain *d, uint8_t isa_irq, hvm_isa_irq_deassert(d, isa_irq); break; case 1: -hvm_isa_irq_assert(d, isa_irq, NULL); +hvm_isa_irq_assert(d, isa_irq); break; default: return -EINVAL; diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c index 4825a387bdc..c3d8f2a786a 100644 --- a/xen/arch/x86/hvm/irq.c +++ b/xen/arch/x86/hvm/irq.c @@ -212,13 +212,10 @@ void hvm_gsi_deassert(struct domain *d, unsigned int gsi) spin_unlock(>arch.hvm.irq_lock); } -int hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq, - int (*get_vector)(const struct domain *d, - unsigned int gsi)) +void hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq) { struct hvm_irq *hvm_irq = hvm_domain_irq(d); unsigned int gsi = hvm_isa_irq_to_gsi(isa_irq); -int vector = -1; ASSERT(isa_irq <= 15); @@ -228,12 +225,7 @@ int hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq, (hvm_irq->gsi_assert_count[gsi]++ == 0) ) assert_irq(d, gsi, isa_irq); -if ( get_vector ) -vector = get_vector(d, gsi); - spin_unlock(>arch.hvm.irq_lock); - -return vector; } void hvm_isa_irq_deassert( diff --git a/xen/arch/x86/hvm/pmtimer.c b/xen/arch/x86/hvm/pmtimer.c index 97b9e41712f..9d30b145f60 100644 --- a/xen/arch/x86/hvm/pmtimer.c +++ b/xen/arch/x86/hvm/pmtimer.c @@ -61,7 +61,7 @@ static void pmt_update_sci(PMTState *s) 
ASSERT(spin_is_locked(>lock)); if ( acpi->pm1a_en & acpi->pm1a_sts & SCI_MASK ) -hvm_isa_irq_assert(s->vcpu->domain, SCI_IRQ, NULL); +hvm_isa_irq_assert(s->vcpu->domain, SCI_IRQ); else hvm_isa_irq_deassert(s->vcpu->domain, SCI_IRQ); } diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c index 9992595c45a..b66ca6f64f1 100644 --- a/xen/arch/x86/hvm/rtc.c +++ b/xen/arch/x86/hvm/rtc.c @@ -63,7 +63,7 @@ static void rtc_update_irq(RTCState *s) s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF; hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ); -hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ, NULL); +hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ); } /* Called by the VPT code after it's injected a PF interrupt for us. diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c index 6744b88d20c..639e45c520e 100644 --- a/xen/arch/x86/hvm/vpt.c +++ b/xen/arch/x86/hvm/vpt.c @@ -368,7 +368,7 @@ static bool inject_interrupt(struct periodic_time *pt) case PTSRC_isa: hvm_isa_irq_deassert(d, irq); -hvm_isa_irq_assert(d, irq, NULL); +hvm_isa_irq_assert(d, irq); break; case PTSRC_ioapic: diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h index 57d51c15863..4e3534d4eb4 100644 --- a/xen/include/asm-x86/hvm/irq.h +++ b/xen/include/asm-x86/hvm/irq.h @@ -214,9 +214,7 @@ void hvm_pci_intx_deassert(struct domain *d, unsigned int device, * allows us to get the interrupt vector in the protection of irq_lock. * For most cases, just set get_vector to NULL. */ -int hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq, - int (*get_vector)(const struct domain *d, - unsigned int gsi)); +void hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq); void hvm_isa_irq_deassert(struct domain *d, unsigned int isa_irq); /* Modify state of GSIs. */ -- 2.30.1
[PATCH v4 08/12] x86/vpt: switch interrupt injection model
Currently vPT relies on timers being assigned to a vCPU and performs checks on every return to the HVM guest in order to determine whether an interrupt from a vPT timer assigned to the vCPU is currently being injected. This model doesn't work properly, since the interrupt destination vCPU of a vPT timer can be different from the vCPU the timer is currently assigned to, in which case the timer would get stuck because it never sees the interrupt as being injected. Knowing when a vPT interrupt is injected is relevant for the guest timer modes where missed vPT interrupts are not discarded but instead accumulated and injected when possible. This change modifies the logic described above, so that vPT doesn't need to check on every return to the HVM guest whether a vPT interrupt is being injected. In order to achieve this the vPT code is modified to make use of the new EOI callbacks, so that virtual timers can detect when an interrupt has been serviced by the guest by waiting for the EOI callback to execute. This model also simplifies some of the logic, as when executing the timer EOI callback Xen can try to inject another interrupt if the timer has interrupts pending for delivery. Note that timers are still bound to a vCPU for the time being; this relation, however, no longer limits the interrupt destination, and will be removed by further patches. This model has been tested with Windows 7 guests without showing any timer delay, even when the guest was limited to very little CPU capacity and pending virtual timer interrupts accumulated. Signed-off-by: Roger Pau Monné --- Changes since v3: - Rename pt_irq_fired to irq_eoi and adjust the logic. - Initialize v and cb_priv in eoi_callback. Changes since v2: - Avoid an explicit != NULL check. - Use a switch in inject_interrupt to evaluate the timer mode. - Print the pt->source field on error in create_periodic_time. Changes since v1: - New in this version. 
--- xen/arch/x86/hvm/svm/intr.c | 3 - xen/arch/x86/hvm/vmx/intr.c | 59 -- xen/arch/x86/hvm/vpt.c| 352 +++--- xen/include/asm-x86/hvm/vpt.h | 5 +- 4 files changed, 157 insertions(+), 262 deletions(-) diff --git a/xen/arch/x86/hvm/svm/intr.c b/xen/arch/x86/hvm/svm/intr.c index 7f815d23078..2ee2332253b 100644 --- a/xen/arch/x86/hvm/svm/intr.c +++ b/xen/arch/x86/hvm/svm/intr.c @@ -146,8 +146,6 @@ void svm_intr_assist(void) return; /* Crank the handle on interrupt state. */ -pt_update_irq(v); - do { intack = hvm_vcpu_has_pending_irq(v); if ( likely(intack.source == hvm_intsrc_none) ) @@ -219,7 +217,6 @@ void svm_intr_assist(void) { HVMTRACE_2D(INJ_VIRQ, intack.vector, /*fake=*/ 0); svm_inject_extint(v, intack.vector); -pt_intr_post(v, intack); } /* Is there another IRQ to queue up behind this one? */ diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c index 80bfbb47878..3fcc7073db2 100644 --- a/xen/arch/x86/hvm/vmx/intr.c +++ b/xen/arch/x86/hvm/vmx/intr.c @@ -203,7 +203,6 @@ static int nvmx_intr_intercept(struct vcpu *v, struct hvm_intack intack) { /* for now, duplicate the ack path in vmx_intr_assist */ hvm_vcpu_ack_pending_irq(v, intack); -pt_intr_post(v, intack); intack = hvm_vcpu_has_pending_irq(v); if ( unlikely(intack.source != hvm_intsrc_none) ) @@ -242,7 +241,6 @@ void vmx_intr_assist(void) struct vcpu *v = current; unsigned int tpr_threshold = 0; enum hvm_intblk intblk; -int pt_vector; /* Block event injection when single step with MTF. */ if ( unlikely(v->arch.hvm.single_step) ) @@ -263,8 +261,6 @@ void vmx_intr_assist(void) #endif /* Crank the handle on interrupt state. */ -pt_vector = pt_update_irq(v); - do { unsigned long intr_info; @@ -337,58 +333,6 @@ void vmx_intr_assist(void) { unsigned long status; - /* -* intack.vector is the highest priority vector. 
So we set eoi_exit_bitmap -* for intack.vector - give a chance to post periodic time interrupts when -* periodic time interrupts become the highest one -*/ -if ( pt_vector != -1 ) -{ -#ifndef NDEBUG -/* - * We assert that intack.vector is the highest priority vector for - * only an interrupt from vlapic can reach this point and the - * highest vector is chosen in hvm_vcpu_has_pending_irq(). - * But, in fact, the assertion failed sometimes. It is suspected - * that PIR is not synced to vIRR which makes pt_vector is left in - * PIR. In order to verify this suspicion, dump some information - * when the assertion fails. - */ -if ( unlikely(intack.vector < pt_vector) ) -{ -const struct vlapic *vlapic; -
[PATCH v4 11/12] x86/vpt: remove vPT timers per-vCPU lists
No longer add vPT timers to per-vCPU lists, since there's no longer a need to check whether timer interrupts have been injected on return to the HVM guest. This change allows getting rid of virtual timer vCPU migration, and also cleans up some of the virtual timer fields that are no longer required. The model is also slightly different now in that timers are not stopped when a vCPU is de-scheduled. Such timers will continue running, and when triggered the timer function will try to inject the corresponding interrupt into the guest (which might be different from the currently running one). Note that a timer triggering while the guest is not running can only happen once, as the timer callback will not reset the interrupt to fire again. Such resetting, if required, will be done by the EOI callback. Signed-off-by: Roger Pau Monné Reviewed-by: Jan Beulich --- Changes since v3: - Remove stale commit log paragraph. Changes since v2: - Remove pt_{save/restore}_timer and instead use pt_{freeze/thaw}_time. - Remove the introduction of the 'masked' field, it's not needed. - Rework pt_active to use timer_is_active. Changes since v1: - New in this version. 
--- xen/arch/x86/domain.c | 4 +- xen/arch/x86/hvm/hvm.c | 4 +- xen/arch/x86/hvm/vlapic.c | 1 - xen/arch/x86/hvm/vpt.c | 192 - xen/include/asm-x86/hvm/vcpu.h | 3 +- xen/include/asm-x86/hvm/vpt.h | 12 +-- 6 files changed, 25 insertions(+), 191 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 4dc27f798e7..b6cd715dce9 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -2039,8 +2039,8 @@ void context_switch(struct vcpu *prev, struct vcpu *next) vpmu_switch_from(prev); np2m_schedule(NP2M_SCHEDLE_OUT); -if ( is_hvm_domain(prevd) && !list_empty(>arch.hvm.tm_list) ) -pt_save_timer(prev); +if ( is_hvm_domain(prevd) ) +pt_freeze_time(prev); local_irq_disable(); diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 2c4dd1b86f2..ec4ab1f5199 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -489,7 +489,6 @@ void hvm_set_info_guest(struct vcpu *v) void hvm_migrate_timers(struct vcpu *v) { rtc_migrate_timers(v); -pt_migrate(v); } void hvm_migrate_pirq(struct hvm_pirq_dpci *pirq_dpci, const struct vcpu *v) @@ -544,7 +543,7 @@ void hvm_do_resume(struct vcpu *v) { check_wakeup_from_wait(); -pt_restore_timer(v); +pt_thaw_time(v); if ( !vcpu_ioreq_handle_completion(v) ) return; @@ -1559,7 +1558,6 @@ int hvm_vcpu_initialise(struct vcpu *v) hvm_asid_flush_vcpu(v); spin_lock_init(>arch.hvm.tm_lock); -INIT_LIST_HEAD(>arch.hvm.tm_list); rc = hvm_vcpu_cacheattr_init(v); /* teardown: vcpu_cacheattr_destroy */ if ( rc != 0 ) diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c index 8f3a0a2e8f7..2af24989dd5 100644 --- a/xen/arch/x86/hvm/vlapic.c +++ b/xen/arch/x86/hvm/vlapic.c @@ -1342,7 +1342,6 @@ void vlapic_adjust_i8259_target(struct domain *d) if ( d->arch.hvm.i8259_target == v ) return; d->arch.hvm.i8259_target = v; -pt_adjust_global_vcpu_target(v); } int vlapic_has_pending_irq(struct vcpu *v) diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c index 639e45c520e..6a8d216c7b5 100644 --- 
a/xen/arch/x86/hvm/vpt.c +++ b/xen/arch/x86/hvm/vpt.c @@ -125,24 +125,6 @@ static int pt_irq_masked(struct periodic_time *pt) return 1; } -/* - * Functions which read (maybe write) all periodic_time instances - * attached to a particular vCPU use pt_vcpu_{un}lock locking helpers. - * - * Such users are explicitly forbidden from changing the value of the - * pt->vcpu field, because another thread holding the pt_migrate lock - * may already be spinning waiting for your vcpu lock. - */ -static void pt_vcpu_lock(struct vcpu *v) -{ -spin_lock(>arch.hvm.tm_lock); -} - -static void pt_vcpu_unlock(struct vcpu *v) -{ -spin_unlock(>arch.hvm.tm_lock); -} - /* * Functions which want to modify a particular periodic_time object * use pt_{un}lock locking helpers. @@ -176,14 +158,12 @@ static void pt_process_missed_ticks(struct periodic_time *pt) return; missed_ticks = missed_ticks / (s_time_t) pt->period + 1; -if ( mode_is(pt->vcpu->domain, no_missed_ticks_pending) ) -pt->do_not_freeze = !pt->pending_intr_nr; -else +if ( !mode_is(pt->vcpu->domain, no_missed_ticks_pending) ) pt->pending_intr_nr += missed_ticks; pt->scheduled += missed_ticks * pt->period; } -static void pt_freeze_time(struct vcpu *v) +void pt_freeze_time(struct vcpu *v) { if ( !mode_is(v->domain, delay_for_missed_ticks) ) return; @@ -191,7 +171,7 @@ static void pt_freeze_time(struct vcpu *v) v->arch.hvm.guest_time = hvm_get_guest_time(v); } -static void pt_thaw_time(struct vcpu *v) +void pt_thaw_time(struct vcpu *v) { if (
[PATCH v4 07/12] x86/dpci: switch to use a GSI EOI callback
Switch the dpci GSI EOI callback hooks to use the newly introduced generic callback functionality, and remove the custom dpci calls found on the vPIC and vIO-APIC implementations. Signed-off-by: Roger Pau Monné --- Changes since v3: - Print a warning message if the EOI callback cannot be unregistered. Changes since v2: - Avoid leaking the allocated callback on error paths of pt_irq_create_bind. Changes since v1: - New in this version. --- xen/arch/x86/hvm/vioapic.c| 8 - xen/arch/x86/hvm/vpic.c | 4 --- xen/drivers/passthrough/x86/hvm.c | 57 --- xen/include/asm-x86/hvm/io.h | 1 - xen/include/asm-x86/hvm/irq.h | 1 + 5 files changed, 54 insertions(+), 17 deletions(-) diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c index 0824ede91ab..4f844965423 100644 --- a/xen/arch/x86/hvm/vioapic.c +++ b/xen/arch/x86/hvm/vioapic.c @@ -284,7 +284,6 @@ static void vioapic_write_redirent( */ ASSERT(prev_level); ASSERT(!top_word); -hvm_dpci_eoi(d, gsi); hvm_gsi_execute_callbacks(d, gsi); } @@ -418,13 +417,6 @@ static void eoi_callback(struct vcpu *v, unsigned int vector, void *data) ent->fields.remote_irr = 0; -if ( is_iommu_enabled(d) ) -{ -spin_unlock(>arch.hvm.irq_lock); -hvm_dpci_eoi(d, gsi); -spin_lock(>arch.hvm.irq_lock); -} - /* * Callbacks don't expect to be executed with any lock held, so * drop the lock that protects the vIO-APIC fields from changing. diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c index ef668f3668a..60d6740f69a 100644 --- a/xen/arch/x86/hvm/vpic.c +++ b/xen/arch/x86/hvm/vpic.c @@ -237,8 +237,6 @@ static void vpic_ioport_write( ASSERT(pin < 8); hvm_gsi_execute_callbacks(current->domain, hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin)); -hvm_dpci_eoi(current->domain, - hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin)); __clear_bit(pin, ); } return; @@ -289,8 +287,6 @@ static void vpic_ioport_write( vpic_unlock(vpic); hvm_gsi_execute_callbacks(current->domain, hvm_isa_irq_to_gsi((addr >> 7) ? 
(pin | 8) : pin)); -hvm_dpci_eoi(current->domain, - hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin)); return; /* bail immediately */ case 6: /* Set Priority*/ vpic->priority_add = (val + 1) & 7; diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c index 0db26e5dbb2..02e027cff8c 100644 --- a/xen/drivers/passthrough/x86/hvm.c +++ b/xen/drivers/passthrough/x86/hvm.c @@ -252,7 +252,7 @@ static void hvm_gsi_eoi(struct domain *d, unsigned int gsi) hvm_pirq_eoi(pirq); } -void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi) +static void dpci_eoi(struct domain *d, unsigned int guest_gsi, void *data) { const struct hvm_irq_dpci *hvm_irq_dpci; const struct hvm_girq_dpci_mapping *girq; @@ -476,6 +476,7 @@ int pt_irq_create_bind( { struct dev_intx_gsi_link *digl = NULL; struct hvm_girq_dpci_mapping *girq = NULL; +struct hvm_gsi_eoi_callback *cb = NULL; unsigned int guest_gsi; /* @@ -489,7 +490,7 @@ int pt_irq_create_bind( unsigned int link; digl = xmalloc(struct dev_intx_gsi_link); -girq = xmalloc(struct hvm_girq_dpci_mapping); +girq = xzalloc(struct hvm_girq_dpci_mapping); if ( !digl || !girq ) { @@ -502,11 +503,22 @@ int pt_irq_create_bind( girq->bus = digl->bus = pt_irq_bind->u.pci.bus; girq->device = digl->device = pt_irq_bind->u.pci.device; girq->intx = digl->intx = pt_irq_bind->u.pci.intx; -list_add_tail(>list, _dpci->digl_list); +girq->cb.callback = dpci_eoi; guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx); link = hvm_pci_intx_link(digl->device, digl->intx); +rc = hvm_gsi_register_callback(d, guest_gsi, >cb); +if ( rc ) +{ +spin_unlock(>event_lock); +xfree(girq); +xfree(digl); +return rc; +} + +list_add_tail(>list, _dpci->digl_list); + hvm_irq_dpci->link_cnt[link]++; girq->machine_gsi = pirq; @@ -514,17 +526,43 @@ int pt_irq_create_bind( } else { +/* + * NB: the callback structure allocated below will never be freed + * once setup because it's used by the hardware domain and will +
[PATCH v4 06/12] x86/dpci: move code
This is code movement in order to simplify further changes. No functional change intended. Signed-off-by: Roger Pau Monné Acked-by: Jan Beulich --- Changes since v2: - Drop one of the leading underscores from __hvm_dpci_eoi. Changes since v1: - New in this version. --- xen/drivers/passthrough/x86/hvm.c | 162 +++--- 1 file changed, 81 insertions(+), 81 deletions(-) diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c index 8f78c0935b9..0db26e5dbb2 100644 --- a/xen/drivers/passthrough/x86/hvm.c +++ b/xen/drivers/passthrough/x86/hvm.c @@ -205,6 +205,87 @@ static struct vcpu *vector_hashing_dest(const struct domain *d, return dest; } +static void hvm_pirq_eoi(struct pirq *pirq) +{ +struct hvm_pirq_dpci *pirq_dpci; + +if ( !pirq ) +{ +ASSERT_UNREACHABLE(); +return; +} + +pirq_dpci = pirq_dpci(pirq); + +/* + * No need to get vector lock for timer + * since interrupt is still not EOIed + */ +if ( --pirq_dpci->pending || + /* When the interrupt source is MSI no Ack should be performed. */ + (pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE) ) +return; + +pirq_guest_eoi(pirq); +} + +static void __hvm_dpci_eoi(struct domain *d, + const struct hvm_girq_dpci_mapping *girq) +{ +struct pirq *pirq = pirq_info(d, girq->machine_gsi); + +if ( !hvm_domain_use_pirq(d, pirq) ) +hvm_pci_intx_deassert(d, girq->device, girq->intx); + +hvm_pirq_eoi(pirq); +} + +static void hvm_gsi_eoi(struct domain *d, unsigned int gsi) +{ +struct pirq *pirq = pirq_info(d, gsi); + +/* Check if GSI is actually mapped. 
*/ +if ( !pirq_dpci(pirq) ) +return; + +hvm_gsi_deassert(d, gsi); +hvm_pirq_eoi(pirq); +} + +void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi) +{ +const struct hvm_irq_dpci *hvm_irq_dpci; +const struct hvm_girq_dpci_mapping *girq; + +if ( !is_iommu_enabled(d) ) +return; + +if ( is_hardware_domain(d) ) +{ +spin_lock(>event_lock); +hvm_gsi_eoi(d, guest_gsi); +goto unlock; +} + +if ( guest_gsi < NR_ISAIRQS ) +{ +hvm_dpci_isairq_eoi(d, guest_gsi); +return; +} + +spin_lock(>event_lock); +hvm_irq_dpci = domain_get_irq_dpci(d); + +if ( !hvm_irq_dpci ) +goto unlock; + +list_for_each_entry ( girq, _irq_dpci->girq[guest_gsi], list ) +__hvm_dpci_eoi(d, girq); + +unlock: +spin_unlock(>event_lock); +} + int pt_irq_create_bind( struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind) { @@ -860,87 +941,6 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci) spin_unlock(>event_lock); } -static void hvm_pirq_eoi(struct pirq *pirq) -{ -struct hvm_pirq_dpci *pirq_dpci; - -if ( !pirq ) -{ -ASSERT_UNREACHABLE(); -return; -} - -pirq_dpci = pirq_dpci(pirq); - -/* - * No need to get vector lock for timer - * since interrupt is still not EOIed - */ -if ( --pirq_dpci->pending || - /* When the interrupt source is MSI no Ack should be performed. */ - (pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE) ) -return; - -pirq_guest_eoi(pirq); -} - -static void __hvm_dpci_eoi(struct domain *d, - const struct hvm_girq_dpci_mapping *girq) -{ -struct pirq *pirq = pirq_info(d, girq->machine_gsi); - -if ( !hvm_domain_use_pirq(d, pirq) ) -hvm_pci_intx_deassert(d, girq->device, girq->intx); - -hvm_pirq_eoi(pirq); -} - -static void hvm_gsi_eoi(struct domain *d, unsigned int gsi) -{ -struct pirq *pirq = pirq_info(d, gsi); - -/* Check if GSI is actually mapped. 
*/ -if ( !pirq_dpci(pirq) ) -return; - -hvm_gsi_deassert(d, gsi); -hvm_pirq_eoi(pirq); -} - -void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi) -{ -const struct hvm_irq_dpci *hvm_irq_dpci; -const struct hvm_girq_dpci_mapping *girq; - -if ( !is_iommu_enabled(d) ) -return; - -if ( is_hardware_domain(d) ) -{ -spin_lock(>event_lock); -hvm_gsi_eoi(d, guest_gsi); -goto unlock; -} - -if ( guest_gsi < NR_ISAIRQS ) -{ -hvm_dpci_isairq_eoi(d, guest_gsi); -return; -} - -spin_lock(>event_lock); -hvm_irq_dpci = domain_get_irq_dpci(d); - -if ( !hvm_irq_dpci ) -goto unlock; - -list_for_each_entry ( girq, _irq_dpci->girq[guest_gsi], list ) -__hvm_dpci_eoi(d, girq); - -unlock: -spin_unlock(>event_lock); -} - static int pci_clean_dpci_irq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci, void *arg) { -- 2.30.1
[PATCH v4 05/12] x86/hvm: allow registering EOI callbacks for GSIs
Such callbacks will be executed once an EOI is performed by the guest, regardless of whether the interrupts are injected from the vIO-APIC or the vPIC, as ISA IRQs are translated to GSIs and then the corresponding callback is executed at EOI. The vIO-APIC infrastructure for handling EOIs is built on top of the existing vlapic EOI callback functionality, while the vPIC one is handled when writing to the vPIC EOI register. Note that such callbacks need to be registered and de-registered, and that a single GSI can have multiple associated callbacks. That's because GSIs can be level triggered and shared, as is the case with legacy PCI interrupts shared between several devices. Strictly speaking this is a non-functional change, since there are no users yet of the new interface introduced by this change. Signed-off-by: Roger Pau Monné --- Changes since v3: - Make callback take a domain parameter. - Return whether the unregistered callback was found. - Add a comment regarding the result of hvm_gsi_has_callbacks being stable. Changes since v2: - Latch hvm_domain_irq in some functions. - Make domain parameter of hvm_gsi_has_callbacks const. - Add comment about dropping the lock around the hvm_gsi_execute_callbacks call. - Drop change to ioapic_load. Changes since v1: - New in this version. 
--- xen/arch/x86/hvm/hvm.c| 15 ++- xen/arch/x86/hvm/irq.c| 75 +++ xen/arch/x86/hvm/vioapic.c| 29 +++--- xen/arch/x86/hvm/vpic.c | 4 ++ xen/include/asm-x86/hvm/irq.h | 21 ++ 5 files changed, 137 insertions(+), 7 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index ae37bc434ae..2c4dd1b86f2 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -608,7 +608,7 @@ static int hvm_print_line( int hvm_domain_initialise(struct domain *d) { -unsigned int nr_gsis; +unsigned int nr_gsis, i; int rc; if ( !hvm_enabled ) @@ -656,6 +656,14 @@ int hvm_domain_initialise(struct domain *d) BUILD_BUG_ON(NR_HVM_DOMU_IRQS < NR_ISAIRQS); ASSERT(hvm_domain_irq(d)->nr_gsis >= NR_ISAIRQS); +/* Initialize the EOI callback list. */ +hvm_domain_irq(d)->gsi_callbacks = xmalloc_array(struct list_head, nr_gsis); +if ( !hvm_domain_irq(d)->gsi_callbacks ) +goto fail1; +rwlock_init(_domain_irq(d)->gsi_callbacks_lock); +for ( i = 0; i < nr_gsis; i++ ) +INIT_LIST_HEAD(_domain_irq(d)->gsi_callbacks[i]); + /* need link to containing domain */ d->arch.hvm.pl_time->domain = d; @@ -715,6 +723,8 @@ int hvm_domain_initialise(struct domain *d) fail1: if ( is_hardware_domain(d) ) xfree(d->arch.hvm.io_bitmap); +if ( hvm_domain_irq(d) ) +XFREE(hvm_domain_irq(d)->gsi_callbacks); XFREE(d->arch.hvm.params); XFREE(d->arch.hvm.irq); fail0: @@ -777,6 +787,9 @@ void hvm_domain_destroy(struct domain *d) vioapic_deinit(d); XFREE(d->arch.hvm.pl_time); + +if ( hvm_domain_irq(d) ) +XFREE(hvm_domain_irq(d)->gsi_callbacks); XFREE(d->arch.hvm.irq); list_for_each_safe ( ioport_list, tmp, >arch.hvm.g2m_ioport_list ) diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c index 38ac5fb6c7c..4825a387bdc 100644 --- a/xen/arch/x86/hvm/irq.c +++ b/xen/arch/x86/hvm/irq.c @@ -595,6 +595,81 @@ int hvm_local_events_need_delivery(struct vcpu *v) return !hvm_interrupt_blocked(v, intack); } +int hvm_gsi_register_callback(struct domain *d, unsigned int gsi, + struct hvm_gsi_eoi_callback *cb) +{ 
+struct hvm_irq *hvm_irq = hvm_domain_irq(d); + +if ( gsi >= hvm_irq->nr_gsis ) +{ +ASSERT_UNREACHABLE(); +return -EINVAL; +} + +write_lock(_irq->gsi_callbacks_lock); +list_add(>list, _irq->gsi_callbacks[gsi]); +write_unlock(_irq->gsi_callbacks_lock); + +return 0; +} + +int hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi, +struct hvm_gsi_eoi_callback *cb) +{ +struct hvm_irq *hvm_irq = hvm_domain_irq(d); +const struct list_head *tmp; +bool found = false; + +if ( gsi >= hvm_irq->nr_gsis ) +{ +ASSERT_UNREACHABLE(); +return -EINVAL; +} + +write_lock(_irq->gsi_callbacks_lock); +list_for_each ( tmp, _irq->gsi_callbacks[gsi] ) +if ( tmp == >list ) +{ +list_del(>list); +found = true; +break; +} +write_unlock(_irq->gsi_callbacks_lock); + +return found ? 0 : -ENOENT; +} + +void hvm_gsi_execute_callbacks(struct domain *d, unsigned int gsi) +{ +struct hvm_irq *hvm_irq = hvm_domain_irq(d); +struct hvm_gsi_eoi_callback *cb; + +read_lock(_irq->gsi_callbacks_lock); +list_for_each_entry ( cb, _irq->gsi_callbacks[gsi], list ) +cb->callback(d, gsi, cb->data); +read_unlock(_irq->gsi_callbacks_lock); +} + +bool hvm_gsi_has_callbacks(const struct domain *d,
[PATCH v4 04/12] x86/vioapic: switch to use the EOI callback mechanism
Switch the emulated IO-APIC code to use the local APIC EOI callback mechanism. This allows removing the last hardcoded callback from vlapic_handle_EOI. Removing the hardcoded vIO-APIC callback also allows getting rid of setting the EOI exit bitmap based on the triggering mode, as now all users that require an EOI action use the newly introduced callback mechanism. Move and rename vioapic_update_EOI now that it can be made static. Signed-off-by: Roger Pau Monné --- Changes since v3: - Remove assert in eoi_callback. - Cast callback to bool. - Simplify check in ioapic_load: GSIs < 16 and edge interrupts can also have callbacks. - Reword comment about casting to boolean. Changes since v2: - Explicitly convert the last alternative_vcall parameter to a boolean in vlapic_set_irq_callback. Changes since v1: - Remove the triggering check in the update_eoi_exit_bitmap call. - Register the vlapic callbacks when loading the vIO-APIC state. - Reduce scope of ent. --- xen/arch/x86/hvm/vioapic.c | 127 - xen/arch/x86/hvm/vlapic.c | 12 ++-- 2 files changed, 89 insertions(+), 50 deletions(-) diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c index 87370dd4172..1c2763fc2bf 100644 --- a/xen/arch/x86/hvm/vioapic.c +++ b/xen/arch/x86/hvm/vioapic.c @@ -394,6 +394,48 @@ static const struct hvm_mmio_ops vioapic_mmio_ops = { .write = vioapic_write }; +static void eoi_callback(struct vcpu *v, unsigned int vector, void *data) +{ +struct domain *d = v->domain; +struct hvm_irq *hvm_irq = hvm_domain_irq(d); +unsigned int i; + +spin_lock(&d->arch.hvm.irq_lock); + +for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ ) +{ +struct hvm_vioapic *vioapic = domain_vioapic(d, i); +unsigned int pin; + +for ( pin = 0; pin < vioapic->nr_pins; pin++ ) +{ +union vioapic_redir_entry *ent = &vioapic->redirtbl[pin]; + +if ( ent->fields.vector != vector ) +continue; + +ent->fields.remote_irr = 0; + +if ( is_iommu_enabled(d) ) +{ +spin_unlock(&d->arch.hvm.irq_lock); +hvm_dpci_eoi(d, vioapic->base_gsi + pin); 
+spin_lock(>arch.hvm.irq_lock); +} + +if ( (ent->fields.trig_mode == VIOAPIC_LEVEL_TRIG) && + !ent->fields.mask && !ent->fields.remote_irr && + hvm_irq->gsi_assert_count[vioapic->base_gsi + pin] ) +{ +ent->fields.remote_irr = 1; +vioapic_deliver(vioapic, pin); +} +} +} + +spin_unlock(>arch.hvm.irq_lock); +} + static void ioapic_inj_irq( struct hvm_vioapic *vioapic, struct vlapic *target, @@ -407,7 +449,8 @@ static void ioapic_inj_irq( ASSERT((delivery_mode == dest_Fixed) || (delivery_mode == dest_LowestPrio)); -vlapic_set_irq(target, vector, trig_mode); +vlapic_set_irq_callback(target, vector, trig_mode, +trig_mode ? eoi_callback : NULL, NULL); } static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin) @@ -514,49 +557,6 @@ void vioapic_irq_positive_edge(struct domain *d, unsigned int irq) } } -void vioapic_update_EOI(struct domain *d, u8 vector) -{ -struct hvm_irq *hvm_irq = hvm_domain_irq(d); -union vioapic_redir_entry *ent; -unsigned int i; - -ASSERT(has_vioapic(d)); - -spin_lock(>arch.hvm.irq_lock); - -for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ ) -{ -struct hvm_vioapic *vioapic = domain_vioapic(d, i); -unsigned int pin; - -for ( pin = 0; pin < vioapic->nr_pins; pin++ ) -{ -ent = >redirtbl[pin]; -if ( ent->fields.vector != vector ) -continue; - -ent->fields.remote_irr = 0; - -if ( is_iommu_enabled(d) ) -{ -spin_unlock(>arch.hvm.irq_lock); -hvm_dpci_eoi(d, vioapic->base_gsi + pin); -spin_lock(>arch.hvm.irq_lock); -} - -if ( (ent->fields.trig_mode == VIOAPIC_LEVEL_TRIG) && - !ent->fields.mask && !ent->fields.remote_irr && - hvm_irq->gsi_assert_count[vioapic->base_gsi + pin] ) -{ -ent->fields.remote_irr = 1; -vioapic_deliver(vioapic, pin); -} -} -} - -spin_unlock(>arch.hvm.irq_lock); -} - int vioapic_get_mask(const struct domain *d, unsigned int gsi) { unsigned int pin = 0; /* See gsi_vioapic */ @@ -610,6 +610,8 @@ static int ioapic_save(struct vcpu *v, hvm_domain_context_t *h) static int ioapic_load(struct domain *d, hvm_domain_context_t *h) { 
struct hvm_vioapic *s; +unsigned int i; +int rc; if ( !has_vioapic(d) ) return -ENODEV; @@ -620,7 +622,42 @@ static int ioapic_load(struct domain *d, hvm_domain_context_t *h)
[PATCH v4 03/12] x86/vmsi: use the newly introduced EOI callbacks
Remove the unconditional call to hvm_dpci_msi_eoi in vlapic_handle_EOI and
instead use the newly introduced EOI callback mechanism in order to register
a callback for MSI vectors injected from passed through devices. This avoids
having multiple callback functions open-coded in vlapic_handle_EOI, as there
is now a generic framework for registering such callbacks. It also avoids
doing an unconditional call to hvm_dpci_msi_eoi for each EOI processed by
the local APIC.

Note that now the callback is only registered (and thus executed) when
there's an MSI interrupt originating from a PCI passthrough device being
injected into the guest, so the check in hvm_dpci_msi_eoi can be removed as
it's already done by hvm_dirq_assist, which is the only caller of
vmsi_deliver_pirq.

Signed-off-by: Roger Pau Monné
Reviewed-by: Jan Beulich
---
Changes since v3:
 - Fix the callback to take a vcpu parameter.

Changes since v2:
 - Expand commit message.
 - Pass the domain as the callback data.
 - Remove the check in hvm_dpci_msi_eoi.
---
 xen/arch/x86/hvm/vlapic.c         |  2 --
 xen/arch/x86/hvm/vmsi.c           | 35 ++-
 xen/drivers/passthrough/x86/hvm.c |  6 ++
 xen/include/asm-x86/hvm/io.h      |  2 +-
 4 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 4465beaeec1..cfcbd732b16 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -503,8 +503,6 @@ void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
     if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
         vioapic_update_EOI(d, vector);
 
-    hvm_dpci_msi_eoi(d, vector);
-
     spin_lock_irqsave(&vlapic->callback_lock, flags);
     callback = vlapic->callbacks[index].callback;
     vlapic->callbacks[index].callback = NULL;
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 13e2a190b43..03ae0dfb3c5 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -44,11 +44,9 @@
 #include
 #include
 
-static void vmsi_inj_irq(
-    struct vlapic *target,
-    uint8_t vector,
-    uint8_t trig_mode,
-    uint8_t delivery_mode)
+static void vmsi_inj_irq(struct vlapic *target, uint8_t vector,
+                         uint8_t trig_mode, uint8_t delivery_mode,
+                         vlapic_eoi_callback_t *callback, void *data)
 {
     HVM_DBG_LOG(DBG_LEVEL_VLAPIC, "vmsi_inj_irq: vec %02x trig %d dm %d\n",
                 vector, trig_mode, delivery_mode);
@@ -57,17 +55,17 @@ static void vmsi_inj_irq(
     {
     case dest_Fixed:
     case dest_LowestPrio:
-        vlapic_set_irq(target, vector, trig_mode);
+        vlapic_set_irq_callback(target, vector, trig_mode, callback, data);
         break;
     default:
         BUG();
     }
 }
 
-int vmsi_deliver(
-    struct domain *d, int vector,
-    uint8_t dest, uint8_t dest_mode,
-    uint8_t delivery_mode, uint8_t trig_mode)
+static int vmsi_deliver_callback(struct domain *d, int vector, uint8_t dest,
+                                 uint8_t dest_mode, uint8_t delivery_mode,
+                                 uint8_t trig_mode,
+                                 vlapic_eoi_callback_t *callback, void *data)
 {
     struct vlapic *target;
     struct vcpu *v;
@@ -78,7 +76,8 @@ int vmsi_deliver(
         target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
         if ( target != NULL )
         {
-            vmsi_inj_irq(target, vector, trig_mode, delivery_mode);
+            vmsi_inj_irq(target, vector, trig_mode, delivery_mode, callback,
+                         data);
             break;
         }
         HVM_DBG_LOG(DBG_LEVEL_VLAPIC, "null MSI round robin: vector=%02x\n",
@@ -89,8 +88,8 @@ int vmsi_deliver(
         for_each_vcpu ( d, v )
             if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
-                vmsi_inj_irq(vcpu_vlapic(v), vector,
-                             trig_mode, delivery_mode);
+                vmsi_inj_irq(vcpu_vlapic(v), vector, trig_mode, delivery_mode,
+                             callback, data);
         break;
 
     default:
@@ -103,6 +102,13 @@ int vmsi_deliver(
     return 0;
 }
 
+int vmsi_deliver(struct domain *d, int vector, uint8_t dest, uint8_t dest_mode,
+                 uint8_t delivery_mode, uint8_t trig_mode)
+{
+    return vmsi_deliver_callback(d, vector, dest, dest_mode, delivery_mode,
+                                 trig_mode, NULL, NULL);
+}
+
 void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 {
     uint32_t flags = pirq_dpci->gmsi.gflags;
@@ -119,7 +125,8 @@ void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 
     ASSERT(pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI);
 
-    vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
+    vmsi_deliver_callback(d, vector, dest, dest_mode, delivery_mode, trig_mode,
+                          hvm_dpci_msi_eoi, NULL);
 }
 
 /*
[PATCH v4 02/12] x86/vlapic: introduce an EOI callback mechanism
Add a new vlapic_set_irq_callback helper in order to inject a vector and set
a callback to be executed when the guest performs the end of interrupt
acknowledgment.

Such functionality will be used to migrate the current ad hoc handling done
in vlapic_handle_EOI for the vectors that require some logic to be executed
when the end of interrupt is performed.

The setter of the callback will be in charge of setting the callback again
on guest restore, as callbacks are not saved as part of the vlapic state.
That is the reason why vlapic_set_callback is not a static function.

No current users are migrated to use this new functionality yet, so no
functional change is expected as a result.

Signed-off-by: Roger Pau Monné
---
Changes since v3:
 - Use xzalloc.
 - Drop printk on ENOMEM.
 - Add vcpu parameter to vlapic EOI callback.
 - Check that the vector is pending in ISR or IRR when printing a warning
   message because of an overriding callback.
 - Fix commit message regarding resume mention.

Changes since v2:
 - Fix commit message typo.
 - Expand commit message.
 - Also print a warning if the callback data is overridden.
 - Properly free memory in case of error in vlapic_init.

Changes since v1:
 - Make vlapic_set_irq an inline function on the header.
 - Clear the callback hook in vlapic_handle_EOI.
 - Introduce a helper to set the callback without injecting a vector.
 - Remove unneeded parentheses.
 - Reduce callback table by 16.
 - Use %pv to print domain/vcpu ID.
---
 xen/arch/x86/hvm/vlapic.c        | 62 ++--
 xen/include/asm-x86/hvm/vlapic.h | 19 +-
 2 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 5e21fb4937d..4465beaeec1 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -144,7 +144,37 @@ bool vlapic_test_irq(const struct vlapic *vlapic, uint8_t vec)
     return vlapic_test_vector(vec, &vlapic->regs->data[APIC_IRR]);
 }
 
-void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec, uint8_t trig)
+void vlapic_set_callback(struct vlapic *vlapic, unsigned int vec,
+                         vlapic_eoi_callback_t *callback, void *data)
+{
+    unsigned long flags;
+    unsigned int index = vec - 16;
+
+    if ( !callback || vec < 16 || vec >= X86_NR_VECTORS )
+    {
+        ASSERT_UNREACHABLE();
+        return;
+    }
+
+    spin_lock_irqsave(&vlapic->callback_lock, flags);
+    if ( vlapic->callbacks[index].callback &&
+         (vlapic->callbacks[index].callback != callback ||
+          vlapic->callbacks[index].data != data) &&
+         (vlapic_test_vector(vec, &vlapic->regs->data[APIC_IRR]) ||
+          vlapic_test_vector(vec, &vlapic->regs->data[APIC_ISR])) )
+        printk(XENLOG_G_WARNING
+               "%pv overriding vector %#x callback %ps (%p) data %p "
+               "with %ps (%p) data %p\n",
+               vlapic_vcpu(vlapic), vec, vlapic->callbacks[index].callback,
+               vlapic->callbacks[index].callback,
+               vlapic->callbacks[index].data,
+               callback, callback, data);
+    vlapic->callbacks[index].callback = callback;
+    vlapic->callbacks[index].data = data;
+    spin_unlock_irqrestore(&vlapic->callback_lock, flags);
+}
+
+void vlapic_set_irq_callback(struct vlapic *vlapic, uint8_t vec, uint8_t trig,
+                             vlapic_eoi_callback_t *callback, void *data)
 {
     struct vcpu *target = vlapic_vcpu(vlapic);
 
@@ -159,8 +189,12 @@ void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec, uint8_t trig)
     else
         vlapic_clear_vector(vec, &vlapic->regs->data[APIC_TMR]);
 
+    if ( callback )
+        vlapic_set_callback(vlapic, vec, callback, data);
+
     if ( hvm_funcs.update_eoi_exit_bitmap )
-        alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec, trig);
+        alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec,
+                          trig || callback);
 
     if ( hvm_funcs.deliver_posted_intr )
         alternative_vcall(hvm_funcs.deliver_posted_intr, target, vec);
@@ -461,11 +495,24 @@ void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
 {
     struct vcpu *v = vlapic_vcpu(vlapic);
     struct domain *d = v->domain;
+    vlapic_eoi_callback_t *callback;
+    void *data;
+    unsigned long flags;
+    unsigned int index = vector - 16;
 
     if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
         vioapic_update_EOI(d, vector);
 
     hvm_dpci_msi_eoi(d, vector);
+
+    spin_lock_irqsave(&vlapic->callback_lock, flags);
+    callback = vlapic->callbacks[index].callback;
+    vlapic->callbacks[index].callback = NULL;
+    data = vlapic->callbacks[index].data;
+    spin_unlock_irqrestore(&vlapic->callback_lock, flags);
+
+    if ( callback )
+        callback(v, vector, data);
 }
 
 static bool_t is_multicast_dest(struct vlapic *vlapic, unsigned int short_hand,
@@ -1623,9 +1670,19 @@ int vlapic_init(struct vcpu *v)
 
     clear_page(vlapic->regs);
 
+    vlapic->callbacks = xzalloc_array(typeof(*vlapic->callbacks),
+
[PATCH v4 01/12] x86/rtc: drop code related to strict mode
Xen has been for a long time setting the WAET ACPI table "RTC good" flag,
which implies there's no need to perform a read of the RTC REG_C register in
order to get further interrupts after having received one. This is hardcoded
in the static ACPI tables, and in the RTC emulation in Xen.

Drop the support for the alternative (strict) mode; it's been unused for a
long time (since Xen 4.3) without any complaints.

Signed-off-by: Roger Pau Monné
---
Further changes in the series will require that no registering or
unregistering of callbacks is done inside of the handlers themselves, like
it was done in rtc_pf_callback when in strict_mode.
---
Changes since v3:
 - New in this version.
---
 xen/arch/x86/hvm/rtc.c | 27 +--
 xen/arch/x86/hvm/vpt.c |  4 +---
 2 files changed, 2 insertions(+), 29 deletions(-)

diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 3150f5f1479..9992595c45a 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -46,15 +46,6 @@
 #define epoch_year     1900
 #define get_year(x)    (x + epoch_year)
 
-enum rtc_mode {
-    rtc_mode_no_ack,
-    rtc_mode_strict
-};
-
-/* This must be in sync with how hvmloader sets the ACPI WAET flags. */
-#define mode_is(d, m) ((void)(d), rtc_mode_##m == rtc_mode_no_ack)
-#define rtc_mode_is(s, m) mode_is(vrtc_domain(s), m)
-
 static void rtc_copy_date(RTCState *s);
 static void rtc_set_time(RTCState *s);
 static inline int from_bcd(RTCState *s, int a);
@@ -64,9 +55,6 @@ static void rtc_update_irq(RTCState *s)
 {
     ASSERT(spin_is_locked(&s->lock));
 
-    if ( rtc_mode_is(s, strict) && (s->hw.cmos_data[RTC_REG_C] & RTC_IRQF) )
-        return;
-
     /* IRQ is raised if any source is both raised & enabled */
     if ( !(s->hw.cmos_data[RTC_REG_B] &
            s->hw.cmos_data[RTC_REG_C] &
@@ -74,8 +62,7 @@ static void rtc_update_irq(RTCState *s)
         return;
 
     s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF;
-    if ( rtc_mode_is(s, no_ack) )
-        hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ);
+    hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ);
     hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ, NULL);
 }
 
@@ -86,19 +73,7 @@ static void rtc_pf_callback(struct vcpu *v, void *opaque)
     RTCState *s = opaque;
 
     spin_lock(&s->lock);
-
-    if ( !rtc_mode_is(s, no_ack)
-         && (s->hw.cmos_data[RTC_REG_C] & RTC_IRQF)
-         && ++(s->pt_dead_ticks) >= 10 )
-    {
-        /* VM is ignoring its RTC; no point in running the timer */
-        TRACE_0D(TRC_HVM_EMUL_RTC_STOP_TIMER);
-        destroy_periodic_time(&s->pt);
-        s->period = 0;
-    }
-
     s->hw.cmos_data[RTC_REG_C] |= RTC_PF|RTC_IRQF;
-
     spin_unlock(&s->lock);
 }
 
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 4cc0a0848bd..24d90c0a186 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -21,7 +21,6 @@
 #include
 #include
 #include
-#include
 #include
 
 #define mode_is(d, name) \
@@ -337,8 +336,7 @@ int pt_update_irq(struct vcpu *v)
     {
         if ( pt->pending_intr_nr )
         {
-            /* RTC code takes care of disabling the timer itself. */
-            if ( (pt->irq != RTC_IRQ || !pt->priv) && pt_irq_masked(pt) &&
+            if ( pt_irq_masked(pt) &&
                  /* Level interrupts should be asserted even if masked. */
                  !pt->level )
             {
-- 
2.30.1
[PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT
Hello,

The following series introduces EOI callbacks and switches a number of
subsystems to use them. The first ones are vmsi and dpci, which are quite
straightforward to switch since they used to open-code hooks in the EOI
handlers.

Finally, HVM virtual timers are also switched to a different model where
EOI callbacks are used instead of checking at every return to guest whether
a timer interrupt is being injected. Note that this change also fixes a bug
in virtual periodic timers that prevented injecting to a vCPU different
than the one where the timer is assigned (and that would currently match
the i8259 target).

Those changes are paired together so that practical applications of having
EOI callbacks can be seen in action.

Note that further cleanup can be done, but I think the series is already
big enough and provides a clear benefit.

A couple of notes from the testing performed:

 - PVH dom0.
 - Windows guest, limited to 2% capacity to test delay for missed ticks
   mode - time is consistent in the guest.
 - Windows guest migration.
 - PCI passthrough to a guest.

Thanks, Roger.
Roger Pau Monne (12):
  x86/rtc: drop code related to strict mode
  x86/vlapic: introduce an EOI callback mechanism
  x86/vmsi: use the newly introduced EOI callbacks
  x86/vioapic: switch to use the EOI callback mechanism
  x86/hvm: allowing registering EOI callbacks for GSIs
  x86/dpci: move code
  x86/dpci: switch to use a GSI EOI callback
  x86/vpt: switch interrupt injection model
  x86/irq: remove unused parameter from hvm_isa_irq_assert
  x86/irq: drop return value from hvm_ioapic_assert
  x86/vpt: remove vPT timers per-vCPU lists
  x86/vpt: introduce a per-vPT lock

 xen/arch/x86/domain.c             |   4 +-
 xen/arch/x86/emul-i8254.c         |   1 +
 xen/arch/x86/hvm/dm.c             |   2 +-
 xen/arch/x86/hvm/hpet.c           |   8 +-
 xen/arch/x86/hvm/hvm.c            |  23 +-
 xen/arch/x86/hvm/irq.c            |  93 +-
 xen/arch/x86/hvm/pmtimer.c        |   2 +-
 xen/arch/x86/hvm/rtc.c            |  30 +-
 xen/arch/x86/hvm/svm/intr.c       |   3 -
 xen/arch/x86/hvm/vioapic.c        | 144 ++---
 xen/arch/x86/hvm/vlapic.c         |  72 -
 xen/arch/x86/hvm/vmsi.c           |  35 +-
 xen/arch/x86/hvm/vmx/intr.c       |  59 
 xen/arch/x86/hvm/vpic.c           |   8 +-
 xen/arch/x86/hvm/vpt.c            | 514 +-
 xen/drivers/passthrough/x86/hvm.c | 223 -
 xen/include/asm-x86/hvm/io.h      |   3 +-
 xen/include/asm-x86/hvm/irq.h     |  28 +-
 xen/include/asm-x86/hvm/vcpu.h    |   4 -
 xen/include/asm-x86/hvm/vlapic.h  |  19 +-
 xen/include/asm-x86/hvm/vpt.h     |  32 +-
 21 files changed, 630 insertions(+), 677 deletions(-)

-- 
2.30.1
[xen-4.12-testing test] 161304: regressions - FAIL
flight 161304 xen-4.12-testing real [real]
flight 161327 xen-4.12-testing real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/161304/
http://logs.test-lab.xenproject.org/osstest/logs/161327/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could
not be run:
 test-amd64-amd64-xl-qcow2  19 guest-localmigrate/x10  fail in 161261 REGR. vs. 159418

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qcow2  18 guest-saverestore.2     fail pass in 161261

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64   19 guest-stop                 fail like 159418
 test-armhf-armhf-libvirt              16 saverestore-support-check  fail like 159418
 test-amd64-amd64-qemuu-nested-amd     20 debian-hvm-install/l1/l2   fail like 159418
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop                 fail like 159418
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop                 fail like 159418
 test-amd64-i386-xl-qemuu-ws16-amd64   19 guest-stop                 fail like 159418
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop                 fail like 159418
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop                 fail like 159418
 test-amd64-i386-xl-qemut-win7-amd64   19 guest-stop                 fail like 159418
 test-armhf-armhf-libvirt-raw          15 saverestore-support-check  fail like 159418
 test-amd64-i386-xl-qemut-ws16-amd64   19 guest-stop                 fail like 159418
 test-amd64-i386-xl-pvshim             14 guest-start                fail never pass
 test-arm64-arm64-xl-seattle           15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-seattle           16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt              15 migrate-support-check      fail never pass
 test-amd64-i386-libvirt-xsm           15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-xsm          15 migrate-support-check      fail never pass
 test-amd64-i386-libvirt               15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                   15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                   16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm               15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit2           15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-xsm               16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2           16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1           15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit1           16 saverestore-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm          15 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-xsm          16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-thunderx          15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx          16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   13 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd          14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2           16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl                   15 migrate-support-check      fail never pass
 test-armhf-armhf-xl                   16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt              15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu         15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu         16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1           16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck        15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-cubietruck        16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-rtds              15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds              16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-vhd               14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd               15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale           16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-raw          14 migrate-support-check      fail never pass

version targeted for testing:
 xen                  5b280a59c4dd8dad6cc8da28db981b193d10acee
baseline version:
 xen                  4cf5929606adc2fb1ab4e2921c14ba4b8046ecd1

Last test of basis   159418  2021-02-16 15:06:11 Z  62 days
Failing since        160128  2021-03-18 14:36:18 Z  32 days  43
Re: [PATCH] xen/arm: smmuv1: Revert associating the group pointer with the S2CR
Hi Julien,

> On 20 Apr 2021, at 1:39 pm, Julien Grall wrote:
>
> Hi,
>
> On 19/04/2021 17:21, Stefano Stabellini wrote:
>> On Mon, 19 Apr 2021, Rahul Singh wrote:
>>> Hi Julien,
On 18 Apr 2021, at 6:48 pm, Julien Grall wrote:
On 16/04/2021 17:41, Rahul Singh wrote:
> Hi Julien
Hi Rahul,
>> On 16 Apr 2021, at 5:08 pm, Julien Grall wrote:
>>
>> On 16/04/2021 17:05, Rahul Singh wrote:
On 16 Apr 2021, at 4:23 pm, Julien Grall wrote:
On 16/04/2021 16:01, Rahul Singh wrote:
> Hi Julien,
Hi Rahul,
>> On 16 Apr 2021, at 3:35 pm, Julien Grall wrote:
>>
>> Hi,
>>
>> On 16/04/2021 12:25, Rahul Singh wrote:
>>> Revert the code that associates the group pointer with the S2CR as this
>>> code causing an issue when the SMMU device has more than one master
>>> device.
>>
>> It is not clear to me why this change was first added. Are we missing
>> any feature when reverting it?
> This feature was added when we backported the code from Linux to fix the
> stream match conflict issue as part of commit "xen/arm: smmuv1:
> Intelligent SMR allocation".
> This is an extra feature added to allocate IOMMU group based on
> stream-id. If two device has the same stream-id then we assign those
> devices to the same group.
If we revert the patch, then it would not be possible to use the SMMU if
two devices use the same stream-id. Is that correct?
>>> No. If we revert the patch we can use the SMMU if two devices use the
>>> same stream-id without any issue but each device will be in a separate
>>> group. This is same behaviour before the code is merged.
>>
>> Ok. So there is no change in behavior. Good. Can you propose a commit
>> message clarifying that?
> Please have a look if it make sense.
> xen/arm: smmuv1: Revert associating the group pointer with the S2CR
> Revert the code that associates the group pointer with the S2CR as this
> code causing an issue when the SMMU device has more than one master
> device with same stream-id. This issue is introduced by the below commit:
> "0435784cc75dcfef3b5f59c29deb1dbb84265ddb: xen/arm: smmuv1: Intelligent
> SMR allocation"
> Reverting the code will not impact to use of SMMU if two devices use the
> same stream-id but each device will be in a separate group. This is the
> same behaviour before the code is merged.

Look good to me. Is this patch to be applied on top of Stefano's series?
If not, is there going to be more clash?

>>> As per Stefano's mail he already tested his patch series on top of this
>>> patch. I think this patch has to merged before Stefano's patch series.
>>> Let Stefano also confirm that.
>>>
>>> I think there will be no more clashes.
>> Yes, this patch is to be committed *before* my series and I have already
>> tested this patch alone and with my series on top. Both cases work fine.
>
> Cool. Thanks for the confirmation. I have committed the patch with the
> new commit message (although, I tweaked a little bit to use the
> abbreviated version of the commit ID).

Thanks!

Regards,
Rahul

> Cheers,
>
> --
> Julien Grall
[PATCH] tools/xenstored: Remove unused prototype
From: Julien Grall

A prototype for dump_conn() has been present for quite a long time but
there is no implementation. Even, AFAICT, in the patch that introduced it.

So drop it.

Signed-off-by: Julien Grall
---
 tools/xenstore/xenstored_core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index d54a6042a9f7..591b28e4552f 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2085,7 +2085,6 @@ static struct option options[] = {
 #endif
 	{ NULL, 0, NULL, 0 }
 };
 
-extern void dump_conn(struct connection *conn);
 int dom0_domid = 0;
 int dom0_event = 0;
 int priv_domid = 0;
-- 
2.17.1
Re: swiotlb cleanups v3
On 4/20/21 4:23 AM, Christoph Hellwig wrote:
> On Sat, Apr 17, 2021 at 11:39:22AM -0500, Tom Lendacky wrote:
>> Somewhere between the 1st and 2nd patch, specifying a specific swiotlb
>> for an SEV guest is no longer honored. For example, if I start an SEV
>> guest with 16GB of memory and specify swiotlb=131072 I used to get a
>> 256MB SWIOTLB. However, after the 2nd patch, the swiotlb=131072 is no
>> longer honored and I get a 982MB SWIOTLB (as set via sev_setup_arch() in
>> arch/x86/mm/mem_encrypt.c).
>>
>> I can't be sure which patch caused the issue since an SEV guest fails to
>> boot with the 1st patch but can boot with the 2nd patch, at which point
>> the SWIOTLB comes in at 982MB (I haven't had a chance to debug it and so
>> I'm hoping you might be able to quickly spot what's going on).
>
> Can you try this patch?

Thanks, Christoph. This works for honoring the command line value with SEV
guests.

There was still a reference to default_nslabs in setup_io_tlb_npages() that
I'm not sure how you want to handle. I just commented it out for now to let
the code compile to test the intent of the patch.

Thanks,
Tom

>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 0a5b6f7e75bce6..ac81ef97df32f5 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -71,15 +71,17 @@ struct io_tlb_mem *io_tlb_default_mem;
>   */
>  static unsigned int max_segment;
>  
> -static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
> +static unsigned long swiotlb_cmdline_size;
>  
>  static int __init
>  setup_io_tlb_npages(char *str)
>  {
>  	if (isdigit(*str)) {
>  		/* avoid tail segment of size < IO_TLB_SEGSIZE */
> -		default_nslabs =
> -			ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
> +		unsigned long nslabs = simple_strtoul(str, &str, 0);
> +
> +		swiotlb_cmdline_size =
> +			ALIGN(nslabs, IO_TLB_SEGSIZE) << IO_TLB_SHIFT;
>  	}
>  	if (*str == ',')
>  		++str;
> @@ -108,7 +110,9 @@ void swiotlb_set_max_segment(unsigned int val)
>  
>  unsigned long swiotlb_size_or_default(void)
>  {
> -	return default_nslabs << IO_TLB_SHIFT;
> +	if (swiotlb_cmdline_size)
> +		return swiotlb_cmdline_size;
> +	return IO_TLB_DEFAULT_SIZE;
>  }
>  
>  void __init swiotlb_adjust_size(unsigned long size)
> @@ -118,9 +122,10 @@ void __init swiotlb_adjust_size(unsigned long size)
>  	 * architectures such as those supporting memory encryption to
>  	 * adjust/expand SWIOTLB size for their use.
>  	 */
> -	size = ALIGN(size, IO_TLB_SIZE);
> -	default_nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
> -	pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
> +	if (!swiotlb_cmdline_size)
> +		swiotlb_cmdline_size = ALIGN(size, IO_TLB_SIZE);
> +	pr_info("SWIOTLB bounce buffer size adjusted to %luMB",
> +		swiotlb_cmdline_size >> 20);
>  }
>  
>  void swiotlb_print_info(void)
> @@ -209,7 +214,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>  void __init
>  swiotlb_init(int verbose)
>  {
> -	size_t bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
> +	size_t bytes = PAGE_ALIGN(swiotlb_size_or_default());
>  	void *tlb;
>  
>  	if (swiotlb_force == SWIOTLB_NO_FORCE)
> @@ -219,7 +224,7 @@ swiotlb_init(int verbose)
>  	tlb = memblock_alloc_low(bytes, PAGE_SIZE);
>  	if (!tlb)
>  		goto fail;
> -	if (swiotlb_init_with_tbl(tlb, default_nslabs, verbose))
> +	if (swiotlb_init_with_tbl(tlb, bytes >> IO_TLB_SHIFT, verbose))
>  		goto fail_free_mem;
>  	return;
> 
Re: [PATCH v2] xen/pci: Refactor PCI MSI interrupts related code
Hi Jan,

> On 19 Apr 2021, at 1:33 pm, Jan Beulich wrote:
>
> On 19.04.2021 13:54, Julien Grall wrote:
>> For the time being, I think move this code in x86 is a lot better than
>> #ifdef or keep the code in common code.
>
> Well, I would perhaps agree if it ended up being #ifdef CONFIG_X86.
> I would perhaps not agree if there was a new CONFIG_* which other
> (future) arch-es could select if desired.

I agree with Julien on moving the code to the x86 file, as currently it is
referenced only in x86 code and as of now we are not sure how other
architectures will implement interrupt remapping (via IOMMU or any other
means).

Let me know if you are ok with moving the code to x86.

Regards,
Rahul

>
> Jan
Re: [PATCH 5/9] arm/mm: Get rid of READ/WRITE_SYSREG32
Hi Michal,

On 20/04/2021 08:08, Michal Orzel wrote:

AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit.
MSR/MRS are expecting 64bit values thus we should get rid of helpers
READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use
register_t type when reading sysregs which can correspond to uint64_t or
uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does
not mean that they can't be widen in the future.

Modify SCTLR_EL2 accesses to use READ/WRITE_SYSREG.

SCTLR_EL2 already has bits defined in the range [32:63]. So this change is
going to have a side effect as AFAICT head.S will not touch those bits. So
they are now going to be preserved. The Arm Arm defines them as unknown if
implemented. Therefore shouldn't we zero them somewhere else?

In any case, I think the commit message ought to contain an analysis for
system registers that happened to have bits defined in the range [32:63].

Signed-off-by: Michal Orzel
---
 xen/arch/arm/mm.c    | 2 +-
 xen/arch/arm/traps.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 59f8a3f15f..0e07335291 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -613,7 +613,7 @@ void __init remove_early_mappings(void)
  */
 static void xen_pt_enforce_wnx(void)
 {
-    WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
+    WRITE_SYSREG(READ_SYSREG(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
     /*
      * The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
      * before flushing the TLBs.
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index c7acdb2087..e7384381cc 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -915,7 +915,7 @@ static void _show_registers(const struct cpu_user_regs *regs,
     printk(" VTTBR_EL2: %016"PRIx64"\n", ctxt->vttbr_el2);
     printk("\n");
 
-    printk(" SCTLR_EL2: %08"PRIx32"\n", READ_SYSREG32(SCTLR_EL2));
+    printk(" SCTLR_EL2: %"PRIregister"\n", READ_SYSREG(SCTLR_EL2));
     printk(" HCR_EL2: %"PRIregister"\n", READ_SYSREG(HCR_EL2));
     printk(" TTBR0_EL2: %016"PRIx64"\n", READ_SYSREG64(TTBR0_EL2));
     printk("\n");

Cheers,

-- 
Julien Grall
Re: [PATCH 4/9] arm/p2m: Get rid of READ/WRITE_SYSREG32
Hi Michal,

On 20/04/2021 08:08, Michal Orzel wrote:

AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit.
MSR/MRS are expecting 64bit values thus we should get rid of helpers
READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use
register_t type when reading sysregs which can correspond to uint64_t or
uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does
not mean that they can't be widen in the future.

Modify type of vtcr to register_t.

Signed-off-by: Michal Orzel

Reviewed-by: Julien Grall

Cheers,

-- 
Julien Grall
Re: [PATCH 3/9] arm/gic: Get rid of READ/WRITE_SYSREG32
Hi Michal,

On 20/04/2021 08:08, Michal Orzel wrote:

AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit.
MSR/MRS are expecting 64bit values thus we should get rid of helpers
READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use
register_t type when reading sysregs which can correspond to uint64_t or
uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does
not mean that they can't be widen in the future.

Modify types of following members of struct gic_v3 to register_t:
 - hcr (not used at all in Xen)

It looks like we never used it (even in the patch introducing it). So I
would suggest to remove it in a patch before this one.

 - vmcr
 - sre_el1
 - apr0
 - apr1

Signed-off-by: Michal Orzel
---
 xen/arch/arm/gic-v3-lpi.c |  2 +-
 xen/arch/arm/gic-v3.c     | 96 ---
 xen/include/asm-arm/gic.h |  6 +--
 3 files changed, 54 insertions(+), 50 deletions(-)

diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index 869bc97fa1..e1594dd20e 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -178,7 +178,7 @@ void gicv3_do_LPI(unsigned int lpi)
     irq_enter();
 
     /* EOI the LPI already. */
-    WRITE_SYSREG32(lpi, ICC_EOIR1_EL1);
+    WRITE_SYSREG(lpi, ICC_EOIR1_EL1);
 
     /* Find out if a guest mapped something to this physical LPI. */
     hlpip = gic_get_host_lpi(lpi);
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index ac28013c19..0634013a67 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -246,12 +246,12 @@ static void gicv3_ich_write_lr(int lr, uint64_t val)
  */
 static void gicv3_enable_sre(void)
 {
-    uint32_t val;
+    register_t val;
 
-    val = READ_SYSREG32(ICC_SRE_EL2);
+    val = READ_SYSREG(ICC_SRE_EL2);
     val |= GICC_SRE_EL2_SRE;
 
-    WRITE_SYSREG32(val, ICC_SRE_EL2);
+    WRITE_SYSREG(val, ICC_SRE_EL2);
     isb();
 }
 
@@ -315,16 +315,16 @@ static void restore_aprn_regs(const union gic_state_data *d)
     switch ( gicv3.nr_priorities )
     {
     case 7:
-        WRITE_SYSREG32(d->v3.apr0[2], ICH_AP0R2_EL2);
-        WRITE_SYSREG32(d->v3.apr1[2], ICH_AP1R2_EL2);
+        WRITE_SYSREG(d->v3.apr0[2], ICH_AP0R2_EL2);
+        WRITE_SYSREG(d->v3.apr1[2], ICH_AP1R2_EL2);
         /* Fall through */
     case 6:
-        WRITE_SYSREG32(d->v3.apr0[1], ICH_AP0R1_EL2);
-        WRITE_SYSREG32(d->v3.apr1[1], ICH_AP1R1_EL2);
+        WRITE_SYSREG(d->v3.apr0[1], ICH_AP0R1_EL2);
+        WRITE_SYSREG(d->v3.apr1[1], ICH_AP1R1_EL2);
         /* Fall through */
     case 5:
-        WRITE_SYSREG32(d->v3.apr0[0], ICH_AP0R0_EL2);
-        WRITE_SYSREG32(d->v3.apr1[0], ICH_AP1R0_EL2);
+        WRITE_SYSREG(d->v3.apr0[0], ICH_AP0R0_EL2);
+        WRITE_SYSREG(d->v3.apr1[0], ICH_AP1R0_EL2);
         break;
     default:
         BUG();
@@ -338,16 +338,16 @@ static void save_aprn_regs(union gic_state_data *d)
     switch ( gicv3.nr_priorities )
     {
     case 7:
-        d->v3.apr0[2] = READ_SYSREG32(ICH_AP0R2_EL2);
-        d->v3.apr1[2] = READ_SYSREG32(ICH_AP1R2_EL2);
+        d->v3.apr0[2] = READ_SYSREG(ICH_AP0R2_EL2);
+        d->v3.apr1[2] = READ_SYSREG(ICH_AP1R2_EL2);
         /* Fall through */
     case 6:
-        d->v3.apr0[1] = READ_SYSREG32(ICH_AP0R1_EL2);
-        d->v3.apr1[1] = READ_SYSREG32(ICH_AP1R1_EL2);
+        d->v3.apr0[1] = READ_SYSREG(ICH_AP0R1_EL2);
+        d->v3.apr1[1] = READ_SYSREG(ICH_AP1R1_EL2);
         /* Fall through */
     case 5:
-        d->v3.apr0[0] = READ_SYSREG32(ICH_AP0R0_EL2);
-        d->v3.apr1[0] = READ_SYSREG32(ICH_AP1R0_EL2);
+        d->v3.apr0[0] = READ_SYSREG(ICH_AP0R0_EL2);
+        d->v3.apr1[0] = READ_SYSREG(ICH_AP1R0_EL2);
         break;
     default:
         BUG();
@@ -371,15 +371,15 @@ static void gicv3_save_state(struct vcpu *v)
     dsb(sy);
     gicv3_save_lrs(v);
     save_aprn_regs(&v->arch.gic);
-    v->arch.gic.v3.vmcr = READ_SYSREG32(ICH_VMCR_EL2);
-    v->arch.gic.v3.sre_el1 = READ_SYSREG32(ICC_SRE_EL1);
+    v->arch.gic.v3.vmcr = READ_SYSREG(ICH_VMCR_EL2);
+    v->arch.gic.v3.sre_el1 = READ_SYSREG(ICC_SRE_EL1);
 }
 
 static void gicv3_restore_state(const struct vcpu *v)
 {
-    uint32_t val;
+    register_t val;
 
-    val = READ_SYSREG32(ICC_SRE_EL2);
+    val = READ_SYSREG(ICC_SRE_EL2);
     /*
      * Don't give access to system registers when the guest is using
      * GICv2
@@ -388,7 +388,7 @@ static void gicv3_restore_state(const struct vcpu *v)
         val &= ~GICC_SRE_EL2_ENEL1;
     else
         val |= GICC_SRE_EL2_ENEL1;
-    WRITE_SYSREG32(val, ICC_SRE_EL2);
+    WRITE_SYSREG(val, ICC_SRE_EL2);
 
     /*
      * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a Group0
@@ -398,9 +398,9 @@ static void gicv3_restore_state(const struct vcpu *v)
      * want before starting to mess with the rest of the GIC, and
      * VMCR_EL1 in particular.
      */
-    WRITE_SYSREG32(v->arch.gic.v3.sre_el1,
Re: [PATCH 2/9] arm/domain: Get rid of READ/WRITE_SYSREG32
Hi Michal,

On 20/04/2021 08:08, Michal Orzel wrote:
> AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values, thus we should get rid of the helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use the register_t type when reading sysregs, which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have their upper 32 bits reserved, it does not mean that they can't be widened in the future.
>
> Modify type of registers: actlr, cntkctl to register_t.

ACTLR is implementation defined, so in theory there might already be bits defined in the range [32:63]. So I would consider splitting this change out of the patch so we can backport it.

> Signed-off-by: Michal Orzel
> ---
>  xen/arch/arm/domain.c        | 20 ++--
>  xen/include/asm-arm/domain.h |  4 ++--
>  2 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index bdd3d3e5b5..c021a03c61 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -113,13 +113,13 @@ static void ctxt_switch_from(struct vcpu *p)
>      p->arch.tpidr_el1 = READ_SYSREG(TPIDR_EL1);
>
>      /* Arch timer */
> -    p->arch.cntkctl = READ_SYSREG32(CNTKCTL_EL1);
> +    p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
>      virt_timer_save(p);
>
>      if ( is_32bit_domain(p->domain) && cpu_has_thumbee )
>      {
> -        p->arch.teecr = READ_SYSREG32(TEECR32_EL1);
> -        p->arch.teehbr = READ_SYSREG32(TEEHBR32_EL1);
> +        p->arch.teecr = READ_SYSREG(TEECR32_EL1);
> +        p->arch.teehbr = READ_SYSREG(TEEHBR32_EL1);

It feels strange that you converted cntkctl and actlr to use register_t but not teecr and teehbr. Can you explain why?

Cheers,

--
Julien Grall
Re: [PATCH 1/9] arm64/vfp: Get rid of READ/WRITE_SYSREG32
Hi Michal,

On 20/04/2021 08:08, Michal Orzel wrote:
> AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values, thus we should get rid of the helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use the register_t type when reading sysregs, which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have their upper 32 bits reserved, it does not mean that they can't be widened in the future.
>
> Modify type of FPCR, FPSR, FPEXC32_EL2 to register_t.
>
> Signed-off-by: Michal Orzel

Reviewed-by: Julien Grall

Cheers,

--
Julien Grall
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On 20/04/2021 13:50, Jan Beulich wrote:
>> Alternatively, you could be a bit more verbose in your request so the others understand the reasoning behind it.
>
> Well, yes, perhaps. But then there's the desire to not repeat oneself all the time.

Most likely, the time you try to save by not expanding your thought is going to be lost when the contributor comes back asking why you are requesting it. ;)

Cheers,

--
Julien Grall
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On 20.04.2021 14:39, Julien Grall wrote:
> On 20/04/2021 13:25, Jan Beulich wrote:
>> On 20.04.2021 14:14, Julien Grall wrote:
>>> It is not really my area of expertise but I wanted to jump on one comment below...
>>>
>>> On 20/04/2021 12:38, Jan Beulich wrote:
>>>> On 01.04.2020 22:06, Chao Gao wrote:
>>>>> ---
>>>>> Changes in v2:
>>>>>  - verify system suspension and resumption with this patch applied
>>>>>  - don't fall back to register-based interface if enabling qinval failed. see the change in init_vtd_hw().
>>>>>  - remove unnecessary "queued_inval_supported" variable
>>>>>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>>>>>  - coding-style changes
>>>>
>>>> ... while this suggests this is v2 of a recently sent patch, the submission is dated a little over a year ago. This is confusing. It is additionally confusing that there were two copies of it in my inbox, despite mails coming from a list normally getting de-duplicated somewhere at our end (I believe).
>>>>
>>>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>>>> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>>>>>
>>>>>     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>>>>>     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
>>>>> +   iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
>>>>> +
>>>>> +   if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
>>>>> +   {
>>>>> +       printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
>>>>> +              iommu->index);
>>>>
>>>> Here (and at least once more yet further down): We don't normally end log messages with a full stop. Easily addressable while committing, of course.
>>>
>>> I can find a large number of cases where log messages are ended with a full stop... Actually it looks more natural to me than your suggestion.
>>
>> Interesting. From purely a language perspective it indeed looks more natural, I agree. But when considering (serial) console bandwidth, we ought to try to eliminate _any_ output that's there without conveying information or making the conveyed information understandable. In fact I recall a number of cases (without having commit hashes to hand) where we deliberately dropped full stops. (The messages here aren't at risk of causing bandwidth issues, but as with any such generic item I think the goal ought to be consistency, and hence either full stops everywhere, or nowhere. If bandwidth was an issue here, I might also have suggested to shorten "Queued Invalidation" to just "QI".)
>
> I wasn't aware of such a requirement in Xen... Although, I can see how this can be a concern. If you really want to enforce it, then it should be written in CODING_STYLE.

Agreed, but since I've had no success with prior adjustments to that file (not even worth a reply to tell me why a change might be a bad one, in at least some of the cases), I'm afraid I've given up making attempts to get adjustments into there.

> Alternatively, you could be a bit more verbose in your request so the others understand the reasoning behind it.

Well, yes, perhaps. But then there's the desire to not repeat oneself all the time.

Jan
Re: [PATCH v2 2/2] x86/cpuid: support LFENCE always serializing CPUID bit
On 15/04/2021 15:47, Roger Pau Monne wrote:
> AMD Milan (Zen3) CPUs have an LFENCE Always Serializing CPUID bit in leaf 80000021.eax. Previous AMD versions used to have a user settable bit in the DE_CFG MSR to select whether LFENCE was dispatch serializing, which Xen always attempts to set. The forcefully always-on setting is due to the addition of SEV-SNP, so that a VMM cannot break the confidentiality of a guest.
>
> In order to support this new CPUID bit, move the LFENCE_DISPATCH synthetic CPUID bit to map the hardware bit (leaving a hole in the synthetic range) and either rely on the bit already being set by the native CPUID output, or attempt to fake it in Xen by modifying the DE_CFG MSR. This requires adding one more entry to the featureset to support leaf 80000021.eax.
>
> The bit is exposed to guests by default if the underlying hardware supports leaf 80000021, as a way to signal that LFENCE is always serializing. Hardware that doesn't have the leaf might also get the bit set because Xen has performed the necessary arrangements, but that's not yet exposed to guests. Note that Xen doesn't allow guests to change the DE_CFG value, so once set by Xen LFENCE will always be serializing.
>
> Note that the access to DE_CFG by guests is left as-is: reads will unconditionally return the LFENCE_SERIALISE bit set, while writes are silently dropped.
>
> Suggested-by: Andrew Cooper
> Signed-off-by: Roger Pau Monné
> ---
> Changes since v1:
>  - Rename to lfence+.
>  - Add feature to libxl_cpuid.c.
>  - Reword commit message.
> ---
> Note this doesn't yet expose the bit on hardware that doesn't support leaf 80000021. It's still TBD whether we want to hardcode this support manually, or instead rely on a more general approach like the one suggested by the shrink max CPUID leaf patch from Jan.

I'm going to insist on using the manual approach. Upping the max leaf is strictly opposite to the shrinking logic.

It's very rare that we'll want to extend the max leaf beyond what hardware supports, and doing so wants calling out clearly, along with identifying why it is safe in this specific case. It is not safe or sensible to blindly escalate to the compile-time max. The only cases where the two approaches differ are bugs needing fixing - the manual approach has the special case clearly called out, while the blindly-escalate case has the bug hidden in derivation logic somewhere else.

~Andrew
Re: [PATCH] xen/arm: smmuv1: Revert associating the group pointer with the S2CR
Hi,

On 19/04/2021 17:21, Stefano Stabellini wrote:
On Mon, 19 Apr 2021, Rahul Singh wrote:
On 18 Apr 2021, at 6:48 pm, Julien Grall wrote:
On 16/04/2021 17:41, Rahul Singh wrote:
On 16 Apr 2021, at 5:08 pm, Julien Grall wrote:
On 16/04/2021 17:05, Rahul Singh wrote:
On 16 Apr 2021, at 4:23 pm, Julien Grall wrote:
On 16/04/2021 16:01, Rahul Singh wrote:
On 16 Apr 2021, at 3:35 pm, Julien Grall wrote:
On 16/04/2021 12:25, Rahul Singh wrote:

Revert the code that associates the group pointer with the S2CR as this code is causing an issue when the SMMU device has more than one master device.

It is not clear to me why this change was first added. Are we missing any feature when reverting it?

This feature was added when we backported the code from Linux to fix the stream match conflict issue as part of commit "xen/arm: smmuv1: Intelligent SMR allocation". This is an extra feature added to allocate IOMMU groups based on stream-id. If two devices have the same stream-id then we assign those devices to the same group.

If we revert the patch, then it would not be possible to use the SMMU if two devices use the same stream-id. Is that correct?

No. If we revert the patch we can use the SMMU if two devices use the same stream-id without any issue, but each device will be in a separate group. This is the same behaviour as before the code was merged.

Ok. So there is no change in behavior. Good. Can you propose a commit message clarifying that?

Please have a look if it makes sense.

    xen/arm: smmuv1: Revert associating the group pointer with the S2CR

    Revert the code that associates the group pointer with the S2CR as this code is causing an issue when the SMMU device has more than one master device with the same stream-id. This issue was introduced by the below commit:

    "0435784cc75dcfef3b5f59c29deb1dbb84265ddb: xen/arm: smmuv1: Intelligent SMR allocation"

    Reverting the code will not impact the use of the SMMU if two devices use the same stream-id, but each device will be in a separate group. This is the same behaviour as before the code was merged.

Looks good to me. Is this patch to be applied on top of Stefano's series? If not, is there going to be more clash?

As per Stefano's mail he already tested his patch series on top of this patch. I think this patch has to be merged before Stefano's patch series. Let Stefano also confirm that. I think there will be no more clashes.

Yes, this patch is to be committed *before* my series and I have already tested this patch alone and with my series on top. Both cases work fine.

Cool. Thanks for the confirmation. I have committed the patch with the new commit message (although I tweaked it a little bit to use the abbreviated version of the commit ID).

Cheers,

--
Julien Grall
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
Hi,

On 20/04/2021 13:25, Jan Beulich wrote:
> On 20.04.2021 14:14, Julien Grall wrote:
>> Hi,
>>
>> It is not really my area of expertise but I wanted to jump on one comment below...
>>
>> On 20/04/2021 12:38, Jan Beulich wrote:
>>> On 01.04.2020 22:06, Chao Gao wrote:
>>>> ---
>>>> Changes in v2:
>>>>  - verify system suspension and resumption with this patch applied
>>>>  - don't fall back to register-based interface if enabling qinval failed. see the change in init_vtd_hw().
>>>>  - remove unnecessary "queued_inval_supported" variable
>>>>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>>>>  - coding-style changes
>>>
>>> ... while this suggests this is v2 of a recently sent patch, the submission is dated a little over a year ago. This is confusing. It is additionally confusing that there were two copies of it in my inbox, despite mails coming from a list normally getting de-duplicated somewhere at our end (I believe).
>>>
>>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>>> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>>>>
>>>>     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>>>>     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
>>>> +   iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
>>>> +
>>>> +   if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
>>>> +   {
>>>> +       printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
>>>> +              iommu->index);
>>>
>>> Here (and at least once more yet further down): We don't normally end log messages with a full stop. Easily addressable while committing, of course.
>>
>> I can find a large number of cases where log messages are ended with a full stop... Actually it looks more natural to me than your suggestion.
>
> Interesting. From purely a language perspective it indeed looks more natural, I agree. But when considering (serial) console bandwidth, we ought to try to eliminate _any_ output that's there without conveying information or making the conveyed information understandable. In fact I recall a number of cases (without having commit hashes to hand) where we deliberately dropped full stops. (The messages here aren't at risk of causing bandwidth issues, but as with any such generic item I think the goal ought to be consistency, and hence either full stops everywhere, or nowhere. If bandwidth was an issue here, I might also have suggested to shorten "Queued Invalidation" to just "QI".)

I wasn't aware of such a requirement in Xen... Although, I can see how this can be a concern. If you really want to enforce it, then it should be written in CODING_STYLE. Alternatively, you could be a bit more verbose in your request so the others understand the reasoning behind it.

> Also, when you say "large number" - did you compare to the number of cases where there are no full stops? (I sincerely hope we don't have that many full stops left.)

42sh> ack "printk\(\"" | ack "\.\\n\"" | wc -l
130
42sh> ack "printk\(\"" | wc -l
1708

So ~7.5% of the printks.

--
Julien Grall
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On Tue, Apr 20, 2021 at 01:38:26PM +0200, Jan Beulich wrote:
> On 01.04.2020 22:06, Chao Gao wrote:
>> According to Intel VT-d SPEC rev3.3 Section 6.5, Register-based Invalidation isn't supported by Intel VT-d version 6 and beyond.
>>
>> This hardware change impacts the following two scenarios: admin can disable queued invalidation via the 'qinval' cmdline and use the register-based interface; and VT-d switches to register-based invalidation when queued invalidation needs to be disabled, for example, during disabling x2apic, during system suspension, or after enabling queued invalidation fails.
>>
>> To deal with this hardware change, if register-based invalidation isn't supported, queued invalidation cannot be disabled through the Xen cmdline; and if queued invalidation has to be disabled temporarily in some scenarios, VT-d won't switch to the register-based interface but will use some dummy functions to catch errors in case any invalidation request is issued while queued invalidation is disabled.
>>
>> Signed-off-by: Chao Gao
>
> In principle (with a minor nit further down)
> Reviewed-by: Jan Beulich
>
> However, ...
>
>> ---
>> Changes in v2:
>>  - verify system suspension and resumption with this patch applied
>>  - don't fall back to register-based interface if enabling qinval failed. see the change in init_vtd_hw().
>>  - remove unnecessary "queued_inval_supported" variable
>>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>>  - coding-style changes
>
> ... while this suggests this is v2 of a recently sent patch, the submission is dated a little over a year ago. This is confusing. It is additionally confusing that there were two copies of it in my inbox, despite mails coming from a list normally getting de-duplicated somewhere at our end (I believe).

You are right. I messed up the system time of my server somehow. Sorry about that. If possible, please also update the date of this patch.

>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>>
>>     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>>     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
>> +   iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
>> +
>> +   if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
>> +   {
>> +       printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
>> +              iommu->index);
>
> Here (and at least once more yet further down): We don't normally end log messages with a full stop. Easily addressable while committing, of course.

Okay. Please go ahead.

Thanks
Chao
[ovmf test] 161312: all pass - PUSHED
flight 161312 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161312/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 0bbc20727598421c4e47d46b982246217df8c6bc
baseline version:
 ovmf                 64138c95db5a7a3e4768d8a01ba71dc3475e6524

Last test of basis   161301  2021-04-19 21:40:07 Z  0 days
Testing same since   161312  2021-04-20 04:56:06 Z  0 days  1 attempts

People who touched revisions under test:
  Jason
  Jason Lou
  Kun Qin

jobs:
 build-amd64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   64138c95db..0bbc207275  0bbc20727598421c4e47d46b982246217df8c6bc -> xen-tested-master
Re: [PATCH v9 10/13] x86/smpboot: switch clone_mapping() to new APIs
On 06.04.2021 13:05, Hongyan Xia wrote:
> @@ -742,51 +742,58 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
>          }
>      }
>
> +    UNMAP_DOMAIN_PAGE(pl1e);
> +    UNMAP_DOMAIN_PAGE(pl2e);
> +    UNMAP_DOMAIN_PAGE(pl3e);

Just one minor remark: A pedantic(?) compiler might warn about the setting to NULL of pl3e here, when

>      if ( !(root_get_flags(rpt[root_table_offset(linear)]) & _PAGE_PRESENT) )
>      {
> -        pl3e = alloc_xen_pagetable();
> +        mfn_t l3mfn;
> +
> +        pl3e = alloc_map_clear_xen_pt(&l3mfn);
>          rc = -ENOMEM;
>          if ( !pl3e )
>              goto out;
> -        clear_page(pl3e);
>          l4e_write(&rpt[root_table_offset(linear)],
> -                  l4e_from_paddr(__pa(pl3e), __PAGE_HYPERVISOR));
> +                  l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR));
>      }
>      else
> -        pl3e = l4e_to_l3e(rpt[root_table_offset(linear)]);
> +        pl3e = map_l3t_from_l4e(rpt[root_table_offset(linear)]);

... it is guaranteed to get initialized again before any further consumption. IOW strictly speaking the last of those three would want to be unmap_domain_page(), just like you have ...

> @@ -802,6 +809,9 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
>
>      rc = 0;
>  out:
> +    unmap_domain_page(pl1e);
> +    unmap_domain_page(pl2e);
> +    unmap_domain_page(pl3e);
>      return rc;
>  }

... here.

Jan
Re: [PATCH v9 09/13] x86/smpboot: add exit path for clone_mapping()
On 06.04.2021 13:05, Hongyan Xia wrote:
> From: Wei Liu
>
> We will soon need to clean up page table mappings in the exit path.
>
> No functional change.
>
> Signed-off-by: Wei Liu
> Signed-off-by: Hongyan Xia

Acked-by: Jan Beulich
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On 20.04.2021 14:14, Julien Grall wrote:
> Hi,
>
> It is not really my area of expertise but I wanted to jump on one comment below...
>
> On 20/04/2021 12:38, Jan Beulich wrote:
>> On 01.04.2020 22:06, Chao Gao wrote:
>>> ---
>>> Changes in v2:
>>>  - verify system suspension and resumption with this patch applied
>>>  - don't fall back to register-based interface if enabling qinval failed. see the change in init_vtd_hw().
>>>  - remove unnecessary "queued_inval_supported" variable
>>>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>>>  - coding-style changes
>>
>> ... while this suggests this is v2 of a recently sent patch, the submission is dated a little over a year ago. This is confusing. It is additionally confusing that there were two copies of it in my inbox, despite mails coming from a list normally getting de-duplicated somewhere at our end (I believe).
>>
>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>>>
>>>     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>>>     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
>>> +   iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
>>> +
>>> +   if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
>>> +   {
>>> +       printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
>>> +              iommu->index);
>>
>> Here (and at least once more yet further down): We don't normally end log messages with a full stop. Easily addressable while committing, of course.
>
> I can find a large number of cases where log messages are ended with a full stop... Actually it looks more natural to me than your suggestion.

Interesting. From purely a language perspective it indeed looks more natural, I agree. But when considering (serial) console bandwidth, we ought to try to eliminate _any_ output that's there without conveying information or making the conveyed information understandable. In fact I recall a number of cases (without having commit hashes to hand) where we deliberately dropped full stops. (The messages here aren't at risk of causing bandwidth issues, but as with any such generic item I think the goal ought to be consistency, and hence either full stops everywhere, or nowhere. If bandwidth was an issue here, I might also have suggested to shorten "Queued Invalidation" to just "QI".)

Also, when you say "large number" - did you compare to the number of cases where there are no full stops? (I sincerely hope we don't have that many full stops left.)

Jan
Re: [PATCH v9 01/13] x86/mm: rewrite virt_to_xen_l*e
On 06.04.2021 13:05, Hongyan Xia wrote:
> From: Wei Liu
>
> Rewrite those functions to use the new APIs. Modify their callers to unmap the pointer returned. Since alloc_xen_pagetable_new() is almost never useful unless accompanied by page clearing and a mapping, introduce a helper alloc_map_clear_xen_pt() for this sequence.
>
> Signed-off-by: Wei Liu
> Signed-off-by: Hongyan Xia
>
> ---
> Changed in v9:
>  - use domain_page_map_to_mfn() around the L3 table locking logic.
>  - remove vmap_to_mfn() changes since we now use xen_map_to_mfn().
>
> Changed in v8:
>  - s/virtual address/linear address/.
>  - BUG_ON() on NULL return in vmap_to_mfn().
>
> Changed in v7:
>  - remove a comment.
>  - use l1e_get_mfn() instead of converting things back and forth.
>  - add alloc_map_clear_xen_pt().

I realize this was in v7 already, but at v6 time the name I suggested was

    void *alloc_mapped_pagetable(mfn_t *mfn);

"alloc_map_clear_xen", for my taste at least, is too strange. It doesn't really matter whether it's a page table for Xen's own use (it typically will be), so "xen" could be dropped. Clearing of a page table ought to also be the rule rather than the exception, so I'd see "clear" also dropped. I'd be fine with alloc_map_pt() or about any intermediate variant between that and my originally suggested name.

> @@ -5108,7 +5140,8 @@ int map_pages_to_xen(
>      unsigned int flags)
>  {
>      bool locking = system_state > SYS_STATE_boot;
> -    l2_pgentry_t *pl2e, ol2e;
> +    l3_pgentry_t *pl3e = NULL, ol3e;
> +    l2_pgentry_t *pl2e = NULL, ol2e;
>      l1_pgentry_t *pl1e, ol1e;
>      unsigned int i;
>      int rc = -ENOMEM;
> @@ -5132,15 +5165,16 @@ int map_pages_to_xen(
>
>      while ( nr_mfns != 0 )
>      {
> -        l3_pgentry_t *pl3e, ol3e;
> -
> +        /* Clean up the previous iteration. */
>          L3T_UNLOCK(current_l3page);
> +        UNMAP_DOMAIN_PAGE(pl3e);
> +        UNMAP_DOMAIN_PAGE(pl2e);

Doing this here suggests the lower-case equivalent is needed at the out label, even without looking at the rest of the function (doing so confirms the suspicion, as there's at least one "goto out" with pl2e clearly still mapped).

> @@ -5305,6 +5339,8 @@ int map_pages_to_xen(
>              pl1e = virt_to_xen_l1e(virt);
>              if ( pl1e == NULL )
>                  goto out;
> +
> +            UNMAP_DOMAIN_PAGE(pl1e);
>          }

Unmapping the page right after mapping it looks suspicious. I see that further down we have

    pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(virt);

but don't you need to also change that? Actually, you do in patch 2, but at the latest then the double mapping should imo be avoided.

> @@ -5505,6 +5542,7 @@ int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
>  int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
>  {
>      bool locking = system_state > SYS_STATE_boot;
> +    l3_pgentry_t *pl3e = NULL;
>      l2_pgentry_t *pl2e;
>      l1_pgentry_t *pl1e;
>      unsigned int i;

And here we have the opposite situation: You don't set pl2e to NULL and the function only uses l3e_to_l2e() to initialize the variable, yet ...

> @@ -5761,6 +5799,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
>
>  out:
>      L3T_UNLOCK(current_l3page);
> +    unmap_domain_page(pl2e);
> +    unmap_domain_page(pl3e);

... here you try to unmap it. Did the two respective hunks somehow magically get swapped?

> --- a/xen/common/vmap.c
> +++ b/xen/common/vmap.c
> @@ -1,6 +1,7 @@
>  #ifdef VMAP_VIRT_START
>  #include
>  #include
> +#include

Why is this needed? (Looks like a now stale change.)

Jan
Re: [PATCH v4] xen/arm64: Place a speculation barrier following an ret instruction
Hi,

On 19/04/2021 19:24, Stefano Stabellini wrote:
> On Mon, 19 Apr 2021, Bertrand Marquis wrote:
>> Hi Julien,
>>
>> On 18 Apr 2021, at 19:03, Julien Grall wrote:
>>> From: Julien Grall
>>>
>>> Some CPUs can speculate past a RET instruction and potentially perform speculative accesses to memory before processing the return. There is no known gadget available after the RET instruction today. However some of the registers (such as in check_pending_guest_serror()) may contain a value provided by the guest.
>>>
>>> In order to harden the code, it would be better to add a speculation barrier after each RET instruction. The performance impact is meant to be negligible as the speculation barrier is not meant to be architecturally executed.
>>>
>>> Rather than manually inserting a speculation barrier, use a macro which overrides the mnemonic RET and replaces it with RET + SB. We need to use the opcode for RET to prevent any macro recursion.
>>>
>>> This patch only covers the assembly code. C code would need to be covered separately using compiler support.
>>>
>>> Note that the definition of the macro sb needs to be moved earlier in asm-arm/macros.h so it can be used by the new macro.
>>>
>>> This is part of the work to mitigate straight-line speculation.
>>>
>>> Signed-off-by: Julien Grall
>>
>> Reviewed-by: Bertrand Marquis
>
> Acked-by: Stefano Stabellini

Thanks both! I have committed the patch.

Cheers,

--
Julien Grall
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
Hi,

It is not really my area of expertise but I wanted to jump on one comment below...

On 20/04/2021 12:38, Jan Beulich wrote:
> On 01.04.2020 22:06, Chao Gao wrote:
>> ---
>> Changes in v2:
>>  - verify system suspension and resumption with this patch applied
>>  - don't fall back to register-based interface if enabling qinval failed. see the change in init_vtd_hw().
>>  - remove unnecessary "queued_inval_supported" variable
>>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>>  - coding-style changes
>
> ... while this suggests this is v2 of a recently sent patch, the submission is dated a little over a year ago. This is confusing. It is additionally confusing that there were two copies of it in my inbox, despite mails coming from a list normally getting de-duplicated somewhere at our end (I believe).
>
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>>
>>     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>>     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
>> +   iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
>> +
>> +   if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
>> +   {
>> +       printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
>> +              iommu->index);
>
> Here (and at least once more yet further down): We don't normally end log messages with a full stop. Easily addressable while committing, of course.

I can find a large number of cases where log messages are ended with a full stop... Actually it looks more natural to me than your suggestion.

Cheers,

--
Julien Grall
Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported
On 01.04.2020 22:06, Chao Gao wrote:
> According to Intel VT-d SPEC rev3.3 Section 6.5, Register-based Invalidation isn't supported by Intel VT-d version 6 and beyond.
>
> This hardware change impacts the following two scenarios: admin can disable queued invalidation via the 'qinval' cmdline and use the register-based interface; and VT-d switches to register-based invalidation when queued invalidation needs to be disabled, for example, during disabling x2apic, during system suspension, or after enabling queued invalidation fails.
>
> To deal with this hardware change, if register-based invalidation isn't supported, queued invalidation cannot be disabled through the Xen cmdline; and if queued invalidation has to be disabled temporarily in some scenarios, VT-d won't switch to the register-based interface but will use some dummy functions to catch errors in case any invalidation request is issued while queued invalidation is disabled.
>
> Signed-off-by: Chao Gao

In principle (with a minor nit further down)
Reviewed-by: Jan Beulich

However, ...

> ---
> Changes in v2:
>  - verify system suspension and resumption with this patch applied
>  - don't fall back to register-based interface if enabling qinval failed. see the change in init_vtd_hw().
>  - remove unnecessary "queued_inval_supported" variable
>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>  - coding-style changes

... while this suggests this is v2 of a recently sent patch, the submission is dated a little over a year ago. This is confusing. It is additionally confusing that there were two copies of it in my inbox, despite mails coming from a list normally getting de-duplicated somewhere at our end (I believe).

> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>
>     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
> +   iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
> +
> +   if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
> +   {
> +       printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
> +              iommu->index);

Here (and at least once more yet further down): We don't normally end log messages with a full stop. Easily addressable while committing, of course.

Jan
Re: [PATCH v2 2/2] x86/cpuid: support LFENCE always serializing CPUID bit
On 20.04.2021 12:47, Roger Pau Monné wrote:
> On Tue, Apr 20, 2021 at 12:35:54PM +0200, Jan Beulich wrote:
>> I'd like to give Andrew a day or two more to respond there in case he
>> continues to see an issue, before I commit that with your R-b and this
>> one here. I'll assume you'll subsequently take care of that missing
>> piece then - if not, i.e. if e.g. I should, please let me know.
>
> I think it should be something like the above,

Right (assuming you meant "below").

> in fact I think it
> would be perfectly fine to merge that chunk into your patch?

I'd rather not, so that this change can have its own reasoning in its
description.

Jan

> diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> index 050cd5713e2..daf501779fe 100644
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -314,12 +314,9 @@ static void __init calculate_host_policy(void)
>
>      *p = raw_cpuid_policy;
>
> -    p->basic.max_leaf =
> -        min_t(uint32_t, p->basic.max_leaf, ARRAY_SIZE(p->basic.raw) - 1);
> -    p->feat.max_subleaf =
> -        min_t(uint32_t, p->feat.max_subleaf, ARRAY_SIZE(p->feat.raw) - 1);
> -    p->extd.max_leaf = 0x8000 | min_t(uint32_t, p->extd.max_leaf & 0x,
> -                                      ARRAY_SIZE(p->extd.raw) - 1);
> +    p->basic.max_leaf = ARRAY_SIZE(p->basic.raw) - 1;
> +    p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1;
> +    p->extd.max_leaf = 0x8000 | ARRAY_SIZE(p->extd.raw) - 1;
>
>      cpuid_featureset_to_policy(boot_cpu_data.x86_capability, p);
>      recalculate_xstate(p);
>
[PATCH v2] VT-d: Don't assume register-based invalidation is always supported
According to Intel VT-d SPEC rev3.3 Section 6.5, Register-based Invalidation
isn't supported by Intel VT-d version 6 and beyond.

This hardware change impacts following two scenarios: admin can disable
queued invalidation via 'qinval' cmdline and use register-based interface;
VT-d switches to register-based invalidation when queued invalidation needs
to be disabled, for example, during disabling x2apic or during system
suspension or after enabling queued invalidation fails.

To deal with this hardware change, if register-based invalidation isn't
supported, queued invalidation cannot be disabled through Xen cmdline; and
if queued invalidation has to be disabled temporarily in some scenarios,
VT-d won't switch to register-based interface but use some dummy functions
to catch errors in case there is any invalidation request issued when queued
invalidation is disabled.

Signed-off-by: Chao Gao
---
Changes in v2:
 - verify system suspension and resumption with this patch applied
 - don't fall back to register-based interface if enabling qinval failed.
   see the change in init_vtd_hw().
 - remove unnecessary "queued_inval_supported" variable
 - constify the "struct vtd_iommu *" of has_register_based_invalidation()
 - coding-style changes
---
 docs/misc/xen-command-line.pandoc    |  4 +++-
 xen/drivers/passthrough/vtd/iommu.c  | 27 +++++++++++++++++++++++++--
 xen/drivers/passthrough/vtd/iommu.h  |  7 +++++++
 xen/drivers/passthrough/vtd/qinval.c | 33 +++++++++++++++++++++++++++++++--
 4 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index deef6d0b4c..4ff4a87844 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1442,7 +1442,9 @@ The following options are specific to Intel VT-d hardware:
 *   The `qinval` boolean controls the Queued Invalidation sub-feature, and is
     active by default on compatible hardware.  Queued Invalidation is a
     feature in second-generation IOMMUs and is a functional prerequisite for
-    Interrupt Remapping.
+    Interrupt Remapping.  Note that Xen disregards this setting for Intel VT-d
+    version 6 and greater as Registered-Based Invalidation isn't supported
+    by them.
 
 *   The `igfx` boolean is active by default, and controls whether the IOMMU
     in front of an Intel Graphics Device is enabled or not.

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 6428c8fe3e..94d1372903 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
 
     iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
     iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
+    iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
+
+    if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
+    {
+        printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
+               iommu->index);
+        iommu_qinval = true;
+    }
 
     if ( iommu_verbose )
     {
@@ -2141,6 +2149,10 @@ static int __must_check init_vtd_hw(bool resume)
      */
     if ( enable_qinval(iommu) != 0 )
     {
+        /* Ensure register-based invalidation is available */
+        if ( !has_register_based_invalidation(iommu) )
+            return -EIO;
+
         iommu->flush.context = vtd_flush_context_reg;
         iommu->flush.iotlb = vtd_flush_iotlb_reg;
     }
@@ -2231,6 +2243,7 @@ static int __init vtd_setup(void)
     struct acpi_drhd_unit *drhd;
     struct vtd_iommu *iommu;
     int ret;
+    bool reg_inval_supported = true;
 
     if ( list_empty(_drhd_units) )
     {
@@ -2252,8 +2265,8 @@ static int __init vtd_setup(void)
     }
 
     /* We enable the following features only if they are supported by all VT-d
-     * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
-     * Remapping, and Posted Interrupt
+     * engines: Snoop Control, DMA passthrough, Register-based Invalidation,
+     * Queued Invalidation, Interrupt Remapping, and Posted Interrupt.
      */
     for_each_drhd_unit ( drhd )
     {
@@ -2275,6 +2288,9 @@ static int __init vtd_setup(void)
         if ( iommu_qinval && !ecap_queued_inval(iommu->ecap) )
             iommu_qinval = 0;
 
+        if ( !has_register_based_invalidation(iommu) )
+            reg_inval_supported = false;
+
         if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
             iommu_intremap = iommu_intremap_off;
 
@@ -2301,6 +2317,13 @@ static int __init vtd_setup(void)
     softirq_tasklet_init(_fault_tasklet, do_iommu_page_fault, NULL);
 
+    if ( !iommu_qinval && !reg_inval_supported )
+    {
+        dprintk(XENLOG_ERR VTDPREFIX, "No available invalidation interface.\n");
+        ret = -ENODEV;
+        goto error;
+    }
+
     if ( !iommu_qinval && iommu_intremap )
     {
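The interface-selection policy this patch implements can be condensed into a small decision function. The sketch below is illustrative only: `struct unit`, its field names, and `pick_flush_interface` are hypothetical stand-ins, not actual Xen identifiers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Illustrative sketch of the invalidation-interface selection the patch
 * describes: prefer Queued Invalidation; fall back to the register-based
 * interface only if the hardware still has one (VT-d version < 6);
 * otherwise fail hard instead of silently using a broken fallback.
 */
struct unit {
    bool qinval_enabled;   /* Queued Invalidation successfully enabled */
    bool has_reg_inval;    /* register-based interface exists on this unit */
};

/* Returns 1 for queued, 0 for register-based fallback, -EIO if neither. */
static int pick_flush_interface(const struct unit *u)
{
    if (u->qinval_enabled)
        return 1;
    if (!u->has_reg_inval)
        return -EIO;       /* v6+ hardware: no register-based fallback */
    return 0;
}
```

The same shape appears twice in the patch: at boot (`vtd_setup()`) the `-EIO` arm becomes `-ENODEV`/`goto error`, and on re-enable paths (`init_vtd_hw()`) it aborts rather than installing the register-based flush callbacks.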
Re: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED
On Tue, Apr 20, 2021 at 10:51:47AM +0100, Igor Druzhinin wrote:
> On 20/04/2021 04:39, no-re...@patchew.org wrote:
> > === OUTPUT BEGIN ===
> > ERROR: Author email address is mangled by the mailing list
> > #2:
> > Author: Igor Druzhinin via
> >
> > total: 1 errors, 0 warnings, 21 lines checked
>
> Anthony,
>
> Is there any action required from me here? I don't completely understand
> how that happened. But I found this:

No action, I just need to remember to fix the patch's author before
submitting a pull request. Citrix's policy is just too strict and doesn't
even allow space changes in some headers.

--
Anthony PERARD
Re: [PATCH v2 2/2] x86/cpuid: support LFENCE always serializing CPUID bit
On Tue, Apr 20, 2021 at 12:35:54PM +0200, Jan Beulich wrote: > On 15.04.2021 16:47, Roger Pau Monne wrote: > > AMD Milan (Zen3) CPUs have an LFENCE Always Serializing CPUID bit in > > leaf 8021.eax. Previous AMD versions used to have a user settable > > bit in DE_CFG MSR to select whether LFENCE was dispatch serializing, > > which Xen always attempts to set. The forcefully always on setting is > > due to the addition of SEV-SNP so that a VMM cannot break the > > confidentiality of a guest. > > > > In order to support this new CPUID bit move the LFENCE_DISPATCH > > synthetic CPUID bit to map the hardware bit (leaving a hole in the > > synthetic range) and either rely on the bit already being set by the > > native CPUID output, or attempt to fake it in Xen by modifying the > > DE_CFG MSR. This requires adding one more entry to the featureset to > > support leaf 8021.eax. > > > > The bit is exposed to guests by default if the underlying hardware > > supports leaf 8021, as a way to signal that LFENCE is always > > serializing. Hardware that doesn't have the leaf might also get the > > bit set because Xen has performed the necessary arrangements, but > > that's not yet exposed to guests. Note that Xen doesn't allow guests > > to change the DE_CFG value, so once set by Xen LFENCE will always be > > serializing. > > > > Note that the access to DE_CFG by guests is left as-is: reads will > > unconditionally return LFENCE_SERIALISE bit set, while writes are > > silently dropped. > > > > Suggested-by: Andrew Cooper > > Signed-off-by: Roger Pau Monné > > Reviewed-by: Jan Beulich > > > --- > > Note this doesn't yet expose the bit on hardware that doesn't support > > leaf 8021. It's still TBD whether we want to hardcode this support > > manually, or instead rely on a more general approach like the one > > suggested by the shrink max CPUID leaf patch from Jan. 
> > I'd like to give Andrew a day or two more to respond there in case he > continues to see an issue, before I commit that with your R-b and this > one here. I'll assume you'll subsequently take care of that missing > piece then - if not, i.e. if e.g. I should, please let me know. I think it should be something like the above, in fact I think it would be perfectly fine to merge that chunk into your patch? Thanks, Roger. ---8<--- diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c index 050cd5713e2..daf501779fe 100644 --- a/xen/arch/x86/cpuid.c +++ b/xen/arch/x86/cpuid.c @@ -314,12 +314,9 @@ static void __init calculate_host_policy(void) *p = raw_cpuid_policy; -p->basic.max_leaf = -min_t(uint32_t, p->basic.max_leaf, ARRAY_SIZE(p->basic.raw) - 1); -p->feat.max_subleaf = -min_t(uint32_t, p->feat.max_subleaf, ARRAY_SIZE(p->feat.raw) - 1); -p->extd.max_leaf = 0x8000 | min_t(uint32_t, p->extd.max_leaf & 0x, - ARRAY_SIZE(p->extd.raw) - 1); +p->basic.max_leaf = ARRAY_SIZE(p->basic.raw) - 1; +p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1; +p->extd.max_leaf = 0x8000 | ARRAY_SIZE(p->extd.raw) - 1; cpuid_featureset_to_policy(boot_cpu_data.x86_capability, p); recalculate_xstate(p);
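The hunk above replaces clamping against the hardware-reported maximum with an unconditional assignment to the policy-array bound. The behavioral difference can be sketched as follows; this is illustrative only, with a hypothetical 8-entry leaf array standing in for Xen's actual policy structures.

```c
#include <assert.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static uint32_t leaves[8];    /* hypothetical stand-in for p->basic.raw */

/* Old behavior: never advertise more leaves than hardware reports
 * (the min_t(uint32_t, ...) form removed by the hunk). */
static uint32_t max_leaf_clamped(uint32_t reported)
{
    uint32_t lim = ARRAY_SIZE(leaves) - 1;

    return reported < lim ? reported : lim;
}

/* New behavior: the host policy always spans the full array,
 * regardless of what the raw CPUID output reported. */
static uint32_t max_leaf_full(uint32_t reported)
{
    (void)reported;
    return ARRAY_SIZE(leaves) - 1;
}
```

With an 8-entry array, a hardware-reported maximum of 3 stays 3 under the old scheme but becomes 7 under the new one; the actual trimming back down is then left to later policy recalculation.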
Re: [PATCH v2 2/2] x86/cpuid: support LFENCE always serializing CPUID bit
On 15.04.2021 16:47, Roger Pau Monne wrote:
> AMD Milan (Zen3) CPUs have an LFENCE Always Serializing CPUID bit in
> leaf 8021.eax. Previous AMD versions used to have a user settable
> bit in DE_CFG MSR to select whether LFENCE was dispatch serializing,
> which Xen always attempts to set. The forcefully always on setting is
> due to the addition of SEV-SNP so that a VMM cannot break the
> confidentiality of a guest.
>
> In order to support this new CPUID bit move the LFENCE_DISPATCH
> synthetic CPUID bit to map the hardware bit (leaving a hole in the
> synthetic range) and either rely on the bit already being set by the
> native CPUID output, or attempt to fake it in Xen by modifying the
> DE_CFG MSR. This requires adding one more entry to the featureset to
> support leaf 8021.eax.
>
> The bit is exposed to guests by default if the underlying hardware
> supports leaf 8021, as a way to signal that LFENCE is always
> serializing. Hardware that doesn't have the leaf might also get the
> bit set because Xen has performed the necessary arrangements, but
> that's not yet exposed to guests. Note that Xen doesn't allow guests
> to change the DE_CFG value, so once set by Xen LFENCE will always be
> serializing.
>
> Note that the access to DE_CFG by guests is left as-is: reads will
> unconditionally return LFENCE_SERIALISE bit set, while writes are
> silently dropped.
>
> Suggested-by: Andrew Cooper
> Signed-off-by: Roger Pau Monné

Reviewed-by: Jan Beulich

> ---
> Note this doesn't yet expose the bit on hardware that doesn't support
> leaf 8021. It's still TBD whether we want to hardcode this support
> manually, or instead rely on a more general approach like the one
> suggested by the shrink max CPUID leaf patch from Jan.

I'd like to give Andrew a day or two more to respond there in case he
continues to see an issue, before I commit that with your R-b and this
one here. I'll assume you'll subsequently take care of that missing
piece then - if not, i.e. if e.g. I should, please let me know.

Jan
Re: [PATCH v2 3/3] docs/doxygen: doxygen documentation for grant_table.h
On 20.04.2021 11:42, Luca Fancellu wrote:
>> On 20 Apr 2021, at 10:14, Jan Beulich wrote:
>> On 20.04.2021 10:46, Luca Fancellu wrote:
On 19 Apr 2021, at 12:05, Jan Beulich wrote:
On 19.04.2021 11:12, Luca Fancellu wrote:
> - */
> -
> -/*
> - * Reference to a grant entry in a specified domain's grant table.
> - */
> -typedef uint32_t grant_ref_t;

Why does this get moved ...

> -
> -/*
> + *
>  * A grant table comprises a packed array of grant entries in one or more
>  * page frames shared between Xen and a guest.
> + *
>  * [XEN]: This field is written by Xen and read by the sharing guest.
> + *
>  * [GST]: This field is written by the guest and read by Xen.
> + *
> + * @addtogroup grant_table Grant Tables
> + * @{
>  */
>
> -/*
> - * Version 1 of the grant table entry structure is maintained purely
> - * for backwards compatibility. New guests should use version 2.
> +/**
> + * Reference to a grant entry in a specified domain's grant table.
>  */
> +typedef uint32_t grant_ref_t;

... here, past a comment unrelated to it?
>>>
>>> The comment “Version 1 of the grant table entry […]” was moved on top of
>>> the struct grant_entry_v1, in this way Doxygen associate the comment to
>>> that structure.
>>>
>>> The comment “Reference to a grant entry in a specified domain's grant
>>> table.” Is moved on top of typedef uint32_t grant_ref_t for the same
>>> reason above
>>
>> But it's the other comment ("A grant table comprises ...") that I
>> was asking about.
>
> I thought it was part of the brief about grant tables, am I wrong? For that
> reason I moved it.

Well, the comment talks about grant table entries (the layout of which gets
defined immediately below, as visible in the original patch), not grant
references.

> @@ -243,23 +258,27 @@ union grant_entry_v2 {
>      * In that case, the frame field has the same semantics as the
>      * field of the same name in the V1 entry structure.
>      */
> +    /** @cond skip anonymous struct/union for doxygen */
>     struct {
>         grant_entry_header_t hdr;
>         uint32_t pad0;
>         uint64_t frame;
>     } full_page;
> +    /** @endcond */
>
>     /*
>      * If the grant type is GTF_grant_access and GTF_sub_page is set,
>      * @domid is allowed to access bytes [@page_off,@page_off+@length)
>      * in frame @frame.
>      */
> +    /** @cond skip anonymous struct/union for doxygen */
>     struct {
>         grant_entry_header_t hdr;
>         uint16_t page_off;
>         uint16_t length;
>         uint64_t frame;
>     } sub_page;
> +    /** @endcond */
>
>     /*
>      * If the grant is GTF_transitive, @domid is allowed to use the
> @@ -270,12 +289,14 @@ union grant_entry_v2 {
>      * The current version of Xen does not allow transitive grants
>      * to be mapped.
>      */
> +    /** @cond skip anonymous struct/union for doxygen */
>     struct {
>         grant_entry_header_t hdr;
>         domid_t trans_domid;
>         uint16_t pad0;
>         grant_ref_t gref;
>     } transitive;
> +    /** @endcond */

While already better than the introduction of strange struct tags, I'm
still not convinced we want this extra clutter (sorry). Plus - don't
these additions mean the sub-structures then won't be represented in
the generated doc, rendering it (partly) useless?
>>>
>>> Are you suggesting to remove the structure from docs?
>>
>> Just yet I'm not suggesting anything here - I was merely guessing at
>> the effect of "@cond" and the associated "skip ..." to mean that this
>> construct makes doxygen skip the enclosed section. If that's not the
>> effect, then maybe the comment wants rewording? (And then - what does
>> @cond mean?)
>
> Yes you were right, in the documentation the structure grant_entry_v2 won't
> display the anonymous union.

In which case I'm now completely unclear about your prior question.

Jan
Re: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED
On Tue, Apr 20, 2021 at 10:45:03AM +0100, Igor Druzhinin wrote:
> On 20/04/2021 09:53, Roger Pau Monné wrote:
> > On Tue, Apr 20, 2021 at 04:35:02AM +0100, Igor Druzhinin wrote:
> > > When we're replacing the existing mapping there is possibility of a race
> > > on memory map with other threads doing mmap operations - the address being
> > > unmapped/re-mapped could be occupied by another thread in between.
> > >
> > > Linux mmap man page recommends keeping the existing mappings in place to
> > > reserve the place and instead utilize the fact that the next mmap operation
> > > with MAP_FIXED flag passed will implicitly destroy the existing mappings
> > > behind the chosen address. This behavior is guaranteed by POSIX / BSD and
> > > therefore is portable.
> > >
> > > Note that it wouldn't make the replacement atomic for parallel accesses to
> > > the replaced region - those might still fail with SIGBUS due to
> > > xenforeignmemory_map not being atomic. So we're still not expecting those.
> > >
> > > Tested-by: Anthony PERARD
> > > Signed-off-by: Igor Druzhinin
> >
> > Should we add a 'Fixes: 759235653de ...' or similar tag here?
>
> I was thinking about it and decided - no. There wasn't a bug here until QEMU
> introduced usage of iothreads at the state loading phase. Originally this
> process was totally single-threaded. And it's hard to pinpoint the exact
> moment when it happened which is also not directly related to the change
> here.

Right, might be worth mentioning in the commit message then, that the code
was designed to be used without any parallelism, and that the addition of
iothreads is what actually triggered the bug here.

Thanks, Roger.
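The MAP_FIXED property the commit message relies on can be demonstrated in isolation: remapping over a live mapping in one step leaves no unmapped window for another thread's `mmap(NULL, ...)` to grab. A minimal Linux/POSIX sketch follows; `remap_fixed` and `demo` are hypothetical names for illustration, not QEMU code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Replace the mapping at addr in one step: MAP_FIXED implicitly discards
 * any previous mapping over [addr, addr+len), with no munmap()/mmap()
 * window during which the range is free for other threads to claim. */
static void *remap_fixed(void *addr, size_t len)
{
    return mmap(addr, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
}

static int demo(void)
{
    size_t len = 4096;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    p[0] = 0xaa;                      /* dirty the old mapping */

    unsigned char *q = remap_fixed(p, len);

    if (q == MAP_FAILED)
        return -1;

    /* Same address is guaranteed by MAP_FIXED; the contents are fresh
     * (anonymous mappings are zero-filled), proving the old mapping
     * was replaced rather than reused. */
    return (q == p && q[0] == 0) ? 0 : -1;
}
```

As the thread notes, this only serializes the memory-map bookkeeping; loads and stores racing with the replacement can still fault, which is why the commit message keeps the SIGBUS caveat.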
[linux-linus test] 161298: regressions - FAIL
flight 161298 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161298/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-ws16-amd64            7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-xsm                         7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel           7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install     fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64       7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-examine                        6 xen-install           fail REGR. vs. 152332
 test-amd64-i386-libvirt                        7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm    7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64       7 xen-install           fail REGR. vs. 152332
 test-amd64-coresched-i386-xl                   7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-pair                          10 xen-install/src_host  fail REGR. vs. 152332
 test-amd64-i386-pair                          11 xen-install/dst_host  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd             7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl                             7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64            7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd             7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel           7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim                      7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64                7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-raw                         7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm    7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-shadow                      7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386                 7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm                    7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64            7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64            7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair                  10 xen-install/src_host  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install     fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair                  11 xen-install/dst_host  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64            7 xen-install           fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install        fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub                 20 guest-stop            fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub                  20 guest-stop            fail REGR. vs. 152332

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-examine                       4 memdisk-try-append    fail in 161284 pass in 161298
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  12 debian-hvm-install    fail pass in 161284

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64          19 guest-stop            fail like 152332
 test-armhf-armhf-libvirt                      16 saverestore-support-check fail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64          19 guest-stop            fail like 152332
 test-amd64-amd64-qemuu-nested-amd             20 debian-hvm-install/l1/l2 fail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64          19 guest-stop            fail like 152332
 test-armhf-armhf-libvirt-raw                  15 saverestore-support-check fail like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64          19 guest-stop            fail like 152332
 test-arm64-arm64-xl-seattle                   15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle                   16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt                      15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm                  15 migrate-support-check fail never pass
 test-arm64-arm64-xl                           15 migrate-support-check fail never pass
 test-arm64-arm64-xl                           16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1                   15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1                   16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2                   15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2                   16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm                       15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm                       16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx                  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx                  16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm                  15 migrate-support-check fail never pass
Re: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED
On 20/04/2021 04:39, no-re...@patchew.org wrote:

Patchew URL: https://patchew.org/QEMU/1618889702-13104-1-git-send-email-igor.druzhi...@citrix.com/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1618889702-13104-1-git-send-email-igor.druzhi...@citrix.com
Subject: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/1618889702-13104-1-git-send-email-igor.druzhi...@citrix.com -> patchew/1618889702-13104-1-git-send-email-igor.druzhi...@citrix.com
Switched to a new branch 'test'
3102519 xen-mapcache: avoid a race on memory map while using MAP_FIXED

=== OUTPUT BEGIN ===
ERROR: Author email address is mangled by the mailing list
#2:
Author: Igor Druzhinin via

total: 1 errors, 0 warnings, 21 lines checked

Anthony,

Is there any action required from me here? I don't completely understand how
that happened. But I found this:

"In some cases the Author: email address in patches submitted to the list
gets mangled such that it says

    John Doe via Qemu-devel

This change is a result of workarounds for DMARC policies.

Subsystem maintainers accepting patches need to catch these and fix
them before sending pull requests, so a checkpatch.pl test is highly
desirable."

Igor
Re: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED
On 20/04/2021 09:53, Roger Pau Monné wrote:

On Tue, Apr 20, 2021 at 04:35:02AM +0100, Igor Druzhinin wrote:

When we're replacing the existing mapping there is possibility of a race
on memory map with other threads doing mmap operations - the address being
unmapped/re-mapped could be occupied by another thread in between.

Linux mmap man page recommends keeping the existing mappings in place to
reserve the place and instead utilize the fact that the next mmap operation
with MAP_FIXED flag passed will implicitly destroy the existing mappings
behind the chosen address. This behavior is guaranteed by POSIX / BSD and
therefore is portable.

Note that it wouldn't make the replacement atomic for parallel accesses to
the replaced region - those might still fail with SIGBUS due to
xenforeignmemory_map not being atomic. So we're still not expecting those.

Tested-by: Anthony PERARD
Signed-off-by: Igor Druzhinin

Should we add a 'Fixes: 759235653de ...' or similar tag here?

I was thinking about it and decided - no. There wasn't a bug here until QEMU
introduced usage of iothreads at the state loading phase. Originally this
process was totally single-threaded. And it's hard to pinpoint the exact
moment when it happened which is also not directly related to the change
here.

Igor
Re: [PATCH v2 3/3] docs/doxygen: doxygen documentation for grant_table.h
> On 20 Apr 2021, at 10:14, Jan Beulich wrote:
>
> On 20.04.2021 10:46, Luca Fancellu wrote:
>>> On 19 Apr 2021, at 12:05, Jan Beulich wrote:
>>>
>>> On 19.04.2021 11:12, Luca Fancellu wrote:
Modification to include/public/grant_table.h:
1) Add doxygen tags to:
 - Create Grant tables section
 - include variables in the generated documentation
2) Add .rst file for grant table for Arm64
>>>
>>> I'm missing some reasoning about at least some of the changes done
>>> to grant_table.h. Looking at this and the earlier patches I also
>>> couldn't spot any general outline of what is acceptable or even
>>> necessary in such a header to be understood by doxygen. Without
>>> this written down somewhere (or, if documented elsewhere, a
>>> pointer provided to that doc) I'm afraid things might get screwed
>>> up again later on.
>>
>> Doxygen is a tool that generates documentation based on parsing the source
>> code comments, it recognises some commands in the comments and builds the
>> documentation sticking to what you wrote.
>>
>> Here the doxygen docs: https://www.doxygen.nl/manual/docblocks.html
>>
>> Basically it doesn't react to all comments, it parses only some well crafted
>> comments as explained in its documentation.
>
> Providing this link somewhere might be helpful. However, also seeing
> some of your further comments, it feels like to edit a public header
> enabled for doxygen one has to know fair parts of this documentation.
> While I'm certainly in favor of having a way to generate docs from
> the headers, I'm afraid I don't think such knowledge should become a
> prereq to e.g. adding a new sub-function of a hypercall. So far I was
> assuming that the formatting requirements would be quite limited, and
> that it would hence be possible to just glean them from existing code.
> But e.g. the "/**<" notation isn't going to be obvious to spot.

Yes I see, some knowledge of the documentation tool is required, but I'm sure
that anyone who is able to put his/her hands on the Xen code will be able to
use some of the doxygen tags too; as you can see from this patch the usage is
easy, and the less obvious things can be understood by looking at the source
code and the corresponding generated doc page.

> --- a/docs/hypercall-interfaces/arm64.rst
> +++ b/docs/hypercall-interfaces/arm64.rst
> @@ -8,6 +8,7 @@ Starting points
>  .. toctree::
>     :maxdepth: 2
>
> +   arm64/grant_tables
>
>  Functions
>
> diff --git a/docs/hypercall-interfaces/arm64/grant_tables.rst b/docs/hypercall-interfaces/arm64/grant_tables.rst
> new file mode 100644
> index 00..8955ec5812
> --- /dev/null
> +++ b/docs/hypercall-interfaces/arm64/grant_tables.rst
> @@ -0,0 +1,8 @@
> +.. SPDX-License-Identifier: CC-BY-4.0
> +
> +Grant Tables
> +
> +
> +.. doxygengroup:: grant_table
>>>
>>> Why is this Arm64-specific?
>>
>> This particular one is Arm64 specific, but it doesn't mean that grant tables
>> are arm specific, it is only that for now I'm introducing a partial
>> documentation in the arm side. Any other user can in the future add more
>> documentation for each platform.
>
> I'm of the pretty strong opinion that common hypercalls should be
> documented as common, and hence not live in an arch-specific
> section.

Yes sure, I agree, but this is the first step to enable a kind of
documentation, it's a work in progress. In a future series this can be
modified. For now I've tried to reproduce this page:
https://xenbits.xen.org/docs/unstable/ in the section "Hypercall Interfaces".

> @@ -73,20 +75,25 @@
>   * frame, or zero if none.
>   * 3. Write memory barrier (WMB).
>   * 4. Write ent->flags, inc. valid type.
> + * @endcode
>   *
>   * Invalidating an unused GTF_permit_access entry:
> + * @code
>   * 1. flags = ent->flags.
>   * 2. Observe that !(flags & (GTF_reading|GTF_writing)).
>   * 3. Check result of SMP-safe CMPXCHG(>flags, flags, 0).
>   * NB. No need for WMB as reuse of entry is control-dependent on success of
>   * step 3, and all architectures guarantee ordering of ctrl-dep writes.
> + * @endcode
>   *
>   * Invalidating an in-use GTF_permit_access entry:
> + *
>   * This cannot be done directly. Request assistance from the domain controller
>   * which can set a timeout on the use of a grant entry and take necessary
>   * action. (NB. This is not yet implemented!).
>   *
>   * Invalidating an unused GTF_accept_transfer entry:
> + * @code
>   * 1. flags = ent->flags.
>   * 2. Observe that !(flags & GTF_transfer_committed). [*]
>   * 3. Check result of SMP-safe CMPXCHG(>flags, flags, 0).
> @@ -97,47 +104,55 @@
>   * transferred frame is written. It is safe for the guest to spin waiting
>   * for this to occur (detect by observing GTF_transfer_completed in
>   * ent->flags).
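For readers unfamiliar with the doxygen markers debated in this thread (`/** ... */`, `@addtogroup`/`@{`/`@}`, `@code`/`@endcode`, `@cond`/`@endcond`), here is a minimal self-contained illustration of the pattern the patch applies. All `demo_*` names are hypothetical stand-ins, not the real grant-table header.

```c
#include <assert.h>
#include <stdint.h>

/**
 * @addtogroup demo_tables Demo Tables
 * @{
 */

/** Reference to an entry in a demo table (brief attached to the typedef). */
typedef uint32_t demo_ref_t;

/**
 * A demo entry.
 *
 * Validating an unused entry:
 * @code
 * 1. flags = ent->flags.
 * 2. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
 * @endcode
 */
struct demo_entry {
    uint16_t flags;
    /** @cond skip the anonymous union in the generated docs */
    union {
        uint64_t frame;
        demo_ref_t gref;
    };
    /** @endcond */
};

/** @} */
```

`@code`/`@endcode` renders the numbered steps verbatim instead of reflowing them; `@cond`/`@endcond` hides the enclosed members from the generated output, which is exactly the side effect Jan is questioning; the `@addtogroup ... @{ ... @}` pair is what a `doxygengroup::` directive in an .rst file pulls in.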
Re: [PATCH v3 5/6] x86/vpic: issue dpci EOI for cleared pins at ICW1
On 26.01.2021 17:57, Jan Beulich wrote:
> On 26.01.2021 14:45, Roger Pau Monne wrote:
>> When pins are cleared from either ISR or IRR as part of the
>> initialization sequence forward the clearing of those pins to the dpci
>> EOI handler, as it is equivalent to an EOI. Not doing so can bring the
>> interrupt controller state out of sync with the dpci handling logic,
>> that expects a notification when a pin has been EOI'ed.
>>
>> Fixes: 7b3cb5e5416 ('IRQ injection changes for HVM PCI passthru.')
>> Signed-off-by: Roger Pau Monné
>
> As said before, under the assumption that the clearing of IRR
> and ISR that we do is correct, I agree with the change. I'd
> like to give it some time though before giving my R-b here, for
> the inquiry to hopefully get answered. After all there's still
> the possibility of us needing to instead squash that clearing
> (which then would seem to result in getting things in sync the
> other way around).

Still haven't heard anything, so
Reviewed-by: Jan Beulich

In the worst case we'd need to consider reverting later on.

Jan
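The idea under review - treating the clearing of set ISR/IRR pins at initialization (ICW1) as an implicit EOI for each such pin, so that external bookkeeping stays in sync - can be sketched generically. Types and names below are hypothetical, not Xen's actual vpic/dpci code.

```c
#include <assert.h>
#include <stdint.h>

/* Callback invoked once per pin that was still set when the register
 * is wiped; in the real patch this would be the dpci EOI handler. */
typedef void (*eoi_fn)(unsigned int pin, void *ctx);

/* Clear an 8-pin ISR/IRR-style register, notifying for each set pin
 * as if it had been EOI'ed normally. */
static void clear_pins(uint8_t *reg, eoi_fn eoi, void *ctx)
{
    for (unsigned int pin = 0; pin < 8; pin++)
        if (*reg & (1u << pin))
            eoi(pin, ctx);
    *reg = 0;
}

static void count_eoi(unsigned int pin, void *ctx)
{
    (void)pin;
    ++*(unsigned int *)ctx;
}

static int demo(void)
{
    uint8_t isr = 0x05;        /* pins 0 and 2 outstanding */
    unsigned int eois = 0;

    clear_pins(&isr, count_eoi, &eois);
    /* register fully cleared, and exactly one EOI per set pin */
    return (isr == 0 && eois == 2) ? 0 : -1;
}
```

Without the per-pin notification, the consumer would wait forever for EOIs on pins that were silently wiped - the desynchronization the commit message describes.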
[qemu-mainline bisection] complete test-arm64-arm64-libvirt-xsm
branch xen-unstable
xenbranch xen-unstable
job test-arm64-arm64-libvirt-xsm
testid guest-start

Tree: libvirt git://xenbits.xen.org/libvirt.git
Tree: libvirt_keycodemapdb https://gitlab.com/keycodemap/keycodemapdb.git
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemuu git://git.qemu.org/qemu.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

Bug is in tree: qemuu git://git.qemu.org/qemu.git
Bug introduced: 8d17adf34f501ded65a106572740760f0a75577c
Bug not present: e67d8e2928200e24ecb47c7be3ea8270077f2996
Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/161318/

commit 8d17adf34f501ded65a106572740760f0a75577c
Author: Daniel P. Berrangé
Date: Mon Feb 22 11:16:32 2021 +

    block: remove support for using "file" driver with block/char devices

    The 'host_device' and 'host_cdrom' drivers must be used instead.

    Reviewed-by: Eric Blake
    Signed-off-by: Daniel P. Berrangé

For bisection revision-tuple graph see:
http://logs.test-lab.xenproject.org/osstest/results/bisect/qemu-mainline/test-arm64-arm64-libvirt-xsm.guest-start.html
Revision IDs in each graph node refer, respectively, to the Trees above.

Running cs-bisection-step --graph-out=/home/logs/results/bisect/qemu-mainline/test-arm64-arm64-libvirt-xsm.guest-start --summary-out=tmp/161318.bisection-summary --basis-template=152631 --blessings=real,real-bisect,real-retry qemu-mainline test-arm64-arm64-libvirt-xsm guest-start

Searching for failure / basis pass:
 161290 fail [host=rochester1] / 160125 [host=laxton0] 160119 [host=rochester0] 160113 [host=laxton1] 160104 ok.

Failure / basis pass flights: 161290 / 160104

Tree: libvirt git://xenbits.xen.org/libvirt.git
Tree: libvirt_keycodemapdb https://gitlab.com/keycodemap/keycodemapdb.git
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemuu git://git.qemu.org/qemu.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git

Latest 2c846fa6bcc11929c9fb857a22430fb9945654ad 27acf0ef828bf719b2053ba398b195829413dbdd a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 c530a75c1e6a472b0eb9558310b518f0dfcd8860 99e7e48cc7117c37fc1c08a741872d0875595796 8fe9f1f891eff4e37f82622b7480ee748bf4af74 b0d61ecef66eb05bd7a4eb7ada88ec5dab06dfee dd22a64de7e02b48312839a15179528c8f7db5c6
Basis pass 2c846fa6bcc11929c9fb857a22430fb9945654ad 27acf0ef828bf719b2053ba398b195829413dbdd a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 c530a75c1e6a472b0eb9558310b518f0dfcd8860 f3bdfc41866edf7c256e689deb9d091a950c8fca 5b7f5586d182b0cafb1f8d558992a14763e2953e b0d61ecef66eb05bd7a4eb7ada88ec5dab06dfee b4011741e6b39a8fd0ed5aded96c16c45ead5888

Generating revisions with ./adhoc-revtuple-generator
 git://xenbits.xen.org/libvirt.git#2c846fa6bcc11929c9fb857a22430fb9945654ad-2c846fa6bcc11929c9fb857a22430fb9945654ad
 https://gitlab.com/keycodemap/keycodemapdb.git#27acf0ef828bf719b2053ba398b195829413dbdd-27acf0ef828bf719b2053ba398b195829413dbdd
 git://xenbits.xen.org/linux-pvops.git#a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9-a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9
 git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 git://xenbits.xen.org/osstest/ovmf.git#f3bdfc41866edf7c256e689deb9d091a950c8fca-99e7e48cc7117c37fc1c08a741872d0875595796
 git://git.qemu.org/qemu.git#5b7f5586d182b0cafb1f8d558992a14763e2953e-8fe9f1f891eff4e37f82622b7480ee748bf4af74
 git://xenbits.xen.org/osstest/seabios.git#b0d61ecef66eb05bd7a4eb7ada88ec5dab06dfee-b0d61ecef66eb05bd7a4eb7ada88ec5dab06dfee
 git://xenbits.xen.org/xen.git#b4011741e6b39a8fd0ed5aded96c16c45ead5888-dd22a64de7e02b48312839a15179528c8f7db5c6

Loaded 44247 nodes in revision graph

Searching for test results:
 160104 pass 2c846fa6bcc11929c9fb857a22430fb9945654ad 27acf0ef828bf719b2053ba398b195829413dbdd a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 c530a75c1e6a472b0eb9558310b518f0dfcd8860 f3bdfc41866edf7c256e689deb9d091a950c8fca 5b7f5586d182b0cafb1f8d558992a14763e2953e b0d61ecef66eb05bd7a4eb7ada88ec5dab06dfee b4011741e6b39a8fd0ed5aded96c16c45ead5888
 160113 [host=laxton1]
 160119 [host=rochester0]
 160125 [host=laxton0]
 160134 fail irrelevant
 160147 fail 2c846fa6bcc11929c9fb857a22430fb9945654ad 27acf0ef828bf719b2053ba398b195829413dbdd a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 c530a75c1e6a472b0eb9558310b518f0dfcd8860 eb07bfb09ef5483ad58ed0eba713f32fb0c909f9 2e1293cbaac75e84f541f9acfa8e26749f4c3562 b0d61ecef66eb05bd7a4eb7ada88ec5dab06dfee dae3c3e8b257cd27d6b35a467a34bf79a6650340
 160167 fail
Re: swiotlb cleanups v3
On Sat, Apr 17, 2021 at 11:39:22AM -0500, Tom Lendacky wrote:
> Somewhere between the 1st and 2nd patch, specifying a specific swiotlb
> for an SEV guest is no longer honored. For example, if I start an SEV
> guest with 16GB of memory and specify swiotlb=131072 I used to get a
> 256MB SWIOTLB. However, after the 2nd patch, the swiotlb=131072 is no
> longer honored and I get a 982MB SWIOTLB (as set via sev_setup_arch() in
> arch/x86/mm/mem_encrypt.c).
>
> I can't be sure which patch caused the issue since an SEV guest fails to
> boot with the 1st patch but can boot with the 2nd patch, at which point
> the SWIOTLB comes in at 982MB (I haven't had a chance to debug it and so
> I'm hoping you might be able to quickly spot what's going on).

Can you try this patch?

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0a5b6f7e75bce6..ac81ef97df32f5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -71,15 +71,17 @@ struct io_tlb_mem *io_tlb_default_mem;
  */
 static unsigned int max_segment;
 
-static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long swiotlb_cmdline_size;
 
 static int __init
 setup_io_tlb_npages(char *str)
 {
 	if (isdigit(*str)) {
 		/* avoid tail segment of size < IO_TLB_SEGSIZE */
-		default_nslabs =
-			ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
+		unsigned long nslabs = simple_strtoul(str, &str, 0);
+
+		swiotlb_cmdline_size =
+			ALIGN(nslabs, IO_TLB_SEGSIZE) << IO_TLB_SHIFT;
 	}
 	if (*str == ',')
 		++str;
@@ -108,7 +110,9 @@ void swiotlb_set_max_segment(unsigned int val)
 
 unsigned long swiotlb_size_or_default(void)
 {
-	return default_nslabs << IO_TLB_SHIFT;
+	if (swiotlb_cmdline_size)
+		return swiotlb_cmdline_size;
+	return IO_TLB_DEFAULT_SIZE;
 }
 
 void __init swiotlb_adjust_size(unsigned long size)
@@ -118,9 +122,10 @@ void __init swiotlb_adjust_size(unsigned long size)
 	 * architectures such as those supporting memory encryption to
 	 * adjust/expand SWIOTLB size for their use.
 	 */
-	size = ALIGN(size, IO_TLB_SIZE);
-	default_nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
-	pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
+	if (!swiotlb_cmdline_size)
+		swiotlb_cmdline_size = ALIGN(size, IO_TLB_SIZE);
+	pr_info("SWIOTLB bounce buffer size adjusted to %luMB",
+		swiotlb_cmdline_size >> 20);
 }
 
 void swiotlb_print_info(void)
@@ -209,7 +214,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 void __init
 swiotlb_init(int verbose)
 {
-	size_t bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
+	size_t bytes = PAGE_ALIGN(swiotlb_size_or_default());
 	void *tlb;
 
 	if (swiotlb_force == SWIOTLB_NO_FORCE)
@@ -219,7 +224,7 @@ swiotlb_init(int verbose)
 	tlb = memblock_alloc_low(bytes, PAGE_SIZE);
 	if (!tlb)
 		goto fail;
-	if (swiotlb_init_with_tbl(tlb, default_nslabs, verbose))
+	if (swiotlb_init_with_tbl(tlb, bytes >> IO_TLB_SHIFT, verbose))
 		goto fail_free_mem;
 	return;
[libvirt test] 161311: regressions - FAIL
flight 161311 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/161311/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-armhf-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-arm64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a version targeted for testing: libvirt 1c34211c22de28127a509edbf2cf2f44cb0d891e baseline version: libvirt 2c846fa6bcc11929c9fb857a22430fb9945654ad Last test of basis 151777 2020-07-10 04:19:19 Z 284 days Failing since151818 2020-07-11 04:18:52 Z 283 days 276 attempts Testing same since 161311 2021-04-20 04:18:55 Z0 days1 attempts People who touched revisions under test: Adolfo Jayme Barrientos Aleksandr Alekseev Aleksei Zakharov Andika Triwidada Andrea Bolognani Balázs Meskó Barrett Schonefeld Bastien Orivel BiaoXiang Ye Bihong Yu Binfeng Wu Boris Fiuczynski Brian Turek Bruno Haible Chris Mayo Christian Ehrhardt Christian Schoenebeck Cole Robinson Collin Walling Cornelia Huck 
Cédric Bosdonnat Côme Borsoi Daniel Henrique Barboza Daniel Letai Daniel P. Berrange Daniel P. Berrangé Dmytro Linkin Eiichi Tsukata Eric Farman Erik Skultety Fabian Affolter Fabian Freyer Fangge Jin Farhan Ali Fedora Weblate Translation gongwei Guoyi Tu Göran Uddeborg Halil Pasic Han Han Hao Wang Hela Basa Helmut Grohne Ian Wienand Jakob Meng Jamie Strandboge Jamie Strandboge Jan Kuparinen Jean-Baptiste Holcroft Jianan Gao Jim Fehlig Jin Yan Jiri Denemark John Ferlan Jonathan Watt Jonathon Jongsma Julio Faracco Ján Tomko Kashyap Chamarthy Kevin Locke Kristina Hanicova Laine Stump Laszlo Ersek Liao Pingfang Lin Ma Lin Ma Lin Ma Luke Yue Luyao Zhong Marc Hartmayer Marc-André Lureau Marek Marczykowski-Górecki Markus Schade Martin Kletzander Masayoshi Mizuma Matt Coleman Matt Coleman Mauro Matteo Cascella Meina Li Michal Privoznik Michał Smyk Milo Casagrande Moshe Levi Muha Aliss Neal Gompa Nick Shyrokovskiy Nickys Music Group Nico Pache Nikolay Shirokovskiy Olaf Hering Olesya Gerasimenko Orion Poplawski Pany Patrick Magauran Paulo de Rezende Pinatti Pavel Hrdina Peng Liang Peter Krempa Pino Toscano Pino Toscano Piotr Drąg Prathamesh Chavan Ricky Tigg Roman Bogorodskiy Roman Bolshakov Ryan Gahagan Ryan Schmidt Sam Hartman Scott Shambarger Sebastian Mitterle SeongHyun Jo Shalini Chellathurai Saroja Shaojun Yang Shi Lei simmon Simon Gaiser Stefan Bader Stefan Berger Stefan Berger Szymon Scholz Thomas Huth Tim Wiederhake Tomáš Golembiovský Tomáš Janoušek Tuguoyi Ville Skyttä Wang Xin WangJian Weblate Yalei Li <274268...@qq.com> Yalei Li Yang Hang Yanqiu Zhang Yaroslav Kargin Yi Li Yi Wang Yuri Chornoivan Zheng Chuan zhenwei pi Zhenyu Zheng jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386
Re: [PATCH v2 3/3] docs/doxygen: doxygen documentation for grant_table.h
On 20.04.2021 10:46, Luca Fancellu wrote:
>> On 19 Apr 2021, at 12:05, Jan Beulich wrote:
>>
>> On 19.04.2021 11:12, Luca Fancellu wrote:
>>> Modification to include/public/grant_table.h:
>>>
>>> 1) Add doxygen tags to:
>>>    - Create Grant tables section
>>>    - include variables in the generated documentation
>>> 2) Add .rst file for grant table for Arm64
>>
>> I'm missing some reasoning about at least some of the changes done
>> to grant_table.h. Looking at this and the earlier patches I also
>> couldn't spot any general outline of what is acceptable or even
>> necessary in such a header to be understood by doxygen. Without
>> this written down somewhere (or, if documented elsewhere, a
>> pointer provided to that doc) I'm afraid things might get screwed
>> up again later on.
>
> Doxygen is a tool that generates documentation based on parsing the
> source code comments, it recognises some commands in the comments and
> builds the documentation sticking to what you wrote.
>
> Here the doxygen docs: https://www.doxygen.nl/manual/docblocks.html
>
> Basically it doesn’t react to all comments, it parses only some well
> crafted comments as explained in its documentation.

Providing this link somewhere might be helpful. However, also seeing
some of your further comments, it feels like to edit a public header
enabled for doxygen one has to know fair parts of this documentation.
While I'm certainly in favor of having a way to generate docs from the
headers, I'm afraid I don't think such knowledge should become a prereq
to e.g. adding a new sub-function of a hypercall. So far I was assuming
that the formatting requirements would be quite limited, and that it
would hence be possible to just glean them from existing code. But e.g.
the "/**<" notation isn't going to be obvious to spot.

>>> --- a/docs/hypercall-interfaces/arm64.rst
>>> +++ b/docs/hypercall-interfaces/arm64.rst
>>> @@ -8,6 +8,7 @@ Starting points
>>>  .. toctree::
>>>     :maxdepth: 2
>>>
>>> +   arm64/grant_tables
>>>
>>>
>>>  Functions
>>> diff --git a/docs/hypercall-interfaces/arm64/grant_tables.rst b/docs/hypercall-interfaces/arm64/grant_tables.rst
>>> new file mode 100644
>>> index 00..8955ec5812
>>> --- /dev/null
>>> +++ b/docs/hypercall-interfaces/arm64/grant_tables.rst
>>> @@ -0,0 +1,8 @@
>>> +.. SPDX-License-Identifier: CC-BY-4.0
>>> +
>>> +Grant Tables
>>> +============
>>> +
>>> +.. doxygengroup:: grant_table
>>
>> Why is this Arm64-specific?
>
> This particular one is Arm64 specific, but it doesn’t mean that grant
> tables are arm specific, it is only that for now I’m introducing a
> partial documentation in the arm side. Any other user can in the future
> add more documentation for each platform.

I'm of the pretty strong opinion that common hypercalls should be
documented as common, and hence not live in an arch-specific section.

>>> @@ -73,20 +75,25 @@
>>>  * frame, or zero if none.
>>>  * 3. Write memory barrier (WMB).
>>>  * 4. Write ent->flags, inc. valid type.
>>> + * @endcode
>>>  *
>>>  * Invalidating an unused GTF_permit_access entry:
>>> + * @code
>>>  * 1. flags = ent->flags.
>>>  * 2. Observe that !(flags & (GTF_reading|GTF_writing)).
>>>  * 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
>>>  * NB. No need for WMB as reuse of entry is control-dependent on success of
>>>  * step 3, and all architectures guarantee ordering of ctrl-dep writes.
>>> + * @endcode
>>>  *
>>>  * Invalidating an in-use GTF_permit_access entry:
>>> + *
>>>  * This cannot be done directly. Request assistance from the domain controller
>>>  * which can set a timeout on the use of a grant entry and take necessary
>>>  * action. (NB. This is not yet implemented!).
>>>  *
>>>  * Invalidating an unused GTF_accept_transfer entry:
>>> + * @code
>>>  * 1. flags = ent->flags.
>>>  * 2. Observe that !(flags & GTF_transfer_committed). [*]
>>>  * 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
>>> @@ -97,47 +104,55 @@
>>>  * transferred frame is written. It is safe for the guest to spin waiting
>>>  * for this to occur (detect by observing GTF_transfer_completed in
>>>  * ent->flags).
>>> + * @endcode
>>>  *
>>>  * Invalidating a committed GTF_accept_transfer entry:
>>>  * 1. Wait for (ent->flags & GTF_transfer_completed).
>>>  *
>>>  * Changing a GTF_permit_access from writable to read-only:
>>> + *
>>>  * Use SMP-safe CMPXCHG to set GTF_readonly, while checking !GTF_writing.
>>>  *
>>>  * Changing a GTF_permit_access from read-only to writable:
>>> + *
>>>  * Use SMP-safe bit-setting instruction.
>>
>> For example - are the blank lines you add necessary or merely nice
>> to have in your personal opinion?
>
> The blank lines makes the docs output more good looking

I'm inclined to suggest to split beautification from basic enabling.

>>> - */
>>> -
>>> -/*
>>> - * Reference to a grant entry in a specified domain's grant table.
>>> - */
>>> -typedef uint32_t
Re: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED
On Tue, Apr 20, 2021 at 04:35:02AM +0100, Igor Druzhinin wrote:
> When we're replacing the existing mapping there is possibility of a race
> on memory map with other threads doing mmap operations - the address being
> unmapped/re-mapped could be occupied by another thread in between.
>
> Linux mmap man page recommends keeping the existing mappings in place to
> reserve the place and instead utilize the fact that the next mmap operation
> with MAP_FIXED flag passed will implicitly destroy the existing mappings
> behind the chosen address. This behavior is guaranteed by POSIX / BSD and
> therefore is portable.
>
> Note that it wouldn't make the replacement atomic for parallel accesses to
> the replaced region - those might still fail with SIGBUS due to
> xenforeignmemory_map not being atomic. So we're still not expecting those.
>
> Tested-by: Anthony PERARD
> Signed-off-by: Igor Druzhinin

Should we add a 'Fixes: 759235653de ...' or similar tag here?

> ---
>  hw/i386/xen/xen-mapcache.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
> index 5b120ed..e82b7dc 100644
> --- a/hw/i386/xen/xen-mapcache.c
> +++ b/hw/i386/xen/xen-mapcache.c
> @@ -171,7 +171,20 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>      if (!(entry->flags & XEN_MAPCACHE_ENTRY_DUMMY)) {
>          ram_block_notify_remove(entry->vaddr_base, entry->size);
>      }
> -    if (munmap(entry->vaddr_base, entry->size) != 0) {
> +
> +    /*
> +     * If an entry is being replaced by another mapping and we're using
> +     * MAP_FIXED flag for it - there is possibility of a race for vaddr
> +     * address with another thread doing an mmap call itself
> +     * (see man 2 mmap). To avoid that we skip explicit unmapping here
> +     * and allow the kernel to destroy the previous mappings by replacing
> +     * them in mmap call later.
> +     *
> +     * Non-identical replacements are not allowed therefore.
> +     */
> +    assert(!vaddr || (entry->vaddr_base == vaddr && entry->size == size));
> +
> +    if (!vaddr && munmap(entry->vaddr_base, entry->size) != 0) {

Oh, so remappings of foreign addresses must always use the same virtual
address space, I somehow assumed that you could remap an existing foreign
map (or dummy mapping) into a different virtual address if you so wanted.

Thanks, Roger.
Re: [PATCH v2 3/3] docs/doxygen: doxygen documentation for grant_table.h
> On 19 Apr 2021, at 12:05, Jan Beulich wrote:
>
> On 19.04.2021 11:12, Luca Fancellu wrote:
>> Modification to include/public/grant_table.h:
>>
>> 1) Add doxygen tags to:
>>    - Create Grant tables section
>>    - include variables in the generated documentation
>> 2) Add .rst file for grant table for Arm64
>
> I'm missing some reasoning about at least some of the changes done
> to grant_table.h. Looking at this and the earlier patches I also
> couldn't spot any general outline of what is acceptable or even
> necessary in such a header to be understood by doxygen. Without
> this written down somewhere (or, if documented elsewhere, a
> pointer provided to that doc) I'm afraid things might get screwed
> up again later on.

Hi Jan,

Doxygen is a tool that generates documentation based on parsing the
source code comments, it recognises some commands in the comments and
builds the documentation sticking to what you wrote.

Here the doxygen docs: https://www.doxygen.nl/manual/docblocks.html

Basically it doesn’t react to all comments, it parses only some well
crafted comments as explained in its documentation.

>
>> --- a/docs/hypercall-interfaces/arm64.rst
>> +++ b/docs/hypercall-interfaces/arm64.rst
>> @@ -8,6 +8,7 @@ Starting points
>>  .. toctree::
>>     :maxdepth: 2
>>
>> +   arm64/grant_tables
>>
>>
>>  Functions
>> diff --git a/docs/hypercall-interfaces/arm64/grant_tables.rst b/docs/hypercall-interfaces/arm64/grant_tables.rst
>> new file mode 100644
>> index 00..8955ec5812
>> --- /dev/null
>> +++ b/docs/hypercall-interfaces/arm64/grant_tables.rst
>> @@ -0,0 +1,8 @@
>> +.. SPDX-License-Identifier: CC-BY-4.0
>> +
>> +Grant Tables
>> +============
>> +
>> +.. doxygengroup:: grant_table
>
> Why is this Arm64-specific?

This particular one is Arm64 specific, but it doesn’t mean that grant
tables are arm specific, it is only that for now I’m introducing a
partial documentation in the arm side. Any other user can in the future
add more documentation for each platform.

>
>> @@ -73,20 +75,25 @@
>>  * frame, or zero if none.
>>  * 3. Write memory barrier (WMB).
>>  * 4. Write ent->flags, inc. valid type.
>> + * @endcode
>>  *
>>  * Invalidating an unused GTF_permit_access entry:
>> + * @code
>>  * 1. flags = ent->flags.
>>  * 2. Observe that !(flags & (GTF_reading|GTF_writing)).
>>  * 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
>>  * NB. No need for WMB as reuse of entry is control-dependent on success of
>>  * step 3, and all architectures guarantee ordering of ctrl-dep writes.
>> + * @endcode
>>  *
>>  * Invalidating an in-use GTF_permit_access entry:
>> + *
>>  * This cannot be done directly. Request assistance from the domain controller
>>  * which can set a timeout on the use of a grant entry and take necessary
>>  * action. (NB. This is not yet implemented!).
>>  *
>>  * Invalidating an unused GTF_accept_transfer entry:
>> + * @code
>>  * 1. flags = ent->flags.
>>  * 2. Observe that !(flags & GTF_transfer_committed). [*]
>>  * 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
>> @@ -97,47 +104,55 @@
>>  * transferred frame is written. It is safe for the guest to spin waiting
>>  * for this to occur (detect by observing GTF_transfer_completed in
>>  * ent->flags).
>> + * @endcode
>>  *
>>  * Invalidating a committed GTF_accept_transfer entry:
>>  * 1. Wait for (ent->flags & GTF_transfer_completed).
>>  *
>>  * Changing a GTF_permit_access from writable to read-only:
>> + *
>>  * Use SMP-safe CMPXCHG to set GTF_readonly, while checking !GTF_writing.
>>  *
>>  * Changing a GTF_permit_access from read-only to writable:
>> + *
>>  * Use SMP-safe bit-setting instruction.
>
> For example - are the blank lines you add necessary or merely nice
> to have in your personal opinion?

The blank lines makes the docs output more good looking

>
>> - */
>> -
>> -/*
>> - * Reference to a grant entry in a specified domain's grant table.
>> - */
>> -typedef uint32_t grant_ref_t;
>
> Why does this get moved ...

>
>> -
>> -/*
>> + *
>>  * A grant table comprises a packed array of grant entries in one or more
>>  * page frames shared between Xen and a guest.
>> + *
>>  * [XEN]: This field is written by Xen and read by the sharing guest.
>> + *
>>  * [GST]: This field is written by the guest and read by Xen.
>> + *
>> + * @addtogroup grant_table Grant Tables
>> + * @{
>>  */
>>
>> -/*
>> - * Version 1 of the grant table entry structure is maintained purely
>> - * for backwards compatibility. New guests should use version 2.
>> +/**
>> + * Reference to a grant entry in a specified domain's grant table.
>>  */
>> +typedef uint32_t grant_ref_t;
>
> ... here, past a comment unrelated to it?

The comment “Version 1 of the grant table entry […]” was moved on top of
the struct grant_entry_v1, in this way Doxygen associate the comment to
that structure. The comment “Reference to a grant entry in a specified
domain's grant
Re: [PATCH v3] x86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents
On Mon, Apr 19, 2021 at 02:29:38PM +0200, Jan Beulich wrote:
> On 19.04.2021 14:09, Roger Pau Monné wrote:
> > On Mon, Apr 19, 2021 at 01:46:02PM +0200, Jan Beulich wrote:
> >> On 19.04.2021 11:16, Roger Pau Monné wrote:
> >>> Adding Paul also for the Viridian part.
> >>>
> >>> On Fri, Apr 16, 2021 at 03:16:41PM +0200, Jan Beulich wrote:
> >>>> Zapping leaf data for out of range leaves is just one half of it: To
> >>>> avoid guests (bogusly or worse) inferring information from mere leaf
> >>>> presence, also shrink maximum indicators such that the respective
> >>>> trailing entry is not all blank (unless of course it's the initial
> >>>> subleaf of a leaf that's not the final one).
> >>>>
> >>>> This is also in preparation of bumping the maximum basic leaf we
> >>>> support, to ensure guests not getting exposed related features won't
> >>>> observe a change in behavior.
> >>>>
> >>>> Signed-off-by: Jan Beulich
> >>>> ---
> >>>> v3: Record the actual non-empty subleaf in p->basic.raw[0x7], rather
> >>>>     than subleaf 0. Re-base over Viridian leaf 4005 addition.
> >>>> v2: New.
> >>>>
> >>>> --- a/tools/tests/cpu-policy/test-cpu-policy.c
> >>>> +++ b/tools/tests/cpu-policy/test-cpu-policy.c
> >>>> @@ -8,10 +8,13 @@
> >>>>  #include
> >>>>
> >>>>  #include
> >>>> +#include
> >>>>  #include
> >>>>  #include
> >>>>  #include
> >>>>
> >>>> +#define XSTATE_FP_SSE (X86_XCR0_FP | X86_XCR0_SSE)
> >>>> +
> >>>>  static unsigned int nr_failures;
> >>>>  #define fail(fmt, ...) \
> >>>>  ({ \
> >>>> @@ -553,6 +556,103 @@ static void test_cpuid_out_of_range_clea
> >>>>      }
> >>>>  }
> >>>>
> >>>> +static void test_cpuid_maximum_leaf_shrinking(void)
> >>>> +{
> >>>> +    static const struct test {
> >>>> +        const char *name;
> >>>> +        struct cpuid_policy p;
> >>>> +    } tests[] = {
> >>>> +        {
> >>>> +            .name = "basic",
> >>>> +            .p = {
> >>>> +                /* Very basic information only. */
> >>>> +                .basic.max_leaf = 1,
> >>>> +                .basic.raw_fms = 0xc2,
> >>>> +            },
> >>>> +        },
> >>>> +        {
> >>>> +            .name = "cache",
> >>>> +            .p = {
> >>>> +                /* Cache subleaves present. */
> >>>> +                .basic.max_leaf = 4,
> >>>> +                .cache.subleaf[0].type = 1,
> >>>
> >>> On a private conversation with Andrew he raised the issue that the
> >>> shrinking might be overly simplistic. For example if the x2APIC
> >>> feature bit in leaf 1 is set then the max leaf should be at least 0xb
> >>> in order to be able to fetch the x2APIC ID, even if it's 0.
> >>
> >> But in such a case the "type" field of leaf 0xb's first sub-leaf is
> >> going to be non-zero, isn't it?
> >
> > Right, as type 0 is invalid according to Intel SDM, so you will never
> > be able to shrink below 0xb while having x2APIC set.
> >
> > I still wonder however if there's any other such dependency, where
> > shrinking the max cpuid leaf could force us to drop features exposed
> > in inferior leaves.
>
> My take is that, just like for the x2APIC case, such leaves won't be
> all blank if the qualifying bit is set. Or if one ends up being all
> blank, a bug likely sits elsewhere.

Reviewed-by: Roger Pau Monné

I'm not able to spot any more dependencies ATM, so worse case we
discover some of those along the way and add the missing logic if
required, in any case seems like a fine starting point.

Thanks, Roger.
[xen-unstable test] 161296: tolerable FAIL - PUSHED
flight 161296 xen-unstable real [real] flight 161315 xen-unstable real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/161296/ http://logs.test-lab.xenproject.org/osstest/logs/161315/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 161315-retest Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 161281 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 161281 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 161281 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 161281 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 161281 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 161281 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 161281 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 161281 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 161281 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 161281 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 161281 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass 
test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass version targeted for 
testing: xen b5b93627dd1c398a90b832af765b4720fc71814e baseline version: xen 05031fa87357fad155f659cfc2dcce6614834684 Last test of basis 161281 2021-04-19 04:14:43 Z1 days Testing same since 161296 2021-04-19 17:07:45 Z0 days1 attempts People who touched revisions under test: Andrew Cooper jobs:
[PATCH 7/9] arm/time,vtimer: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit.
MSR/MRS are expecting 64bit values thus we should get rid of helpers
READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG.
We should also use register_t type when reading sysregs which can correspond
to uint64_t or uint32_t. Even though many AArch64 sysregs have upper 32bit
reserved it does not mean that they can't be widened in the future.

Modify type of vtimer structure's member: ctl to register_t.

Signed-off-by: Michal Orzel
---
 xen/arch/arm/time.c          | 28 ++++++++++++++--------------
 xen/arch/arm/vtimer.c        | 10 +++++-----
 xen/include/asm-arm/domain.h |  2 +-
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index b0021c2c69..e6c96eb90c 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -145,7 +145,7 @@ void __init preinit_xen_time(void)
         preinit_acpi_xen_time();
 
     if ( !cpu_khz )
-        cpu_khz = READ_SYSREG32(CNTFRQ_EL0) / 1000;
+        cpu_khz = (unsigned long)(READ_SYSREG(CNTFRQ_EL0) / 1000);
 
     res = platform_init_time();
     if ( res )
@@ -205,13 +205,13 @@ int reprogram_timer(s_time_t timeout)
 
     if ( timeout == 0 )
     {
-        WRITE_SYSREG32(0, CNTHP_CTL_EL2);
+        WRITE_SYSREG(0, CNTHP_CTL_EL2);
         return 1;
     }
 
     deadline = ns_to_ticks(timeout) + boot_count;
     WRITE_SYSREG64(deadline, CNTHP_CVAL_EL2);
-    WRITE_SYSREG32(CNTx_CTL_ENABLE, CNTHP_CTL_EL2);
+    WRITE_SYSREG(CNTx_CTL_ENABLE, CNTHP_CTL_EL2);
     isb();
 
     /* No need to check for timers in the past; the Generic Timer fires
@@ -223,23 +223,23 @@ int reprogram_timer(s_time_t timeout)
 static void timer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 {
     if ( irq == (timer_irq[TIMER_HYP_PPI]) &&
-         READ_SYSREG32(CNTHP_CTL_EL2) & CNTx_CTL_PENDING )
+         READ_SYSREG(CNTHP_CTL_EL2) & CNTx_CTL_PENDING )
     {
         perfc_incr(hyp_timer_irqs);
         /* Signal the generic timer code to do its work */
         raise_softirq(TIMER_SOFTIRQ);
         /* Disable the timer to avoid more interrupts */
-        WRITE_SYSREG32(0, CNTHP_CTL_EL2);
+        WRITE_SYSREG(0, CNTHP_CTL_EL2);
     }
 
     if ( irq == (timer_irq[TIMER_PHYS_NONSECURE_PPI]) &&
-         READ_SYSREG32(CNTP_CTL_EL0) & CNTx_CTL_PENDING )
+         READ_SYSREG(CNTP_CTL_EL0) & CNTx_CTL_PENDING )
     {
         perfc_incr(phys_timer_irqs);
         /* Signal the generic timer code to do its work */
         raise_softirq(TIMER_SOFTIRQ);
         /* Disable the timer to avoid more interrupts */
-        WRITE_SYSREG32(0, CNTP_CTL_EL0);
+        WRITE_SYSREG(0, CNTP_CTL_EL0);
     }
 }
 
@@ -260,8 +260,8 @@ static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 
     perfc_incr(virt_timer_irqs);
 
-    current->arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0);
-    WRITE_SYSREG32(current->arch.virt_timer.ctl | CNTx_CTL_MASK, CNTV_CTL_EL0);
+    current->arch.virt_timer.ctl = READ_SYSREG(CNTV_CTL_EL0);
+    WRITE_SYSREG(current->arch.virt_timer.ctl | CNTx_CTL_MASK, CNTV_CTL_EL0);
     vgic_inject_irq(current->domain, current, current->arch.virt_timer.irq, true);
 }
 
@@ -297,9 +297,9 @@ void init_timer_interrupt(void)
     /* Sensible defaults */
     WRITE_SYSREG64(0, CNTVOFF_EL2);     /* No VM-specific offset */
     /* Do not let the VMs program the physical timer, only read the physical counter */
-    WRITE_SYSREG32(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
-    WRITE_SYSREG32(0, CNTP_CTL_EL0);    /* Physical timer disabled */
-    WRITE_SYSREG32(0, CNTHP_CTL_EL2);   /* Hypervisor's timer disabled */
+    WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
+    WRITE_SYSREG(0, CNTP_CTL_EL0);      /* Physical timer disabled */
+    WRITE_SYSREG(0, CNTHP_CTL_EL2);     /* Hypervisor's timer disabled */
     isb();
 
     request_irq(timer_irq[TIMER_HYP_PPI], 0, timer_interrupt,
@@ -320,8 +320,8 @@ void init_timer_interrupt(void)
  */
 static void deinit_timer_interrupt(void)
 {
-    WRITE_SYSREG32(0, CNTP_CTL_EL0);    /* Disable physical timer */
-    WRITE_SYSREG32(0, CNTHP_CTL_EL2);   /* Disable hypervisor's timer */
+    WRITE_SYSREG(0, CNTP_CTL_EL0);      /* Disable physical timer */
+    WRITE_SYSREG(0, CNTHP_CTL_EL2);     /* Disable hypervisor's timer */
     isb();
 
     release_irq(timer_irq[TIMER_HYP_PPI], NULL);
diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
index c2b27915c6..167fc6127a 100644
--- a/xen/arch/arm/vtimer.c
+++ b/xen/arch/arm/vtimer.c
@@ -138,8 +138,8 @@ void virt_timer_save(struct vcpu *v)
 {
     ASSERT(!is_idle_vcpu(v));
 
-    v->arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0);
-    WRITE_SYSREG32(v->arch.virt_timer.ctl & ~CNTx_CTL_ENABLE, CNTV_CTL_EL0);
+    v->arch.virt_timer.ctl = READ_SYSREG(CNTV_CTL_EL0);
+    WRITE_SYSREG(v->arch.virt_timer.ctl & ~CNTx_CTL_ENABLE, CNTV_CTL_EL0);
     v->arch.virt_timer.cval = READ_SYSREG64(CNTV_CVAL_EL0);
     if (
[PATCH 6/9] arm/page: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64-bit, whereas AArch32 ones are 32-bit or 64-bit. MSR/MRS expect 64-bit values, so we should get rid of the READ/WRITE_SYSREG32 helpers in favour of READ/WRITE_SYSREG. We should also use the register_t type when reading sysregs; it corresponds to uint64_t or uint32_t as appropriate. Even though many AArch64 sysregs have their upper 32 bits reserved, that does not mean they can't be widened in the future.

Modify accesses to CTR_EL0 to use READ/WRITE_SYSREG.

Signed-off-by: Michal Orzel
---
 xen/include/asm-arm/page.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index 131507a517..c6f9fb0d4e 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -137,10 +137,10 @@ extern size_t dcache_line_bytes;
 static inline size_t read_dcache_line_bytes(void)
 {
-    uint32_t ctr;
+    register_t ctr;
 
     /* Read CTR */
-    ctr = READ_SYSREG32(CTR_EL0);
+    ctr = READ_SYSREG(CTR_EL0);
 
     /* Bits 16-19 are the log2 number of words in the cacheline. */
     return (size_t) (4 << ((ctr >> 16) & 0xf));
 }
--
2.29.0
[PATCH 5/9] arm/mm: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64-bit, whereas AArch32 ones are 32-bit or 64-bit. MSR/MRS expect 64-bit values, so we should get rid of the READ/WRITE_SYSREG32 helpers in favour of READ/WRITE_SYSREG. We should also use the register_t type when reading sysregs; it corresponds to uint64_t or uint32_t as appropriate. Even though many AArch64 sysregs have their upper 32 bits reserved, that does not mean they can't be widened in the future.

Modify SCTLR_EL2 accesses to use READ/WRITE_SYSREG.

Signed-off-by: Michal Orzel
---
 xen/arch/arm/mm.c    | 2 +-
 xen/arch/arm/traps.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 59f8a3f15f..0e07335291 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -613,7 +613,7 @@ void __init remove_early_mappings(void)
  */
 static void xen_pt_enforce_wnx(void)
 {
-    WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
+    WRITE_SYSREG(READ_SYSREG(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
     /*
      * The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
      * before flushing the TLBs.
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index c7acdb2087..e7384381cc 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -915,7 +915,7 @@ static void _show_registers(const struct cpu_user_regs *regs,
     printk(" VTTBR_EL2: %016"PRIx64"\n", ctxt->vttbr_el2);
     printk("\n");
 
-    printk(" SCTLR_EL2: %08"PRIx32"\n", READ_SYSREG32(SCTLR_EL2));
+    printk(" SCTLR_EL2: %"PRIregister"\n", READ_SYSREG(SCTLR_EL2));
     printk(" HCR_EL2: %"PRIregister"\n", READ_SYSREG(HCR_EL2));
     printk(" TTBR0_EL2: %016"PRIx64"\n", READ_SYSREG64(TTBR0_EL2));
     printk("\n");
--
2.29.0
[PATCH 4/9] arm/p2m: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values thus we should get rid of helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use register_t type when reading sysregs which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does not mean that they can't be widen in the future. Modify type of vtcr to register_t. Signed-off-by: Michal Orzel --- xen/arch/arm/p2m.c | 8 xen/arch/arm/traps.c | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index ac50312620..d414c4feb9 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -1973,11 +1973,11 @@ void __init p2m_restrict_ipa_bits(unsigned int ipa_bits) } /* VTCR value to be configured by all CPUs. Set only once by the boot CPU */ -static uint32_t __read_mostly vtcr; +static register_t __read_mostly vtcr; static void setup_virt_paging_one(void *data) { -WRITE_SYSREG32(vtcr, VTCR_EL2); +WRITE_SYSREG(vtcr, VTCR_EL2); /* * ARM64_WORKAROUND_AT_SPECULATE: We want to keep the TLBs free from @@ -2000,7 +2000,7 @@ static void setup_virt_paging_one(void *data) void __init setup_virt_paging(void) { /* Setup Stage 2 address translation */ -unsigned long val = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA; +register_t val = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA; #ifdef CONFIG_ARM_32 if ( p2m_ipa_bits < 40 ) @@ -2089,7 +2089,7 @@ void __init setup_virt_paging(void) pa_range_info[pa_range].pabits, ( MAX_VMID == MAX_VMID_16_BIT ) ? 
16 : 8); #endif -printk("P2M: %d levels with order-%d root, VTCR 0x%lx\n", +printk("P2M: %d levels with order-%d root, VTCR 0x%"PRIregister"\n", 4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, val); p2m_vmid_allocator_init(); diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index ccc0827107..c7acdb2087 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -911,7 +911,7 @@ static void _show_registers(const struct cpu_user_regs *regs, show_registers_32(regs, ctxt, guest_mode, v); #endif } -printk(" VTCR_EL2: %08"PRIx32"\n", READ_SYSREG32(VTCR_EL2)); +printk(" VTCR_EL2: %"PRIregister"\n", READ_SYSREG(VTCR_EL2)); printk(" VTTBR_EL2: %016"PRIx64"\n", ctxt->vttbr_el2); printk("\n"); -- 2.29.0
[PATCH 9/9] xen/arm64: Remove READ/WRITE_SYSREG32 helper macros
AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values thus we should get rid of helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use register_t type when reading sysregs which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does not mean that they can't be widen in the future. As there are no other places in the code using READ/WRITE_SYSREG32, remove the helper macros. Signed-off-by: Michal Orzel --- xen/arch/arm/vcpreg.c | 16 xen/include/asm-arm/arm64/sysregs.h | 5 - 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c index c7f516ee0a..6cb7f3108c 100644 --- a/xen/arch/arm/vcpreg.c +++ b/xen/arch/arm/vcpreg.c @@ -48,6 +48,7 @@ */ /* The name is passed from the upper macro to workaround macro expansion. */ +#ifdef CONFIG_ARM_32 #define TVM_REG(sz, func, reg...) \ static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)\ { \ @@ -61,6 +62,21 @@ static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)\ \ return true;\ } +#else /* CONFIG_ARM_64 */ +#define TVM_REG(sz, func, reg...) 
\ +static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)\ +{ \ +struct vcpu *v = current; \ +bool cache_enabled = vcpu_has_cache_enabled(v); \ +\ +GUEST_BUG_ON(read); \ +WRITE_SYSREG(*r, reg); \ +\ +p2m_toggle_cache(v, cache_enabled); \ +\ +return true;\ +} +#endif #define TVM_REG32(regname, xreg) TVM_REG(32, vreg_emulate_##regname, xreg) #define TVM_REG64(regname, xreg) TVM_REG(64, vreg_emulate_##regname, xreg) diff --git a/xen/include/asm-arm/arm64/sysregs.h b/xen/include/asm-arm/arm64/sysregs.h index 077fd95fb7..83a1157ac4 100644 --- a/xen/include/asm-arm/arm64/sysregs.h +++ b/xen/include/asm-arm/arm64/sysregs.h @@ -86,11 +86,6 @@ #endif /* Access to system registers */ - -#define READ_SYSREG32(name) ((uint32_t)READ_SYSREG64(name)) - -#define WRITE_SYSREG32(v, name) WRITE_SYSREG64((uint64_t)v, name) - #define WRITE_SYSREG64(v, name) do {\ uint64_t _r = v;\ asm volatile("msr "__stringify(name)", %0" : : "r" (_r)); \ -- 2.29.0
[PATCH 3/9] arm/gic: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values thus we should get rid of helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use register_t type when reading sysregs which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does not mean that they can't be widen in the future. Modify types of following members of struct gic_v3 to register_t: -hcr(not used at all in Xen) -vmcr -sre_el1 -apr0 -apr1 Signed-off-by: Michal Orzel --- xen/arch/arm/gic-v3-lpi.c | 2 +- xen/arch/arm/gic-v3.c | 96 --- xen/include/asm-arm/gic.h | 6 +-- 3 files changed, 54 insertions(+), 50 deletions(-) diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c index 869bc97fa1..e1594dd20e 100644 --- a/xen/arch/arm/gic-v3-lpi.c +++ b/xen/arch/arm/gic-v3-lpi.c @@ -178,7 +178,7 @@ void gicv3_do_LPI(unsigned int lpi) irq_enter(); /* EOI the LPI already. */ -WRITE_SYSREG32(lpi, ICC_EOIR1_EL1); +WRITE_SYSREG(lpi, ICC_EOIR1_EL1); /* Find out if a guest mapped something to this physical LPI. 
*/ hlpip = gic_get_host_lpi(lpi); diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c index ac28013c19..0634013a67 100644 --- a/xen/arch/arm/gic-v3.c +++ b/xen/arch/arm/gic-v3.c @@ -246,12 +246,12 @@ static void gicv3_ich_write_lr(int lr, uint64_t val) */ static void gicv3_enable_sre(void) { -uint32_t val; +register_t val; -val = READ_SYSREG32(ICC_SRE_EL2); +val = READ_SYSREG(ICC_SRE_EL2); val |= GICC_SRE_EL2_SRE; -WRITE_SYSREG32(val, ICC_SRE_EL2); +WRITE_SYSREG(val, ICC_SRE_EL2); isb(); } @@ -315,16 +315,16 @@ static void restore_aprn_regs(const union gic_state_data *d) switch ( gicv3.nr_priorities ) { case 7: -WRITE_SYSREG32(d->v3.apr0[2], ICH_AP0R2_EL2); -WRITE_SYSREG32(d->v3.apr1[2], ICH_AP1R2_EL2); +WRITE_SYSREG(d->v3.apr0[2], ICH_AP0R2_EL2); +WRITE_SYSREG(d->v3.apr1[2], ICH_AP1R2_EL2); /* Fall through */ case 6: -WRITE_SYSREG32(d->v3.apr0[1], ICH_AP0R1_EL2); -WRITE_SYSREG32(d->v3.apr1[1], ICH_AP1R1_EL2); +WRITE_SYSREG(d->v3.apr0[1], ICH_AP0R1_EL2); +WRITE_SYSREG(d->v3.apr1[1], ICH_AP1R1_EL2); /* Fall through */ case 5: -WRITE_SYSREG32(d->v3.apr0[0], ICH_AP0R0_EL2); -WRITE_SYSREG32(d->v3.apr1[0], ICH_AP1R0_EL2); +WRITE_SYSREG(d->v3.apr0[0], ICH_AP0R0_EL2); +WRITE_SYSREG(d->v3.apr1[0], ICH_AP1R0_EL2); break; default: BUG(); @@ -338,16 +338,16 @@ static void save_aprn_regs(union gic_state_data *d) switch ( gicv3.nr_priorities ) { case 7: -d->v3.apr0[2] = READ_SYSREG32(ICH_AP0R2_EL2); -d->v3.apr1[2] = READ_SYSREG32(ICH_AP1R2_EL2); +d->v3.apr0[2] = READ_SYSREG(ICH_AP0R2_EL2); +d->v3.apr1[2] = READ_SYSREG(ICH_AP1R2_EL2); /* Fall through */ case 6: -d->v3.apr0[1] = READ_SYSREG32(ICH_AP0R1_EL2); -d->v3.apr1[1] = READ_SYSREG32(ICH_AP1R1_EL2); +d->v3.apr0[1] = READ_SYSREG(ICH_AP0R1_EL2); +d->v3.apr1[1] = READ_SYSREG(ICH_AP1R1_EL2); /* Fall through */ case 5: -d->v3.apr0[0] = READ_SYSREG32(ICH_AP0R0_EL2); -d->v3.apr1[0] = READ_SYSREG32(ICH_AP1R0_EL2); +d->v3.apr0[0] = READ_SYSREG(ICH_AP0R0_EL2); +d->v3.apr1[0] = READ_SYSREG(ICH_AP1R0_EL2); break; default: BUG(); 
@@ -371,15 +371,15 @@ static void gicv3_save_state(struct vcpu *v) dsb(sy); gicv3_save_lrs(v); save_aprn_regs(>arch.gic); -v->arch.gic.v3.vmcr = READ_SYSREG32(ICH_VMCR_EL2); -v->arch.gic.v3.sre_el1 = READ_SYSREG32(ICC_SRE_EL1); +v->arch.gic.v3.vmcr = READ_SYSREG(ICH_VMCR_EL2); +v->arch.gic.v3.sre_el1 = READ_SYSREG(ICC_SRE_EL1); } static void gicv3_restore_state(const struct vcpu *v) { -uint32_t val; +register_t val; -val = READ_SYSREG32(ICC_SRE_EL2); +val = READ_SYSREG(ICC_SRE_EL2); /* * Don't give access to system registers when the guest is using * GICv2 @@ -388,7 +388,7 @@ static void gicv3_restore_state(const struct vcpu *v) val &= ~GICC_SRE_EL2_ENEL1; else val |= GICC_SRE_EL2_ENEL1; -WRITE_SYSREG32(val, ICC_SRE_EL2); +WRITE_SYSREG(val, ICC_SRE_EL2); /* * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a Group0 @@ -398,9 +398,9 @@ static void gicv3_restore_state(const struct vcpu *v) * want before starting to mess with the rest of the GIC, and * VMCR_EL1 in particular. */ -WRITE_SYSREG32(v->arch.gic.v3.sre_el1, ICC_SRE_EL1); +WRITE_SYSREG(v->arch.gic.v3.sre_el1, ICC_SRE_EL1); isb(); -WRITE_SYSREG32(v->arch.gic.v3.vmcr, ICH_VMCR_EL2); +WRITE_SYSREG(v->arch.gic.v3.vmcr, ICH_VMCR_EL2); restore_aprn_regs(>arch.gic); gicv3_restore_lrs(v);
[PATCH 8/9] arm: Change type of hsr to register_t
AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values thus we should get rid of helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use register_t type when reading sysregs which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does not mean that they can't be widen in the future. Modify type of hsr to register_t. When on AArch64 add 32bit RES0 members to structures inside hsr union. Signed-off-by: Michal Orzel --- xen/arch/arm/arm64/entry.S| 2 +- xen/arch/arm/arm64/traps.c| 2 +- xen/arch/arm/arm64/vsysreg.c | 3 ++- xen/arch/arm/traps.c | 20 +--- xen/arch/arm/vcpreg.c | 13 +- xen/include/asm-arm/arm32/processor.h | 2 +- xen/include/asm-arm/arm64/processor.h | 5 ++-- xen/include/asm-arm/hsr.h | 34 ++- 8 files changed, 59 insertions(+), 22 deletions(-) diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S index ab9a65fc14..218b063c97 100644 --- a/xen/arch/arm/arm64/entry.S +++ b/xen/arch/arm/arm64/entry.S @@ -155,7 +155,7 @@ add x21, sp, #UREGS_CPSR mrs x22, spsr_el2 mrs x23, esr_el2 -stp w22, w23, [x21] +stp x22, x23, [x21] .endm diff --git a/xen/arch/arm/arm64/traps.c b/xen/arch/arm/arm64/traps.c index babfc1d884..9113a15c7a 100644 --- a/xen/arch/arm/arm64/traps.c +++ b/xen/arch/arm/arm64/traps.c @@ -36,7 +36,7 @@ void do_bad_mode(struct cpu_user_regs *regs, int reason) union hsr hsr = { .bits = regs->hsr }; printk("Bad mode in %s handler detected\n", handler[reason]); -printk("ESR=0x%08"PRIx32": EC=%"PRIx32", IL=%"PRIx32", ISS=%"PRIx32"\n", +printk("ESR=%#"PRIregister": EC=%"PRIx32", IL=%"PRIx32", ISS=%"PRIx32"\n", hsr.bits, hsr.ec, hsr.len, hsr.iss); local_irq_disable(); diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c index 41f18612c6..3c10d464e7 100644 --- a/xen/arch/arm/arm64/vsysreg.c +++ b/xen/arch/arm/arm64/vsysreg.c @@ -368,7 +368,8 @@ void do_sysreg(struct cpu_user_regs *regs, 
sysreg.op2, sysreg.read ? "=>" : "<=", sysreg.reg, regs->pc); -gdprintk(XENLOG_ERR, "unhandled 64-bit sysreg access %#x\n", +gdprintk(XENLOG_ERR, "unhandled 64-bit sysreg access" + " %#"PRIregister"\n", hsr.bits & HSR_SYSREG_REGS_MASK); inject_undef_exception(regs, hsr); return; diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index e7384381cc..db15a2c647 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -546,7 +546,7 @@ void inject_undef64_exception(struct cpu_user_regs *regs, int instr_len) PSR_IRQ_MASK | PSR_DBG_MASK; regs->pc = handler; -WRITE_SYSREG32(esr.bits, ESR_EL1); +WRITE_SYSREG(esr.bits, ESR_EL1); } /* Inject an abort exception into a 64 bit guest */ @@ -580,7 +580,7 @@ static void inject_abt64_exception(struct cpu_user_regs *regs, regs->pc = handler; WRITE_SYSREG(addr, FAR_EL1); -WRITE_SYSREG32(esr.bits, ESR_EL1); +WRITE_SYSREG(esr.bits, ESR_EL1); } static void inject_dabt64_exception(struct cpu_user_regs *regs, @@ -919,7 +919,7 @@ static void _show_registers(const struct cpu_user_regs *regs, printk(" HCR_EL2: %"PRIregister"\n", READ_SYSREG(HCR_EL2)); printk(" TTBR0_EL2: %016"PRIx64"\n", READ_SYSREG64(TTBR0_EL2)); printk("\n"); -printk(" ESR_EL2: %08"PRIx32"\n", regs->hsr); +printk(" ESR_EL2: %"PRIregister"\n", regs->hsr); printk(" HPFAR_EL2: %"PRIregister"\n", READ_SYSREG(HPFAR_EL2)); #ifdef CONFIG_ARM_32 @@ -2004,13 +2004,15 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs, break; default: -gprintk(XENLOG_WARNING, "Unsupported FSC: HSR=%#x DFSC=%#x\n", +gprintk(XENLOG_WARNING, "Unsupported FSC:" +" HSR=%#"PRIregister" DFSC=%#x\n", hsr.bits, xabt.fsc); } inject_abt: -gdprintk(XENLOG_DEBUG, "HSR=0x%x pc=%#"PRIregister" gva=%#"PRIvaddr - " gpa=%#"PRIpaddr"\n", hsr.bits, regs->pc, gva, gpa); +gdprintk(XENLOG_DEBUG, "HSR=%#"PRIregister" pc=%#"PRIregister"" + " gva=%#"PRIvaddr" gpa=%#"PRIpaddr"\n", + hsr.bits, regs->pc, gva, gpa); if ( is_data ) inject_dabt_exception(regs, gva, hsr.len); else @@ -2204,7 +2206,8 @@ 
void do_trap_guest_sync(struct cpu_user_regs *regs) default: gprintk(XENLOG_WARNING, -"Unknown Guest Trap. HSR=0x%x EC=0x%x IL=%x Syndrome=0x%"PRIx32"\n", +"Unknown Guest Trap. HSR=%#"PRIregister" EC=0x%x IL=%x" +" Syndrome=0x%"PRIx32"\n", hsr.bits, hsr.ec, hsr.len, hsr.iss);
[PATCH 2/9] arm/domain: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64bit whereas AArch32 ones are 32bit or 64bit. MSR/MRS are expecting 64bit values thus we should get rid of helpers READ/WRITE_SYSREG32 in favour of using READ/WRITE_SYSREG. We should also use register_t type when reading sysregs which can correspond to uint64_t or uint32_t. Even though many AArch64 sysregs have upper 32bit reserved it does not mean that they can't be widen in the future. Modify type of registers: actlr, cntkctl to register_t. Signed-off-by: Michal Orzel --- xen/arch/arm/domain.c| 20 ++-- xen/include/asm-arm/domain.h | 4 ++-- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index bdd3d3e5b5..c021a03c61 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -113,13 +113,13 @@ static void ctxt_switch_from(struct vcpu *p) p->arch.tpidr_el1 = READ_SYSREG(TPIDR_EL1); /* Arch timer */ -p->arch.cntkctl = READ_SYSREG32(CNTKCTL_EL1); +p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1); virt_timer_save(p); if ( is_32bit_domain(p->domain) && cpu_has_thumbee ) { -p->arch.teecr = READ_SYSREG32(TEECR32_EL1); -p->arch.teehbr = READ_SYSREG32(TEEHBR32_EL1); +p->arch.teecr = READ_SYSREG(TEECR32_EL1); +p->arch.teehbr = READ_SYSREG(TEEHBR32_EL1); } #ifdef CONFIG_ARM_32 @@ -175,7 +175,7 @@ static void ctxt_switch_from(struct vcpu *p) static void ctxt_switch_to(struct vcpu *n) { -uint32_t vpidr; +register_t vpidr; /* When the idle VCPU is running, Xen will always stay in hypervisor * mode. Therefore we don't need to restore the context of an idle VCPU. 
@@ -183,8 +183,8 @@ static void ctxt_switch_to(struct vcpu *n) if ( is_idle_vcpu(n) ) return; -vpidr = READ_SYSREG32(MIDR_EL1); -WRITE_SYSREG32(vpidr, VPIDR_EL2); +vpidr = READ_SYSREG(MIDR_EL1); +WRITE_SYSREG(vpidr, VPIDR_EL2); WRITE_SYSREG(n->arch.vmpidr, VMPIDR_EL2); /* VGIC */ @@ -257,8 +257,8 @@ static void ctxt_switch_to(struct vcpu *n) if ( is_32bit_domain(n->domain) && cpu_has_thumbee ) { -WRITE_SYSREG32(n->arch.teecr, TEECR32_EL1); -WRITE_SYSREG32(n->arch.teehbr, TEEHBR32_EL1); +WRITE_SYSREG(n->arch.teecr, TEECR32_EL1); +WRITE_SYSREG(n->arch.teehbr, TEEHBR32_EL1); } #ifdef CONFIG_ARM_32 @@ -274,7 +274,7 @@ static void ctxt_switch_to(struct vcpu *n) /* This is could trigger an hardware interrupt from the virtual * timer. The interrupt needs to be injected into the guest. */ -WRITE_SYSREG32(n->arch.cntkctl, CNTKCTL_EL1); +WRITE_SYSREG(n->arch.cntkctl, CNTKCTL_EL1); virt_timer_restore(n); } @@ -330,7 +330,7 @@ static void schedule_tail(struct vcpu *prev) static void continue_new_vcpu(struct vcpu *prev) { -current->arch.actlr = READ_SYSREG32(ACTLR_EL1); +current->arch.actlr = READ_SYSREG(ACTLR_EL1); processor_vcpu_initialise(current); schedule_tail(prev); diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 0a74df9931..2d4f38c669 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -156,7 +156,7 @@ struct arch_vcpu /* Control Registers */ register_t sctlr; -uint32_t actlr; +register_t actlr; uint32_t cpacr; uint32_t contextidr; @@ -190,7 +190,7 @@ struct arch_vcpu struct vgic_cpu vgic; /* Timer registers */ -uint32_t cntkctl; +register_t cntkctl; struct vtimer phys_timer; struct vtimer virt_timer; -- 2.29.0
[PATCH 0/9] xen/arm64: Get rid of READ/WRITE_SYSREG32
The purpose of this patch series is to remove the 32-bit helper macros READ/WRITE_SYSREG32 on arm64, as they no longer match the ARMv8 specification: the mrs/msr instructions expect 64-bit values. According to ARM DDI 0487G.a, all AArch64 system registers are 64 bits wide, even though many of them currently have the upper 32 bits reserved. That does not rule out some of these sysregs being widened in newer versions of ARMv8 or in a future architecture. Also, when dealing with system registers we should use the register_t type.

This series replaces all READ/WRITE_SYSREG32 calls with READ/WRITE_SYSREG. The change was split into several small patches to make the review process easier. The series focuses only on removing READ/WRITE_SYSREG32: some AArch64 system registers are still defined as 32-bit (e.g. cpsr) and should be changed later on; they were left alone because they did not appear inside READ/WRITE_SYSREG32 calls. The next step is to also get rid of vreg_emulate_sysreg32.
Michal Orzel (9):
  arm64/vfp: Get rid of READ/WRITE_SYSREG32
  arm/domain: Get rid of READ/WRITE_SYSREG32
  arm/gic: Get rid of READ/WRITE_SYSREG32
  arm/p2m: Get rid of READ/WRITE_SYSREG32
  arm/mm: Get rid of READ/WRITE_SYSREG32
  arm/page: Get rid of READ/WRITE_SYSREG32
  arm/time,vtimer: Get rid of READ/WRITE_SYSREG32
  arm: Change type of hsr to register_t
  xen/arm64: Remove READ/WRITE_SYSREG32 helper macros

 xen/arch/arm/arm64/entry.S            |  2 +-
 xen/arch/arm/arm64/traps.c            |  2 +-
 xen/arch/arm/arm64/vfp.c              | 12 ++--
 xen/arch/arm/arm64/vsysreg.c          |  3 +-
 xen/arch/arm/domain.c                 | 20 +++---
 xen/arch/arm/gic-v3-lpi.c             |  2 +-
 xen/arch/arm/gic-v3.c                 | 96 ++-
 xen/arch/arm/mm.c                     |  2 +-
 xen/arch/arm/p2m.c                    |  8 +--
 xen/arch/arm/time.c                   | 28
 xen/arch/arm/traps.c                  | 24 ---
 xen/arch/arm/vcpreg.c                 | 29 ++--
 xen/arch/arm/vtimer.c                 | 10 +--
 xen/include/asm-arm/arm32/processor.h |  2 +-
 xen/include/asm-arm/arm64/processor.h |  5 +-
 xen/include/asm-arm/arm64/sysregs.h   |  5 --
 xen/include/asm-arm/arm64/vfp.h       |  6 +-
 xen/include/asm-arm/domain.h          |  6 +-
 xen/include/asm-arm/gic.h             |  6 +-
 xen/include/asm-arm/hsr.h             | 34 +-
 xen/include/asm-arm/page.h            |  4 +-
 21 files changed, 179 insertions(+), 127 deletions(-)
--
2.29.0
[PATCH 1/9] arm64/vfp: Get rid of READ/WRITE_SYSREG32
AArch64 system registers are 64-bit, whereas AArch32 ones are 32-bit or 64-bit. MSR/MRS expect 64-bit values, so we should get rid of the READ/WRITE_SYSREG32 helpers in favour of READ/WRITE_SYSREG. We should also use the register_t type when reading sysregs; it corresponds to uint64_t or uint32_t as appropriate. Even though many AArch64 sysregs have their upper 32 bits reserved, that does not mean they can't be widened in the future.

Modify the type of FPCR, FPSR and FPEXC32_EL2 to register_t.

Signed-off-by: Michal Orzel
---
 xen/arch/arm/arm64/vfp.c        | 12 ++--
 xen/include/asm-arm/arm64/vfp.h |  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/arm64/vfp.c b/xen/arch/arm/arm64/vfp.c
index 999a0d58a5..47885e76ba 100644
--- a/xen/arch/arm/arm64/vfp.c
+++ b/xen/arch/arm/arm64/vfp.c
@@ -26,10 +26,10 @@ void vfp_save_state(struct vcpu *v)
          "stp q30, q31, [%1, #16 * 30]\n\t"
          : "=Q" (*v->arch.vfp.fpregs) : "r" (v->arch.vfp.fpregs));
 
-    v->arch.vfp.fpsr = READ_SYSREG32(FPSR);
-    v->arch.vfp.fpcr = READ_SYSREG32(FPCR);
+    v->arch.vfp.fpsr = READ_SYSREG(FPSR);
+    v->arch.vfp.fpcr = READ_SYSREG(FPCR);
     if ( is_32bit_domain(v->domain) )
-        v->arch.vfp.fpexc32_el2 = READ_SYSREG32(FPEXC32_EL2);
+        v->arch.vfp.fpexc32_el2 = READ_SYSREG(FPEXC32_EL2);
 }
 
 void vfp_restore_state(struct vcpu *v)
@@ -55,8 +55,8 @@ void vfp_restore_state(struct vcpu *v)
          "ldp q30, q31, [%1, #16 * 30]\n\t"
          : : "Q" (*v->arch.vfp.fpregs), "r" (v->arch.vfp.fpregs));
 
-    WRITE_SYSREG32(v->arch.vfp.fpsr, FPSR);
-    WRITE_SYSREG32(v->arch.vfp.fpcr, FPCR);
+    WRITE_SYSREG(v->arch.vfp.fpsr, FPSR);
+    WRITE_SYSREG(v->arch.vfp.fpcr, FPCR);
     if ( is_32bit_domain(v->domain) )
-        WRITE_SYSREG32(v->arch.vfp.fpexc32_el2, FPEXC32_EL2);
+        WRITE_SYSREG(v->arch.vfp.fpexc32_el2, FPEXC32_EL2);
 }
diff --git a/xen/include/asm-arm/arm64/vfp.h b/xen/include/asm-arm/arm64/vfp.h
index 6ab5d36c6c..e6e8c363bc 100644
--- a/xen/include/asm-arm/arm64/vfp.h
+++ b/xen/include/asm-arm/arm64/vfp.h
@@ -7,9 +7,9 @@ struct vfp_state
 {
     uint64_t fpregs[64] __vfp_aligned;
 
-    uint32_t fpcr;
-    uint32_t fpexc32_el2;
-    uint32_t fpsr;
+    register_t fpcr;
+    register_t fpexc32_el2;
+    register_t fpsr;
 };
 
 #endif /* _ARM_ARM64_VFP_H */
--
2.29.0
Re: [PATCH] xen-mapcache: avoid a race on memory map while using MAP_FIXED
On 20/04/2021 04:35, Igor Druzhinin wrote:

When we're replacing the existing mapping there is a possibility of a race
on the memory map with other threads doing mmap operations - the address
being unmapped/re-mapped could be occupied by another thread in between.

The Linux mmap man page recommends keeping the existing mappings in place
to reserve the address range, and instead utilizing the fact that the next
mmap operation with the MAP_FIXED flag passed will implicitly destroy the
existing mappings behind the chosen address. This behavior is guaranteed
by POSIX / BSD and is therefore portable.

Note that it wouldn't make the replacement atomic for parallel accesses to
the replaced region - those might still fail with SIGBUS due to
xenforeignmemory_map not being atomic. So we're still not expecting those.

Tested-by: Anthony PERARD
Signed-off-by: Igor Druzhinin

Reviewed-by: Paul Durrant

---
 hw/i386/xen/xen-mapcache.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 5b120ed..e82b7dc 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -171,7 +171,20 @@ static void xen_remap_bucket(MapCacheEntry *entry,
     if (!(entry->flags & XEN_MAPCACHE_ENTRY_DUMMY)) {
         ram_block_notify_remove(entry->vaddr_base, entry->size);
     }
-    if (munmap(entry->vaddr_base, entry->size) != 0) {
+
+    /*
+     * If an entry is being replaced by another mapping and we're using
+     * MAP_FIXED flag for it - there is possibility of a race for vaddr
+     * address with another thread doing an mmap call itself
+     * (see man 2 mmap). To avoid that we skip explicit unmapping here
+     * and allow the kernel to destroy the previous mappings by replacing
+     * them in mmap call later.
+     *
+     * Non-identical replacements are not allowed therefore.
+     */
+    assert(!vaddr || (entry->vaddr_base == vaddr && entry->size == size));
+
+    if (!vaddr && munmap(entry->vaddr_base, entry->size) != 0) {
         perror("unmap fails");
         exit(-1);
     }