Re: [PATCH v3 01/11] xen/manage: keep track of the on-going suspend mode
On Thu, Oct 01, 2020 at 08:43:58AM -0400, boris.ostrov...@oracle.com wrote:
> >>> Also, wrt KASLR stuff, that issue is still seen sometimes but I
> >>> haven't had bandwidth to dive deep into the issue and fix it.
>
> So what's the plan there? You first mentioned this issue early this year
> and judged by your response it is not clear whether you will ever spend
> time looking at it.
>
> >>> I do want to fix it and did do some debugging earlier this year, just
> >>> haven't gotten back to it. Also, wanted to understand if the issue is
> >>> a blocker to this series?
> >>
> >> Integrating code with known bugs is less than ideal.
> >>
> > So for this series to be accepted, KASLR needs to be fixed along with
> > other comments of course?
>
> Yes, please.
>
> >>> I had some theories when debugging around this, like whether the
> >>> random base address picked by KASLR for the resuming kernel
> >>> mismatches the suspended kernel; jogging my memory, I didn't find
> >>> that to be the case. Another hunch was that the physical address of
> >>> the vcpu info registered at boot is different from what the suspended
> >>> kernel has, and that can cause CPUs to get stuck when coming online.
> >>
> >> I'd think if this were the case you'd have 100% failure rate. And we
> >> are also re-registering vcpu info on xen restore and I am not aware
> >> of any failures due to KASLR.
> >>
> > What I meant there wrt vcpu info was that vcpu info is not unregistered
> > during hibernation, so Xen still remembers the old physical addresses
> > for the vcpu information created by the booting kernel. But the
> > hibernation kernel may have different physical addresses for vcpu
> > info, and if a mismatch happens it may cause issues with resume.
> > During hibernation, the vcpu info register hypercall is not invoked
> > again.
>
> I still don't think that's the cause but it's certainly worth having a
> look.

Hi Boris,

Apologies for picking this up again after last year. I did some deep diving
on the above statement and that is indeed what is happening. I did some
debugging around KASLR and hibernation using reboot mode.

I observed in my debug prints that whenever the vcpu_info* address for a
secondary vcpu assigned in xen_vcpu_setup() at boot is different from what
is in the image, resume gets stuck for that vcpu in bringup_cpu(). That
means we have different addresses for &per_cpu(xen_vcpu_info, cpu) at boot
and after control jumps into the image.

I failed to get any prints after it got stuck in bringup_cpu(), and I do
not have an option to send a sysrq signal to the guest or to get a kdump.
This mismatch is not observed in every hibernate-resume cycle. I am not
sure if this is a bug or expected behavior. I am also contemplating the
idea that it may be a bug in Xen code that is triggered only when KASLR is
enabled, but I do not have substantial data to prove that. Is it a
coincidence that this always happens for the 1st vcpu?

Moreover, since the hypervisor is not aware that the guest is hibernated
(it looks like a regular shutdown to dom0 in reboot mode), is
re-registering vcpu_info for secondary vcpus even plausible? I could
definitely use some advice to debug this further.

Some printk's from my debugging:

At boot:

    xen_vcpu_setup: xen_have_vcpu_info_placement=1 cpu=1, vcpup=0x9e548fa560e0, info.mfn=3996246 info.offset=224

When the image loads, it ends up in this condition:

    xen_vcpu_setup()
    {
        ...
        if (xen_hvm_domain()) {
            if (per_cpu(xen_vcpu, cpu) == &per_cpu(xen_vcpu_info, cpu))
                return 0;
        }
        ...
    }

    xen_vcpu_setup: checking mfn on resume cpu=1, info.mfn=3934806 info.offset=224, &per_cpu(xen_vcpu_info, cpu)=0x9d7240a560e0

This is tested on a c4.2xlarge [8 vCPUs, 15GB memory] instance with a 5.10
kernel running in the guest.

Thanks,
Anchal.
> -boris
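The early-return path under discussion can be sketched as a toy model (this is not the actual kernel code; the names and data layout are simplified stand-ins for the per-cpu machinery): once xen_vcpu already points at the per-cpu xen_vcpu_info slot, xen_vcpu_setup() skips the registration hypercall, so after a hibernate/resume Xen keeps whatever physical address it recorded at boot even if the resumed image's per-cpu area lives elsewhere.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Stand-ins for the kernel's per-cpu variables (simplified). */
static void *xen_vcpu[NR_CPUS];     /* pointer saved by the kernel */
static char xen_vcpu_info[NR_CPUS]; /* stand-in per-cpu info slots */

/*
 * Returns true if a VCPUOP_register_vcpu_info hypercall would be issued
 * for this cpu; false if the early-return path is taken.  On the first
 * call the pointer is unset, so registration happens; on every later
 * call (including after a resume in this model) the pointer already
 * matches and the hypercall is skipped.
 */
static bool would_reregister(int cpu)
{
    if (xen_vcpu[cpu] == (void *)&xen_vcpu_info[cpu])
        return false;                       /* early return: skip */
    xen_vcpu[cpu] = (void *)&xen_vcpu_info[cpu];
    return true;                            /* hypercall issued */
}
```

Under this model, only the very first xen_vcpu_setup() per cpu registers anything with Xen, which is consistent with the observation above that the register hypercall is not invoked again during hibernation.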
Re: QEMU backport necessary for building with "recent" toolchain (on openSUSE Tumbleweed)
On Thu, 2021-05-20 at 14:55 +0100, Anthony PERARD wrote:
> On Tue, May 18, 2021 at 05:24:30PM +0200, Dario Faggioli wrote:
> > I think we need the following commit in our QEMU: bbd2d5a812077
> > ("build: -no-pie is no functional linker flag").
>
> Hi Dario,

Hi,

> I'm hoping to update qemu-xen to a newer version of QEMU (6.0) which
> would have the fix, but that needs a fix of libxl,
> "Fix libxl with QEMU 6.0 + remove some more deprecated usages."
> So I would prefer to avoid adding more to the current branch.

Sure, makes sense. I wanted to bring it up in case it hadn't been noticed
yet. If it has been noticed and there's a plan, then we're good, I guess.

> The branch stable-4.15 already has the fix if you need it, in the mean
> time.

Ok, thanks!

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
Re: [PATCH v2] libelf: improve PVH elfnote parsing
On 20.05.21 11:28, Jan Beulich wrote:
> On 20.05.2021 11:27, Roger Pau Monné wrote:
>> On Wed, May 19, 2021 at 12:34:19PM +0200, Jan Beulich wrote:
>>> On 18.05.2021 16:47, Roger Pau Monne wrote:
>>>> @@ -425,8 +425,11 @@ static elf_errorstatus elf_xen_addr_calc_check(struct elf_binary *elf,
>>>>              return -1;
>>>>      }
>>>>
>>>> -    /* Initial guess for virt_base is 0 if it is not explicitly defined. */
>>>> -    if ( parms->virt_base == UNSET_ADDR )
>>>> +    /*
>>>> +     * Initial guess for virt_base is 0 if it is not explicitly defined in the
>>>> +     * PV case. For PVH virt_base is forced to 0 because paging is disabled.
>>>> +     */
>>>> +    if ( parms->virt_base == UNSET_ADDR || hvm )
>>>>      {
>>>>          parms->virt_base = 0;
>>>>          elf_msg(elf, "ELF: VIRT_BASE unset, using %#" PRIx64 "\n",
>>>
>>> This message is wrong now if hvm is true but parms->virt_base !=
>>> UNSET_ADDR. Best perhaps is to avoid emitting the message altogether
>>> when hvm is true. (Since you'll be touching it anyway, perhaps a good
>>> opportunity to do away with passing parms->virt_base to elf_msg(), and
>>> instead simply use a literal zero.)
>>>
>>>> @@ -441,8 +444,10 @@ static elf_errorstatus elf_xen_addr_calc_check(struct elf_binary *elf,
>>>>       *
>>>>       * If we are using the modern ELF notes interface then the default
>>>>       * is 0.
>>>> +     *
>>>> +     * For PVH this is forced to 0, as it's already a legacy option for PV.
>>>>       */
>>>> -    if ( parms->elf_paddr_offset == UNSET_ADDR )
>>>> +    if ( parms->elf_paddr_offset == UNSET_ADDR || hvm )
>>>>      {
>>>>          if ( parms->elf_note_start )
>>>
>>> Don't you want "|| hvm" here as well, or alternatively suppress the
>>> fallback to the __xen_guest section in the PVH case (near the end of
>>> elf_xen_parse())?
>>
>> The legacy __xen_guest section doesn't support PHYS32_ENTRY, so yes,
>> that part could be completely skipped when called from an HVM
>> container. I think I will fix that in another patch though if you are
>> OK, as it's not strictly related to the calculation fixes done here.
>
> That's fine; it wants to be a prereq to the one here then, though, I
> think.

Would it be possible to add some comment to xen/include/public/elfnote.h
indicating which elfnotes are evaluated for which guest types, including a
hint which elfnotes _have_ been evaluated before this series? This will
help cleaning up guests regarding advertisement of elfnotes (something
I've been planning to do for the Linux kernel).

Juergen
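The virt_base defaulting being patched above can be condensed into a small sketch (a hypothetical standalone function, not the actual libelf code; UNSET_ADDR mirrors libelf's all-ones sentinel for "no ELF note seen"): for PV, an unset note defaults to 0, while for PVH (hvm == true) the value is forced to 0 regardless of the note, since paging is disabled at entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* libelf-style sentinel meaning "the guest supplied no value". */
#define UNSET_ADDR (~(uint64_t)0)

/*
 * Condensed model of the virt_base logic in elf_xen_addr_calc_check()
 * after this patch: fall back to 0 when unset, and force 0 for PVH.
 */
static uint64_t effective_virt_base(uint64_t note_virt_base, bool hvm)
{
    if (note_virt_base == UNSET_ADDR || hvm)
        return 0;
    return note_virt_base;
}
```

This also makes Jan's review point visible: when hvm is true and the note was set, the value is overridden rather than merely defaulted, so a "VIRT_BASE unset" message would be misleading on that path.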
[linux-linus test] 162106: regressions - FAIL
flight 162106 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/162106/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-xsm                    7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64       7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install        fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel      7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel      7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64       7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-libvirt                   7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-pair                     10 xen-install/src_host     fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install      fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd        7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm               7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-pair                     11 xen-install/dst_host     fail REGR. vs. 152332
 test-amd64-coresched-i386-xl              7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl                        7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd        7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64           7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim                 7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-raw                    7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-shadow                 7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386            7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64       7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64       7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64       7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-examine                   6 xen-install              fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair             10 xen-install/src_host     fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair             11 xen-install/dst_host     fail REGR. vs. 152332
 test-amd64-amd64-libvirt-vhd             13 guest-start              fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub            19 guest-localmigrate/x10   fail REGR. vs. 152332
 test-arm64-arm64-xl-thunderx             13 debian-fixup             fail REGR. vs. 152332
 test-arm64-arm64-xl                      13 debian-fixup             fail REGR. vs. 152332
 test-arm64-arm64-xl-xsm                  13 debian-fixup             fail REGR. vs. 152332
 test-arm64-arm64-xl-credit1              13 debian-fixup             fail REGR. vs. 152332
 test-arm64-arm64-libvirt-xsm             13 debian-fixup             fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2              13 debian-fixup             fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub             20 guest-stop               fail REGR. vs. 152332
 test-amd64-amd64-xl-qcow2                13 guest-start              fail REGR. vs. 152332
 test-armhf-armhf-xl-vhd                  13 guest-start              fail REGR. vs. 152332

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64     19 guest-stop               fail like 152332
 test-armhf-armhf-libvirt                 16 saverestore-support-check fail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64     19 guest-stop               fail like 152332
 test-amd64-amd64-qemuu-nested-amd        20 debian-hvm-install/l1/l2 fail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64     19 guest-stop               fail like 152332
 test-armhf-armhf-libvirt-raw             15 saverestore-support-check fail like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64     19 guest-stop               fail like 152332
 test-amd64-amd64-libvirt-xsm             15 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt                 15 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale              15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-arndale              16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu            15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-multivcpu            16 saverestore-support-check fail never pass
 test-armhf-armhf-xl                      15 migrate-support-check    fail never pass
 test-armhf-armhf-xl                      16
[qemu-mainline test] 162104: regressions - FAIL
flight 162104 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/162104/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-qemuu-freebsd11-amd64   16 guest-saverestore        fail REGR. vs. 152631
 test-amd64-i386-freebsd10-i386           16 guest-saverestore        fail REGR. vs. 152631
 test-amd64-i386-freebsd10-amd64          16 guest-saverestore        fail REGR. vs. 152631
 test-amd64-amd64-qemuu-freebsd12-amd64   16 guest-saverestore        fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-debianhvm-amd64 15 guest-saverestore        fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 15 guest-saverestore    fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-ovmf-amd64      15 guest-saverestore        fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 15 guest-saverestore       fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-win7-amd64     15 guest-saverestore        fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-ovmf-amd64     15 guest-saverestore        fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 15 guest-saverestore     fail REGR. vs. 152631
 test-amd64-amd64-qemuu-nested-intel      20 debian-hvm-install/l1/l2 fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-win7-amd64      15 guest-saverestore        fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-ws16-amd64      15 guest-saverestore        fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64     15 guest-saverestore        fail REGR. vs. 152631

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds                 20 guest-localmigrate/x10   fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-qemuu-nested-amd        20 debian-hvm-install/l1/l2 fail like 152631
 test-armhf-armhf-libvirt                 16 saverestore-support-check fail like 152631
 test-armhf-armhf-libvirt-raw             15 saverestore-support-check fail like 152631
 test-amd64-i386-xl-pvshim                14 guest-start              fail never pass
 test-arm64-arm64-xl-seattle              15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-seattle              16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm             15 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt                 15 migrate-support-check    fail never pass
 test-amd64-i386-libvirt-xsm              15 migrate-support-check    fail never pass
 test-amd64-i386-libvirt                  15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-xsm                  15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-xsm                  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1              15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-credit1              16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm             15 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-xsm             16 saverestore-support-check fail never pass
 test-arm64-arm64-xl                      15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-thunderx             15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-thunderx             16 saverestore-support-check fail never pass
 test-arm64-arm64-xl                      16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2              15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-credit2              16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd             14 migrate-support-check    fail never pass
 test-armhf-armhf-xl                      15 migrate-support-check    fail never pass
 test-armhf-armhf-xl                      16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds                 15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-rtds                 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2              15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-credit2              16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu            15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-multivcpu            16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck           15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-cubietruck           16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt                 15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-vhd                  14 migrate-support-check    fail never pass
 test-armhf-armhf-xl-vhd                  15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale              15 migrate-support-check    fail never pass
[linux-linus test] 162103: regressions - FAIL
flight 162103 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/162103/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-xsm                    7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64       7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install        fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel      7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel      7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-examine                   6 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64       7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-libvirt                   7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-pair                     10 xen-install/src_host     fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install      fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd        7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm               7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-pair                     11 xen-install/dst_host     fail REGR. vs. 152332
 test-amd64-coresched-i386-xl              7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl                        7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd        7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64           7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim                 7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-raw                    7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-shadow                 7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386            7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64       7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64       7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64       7 xen-install              fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair             10 xen-install/src_host     fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair             11 xen-install/dst_host     fail REGR. vs. 152332
 test-amd64-amd64-libvirt-vhd             13 guest-start              fail REGR. vs. 152332
 test-arm64-arm64-xl-thunderx             13 debian-fixup             fail REGR. vs. 152332
 test-arm64-arm64-xl                      13 debian-fixup             fail REGR. vs. 152332
 test-arm64-arm64-libvirt-xsm             13 debian-fixup             fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2              13 debian-fixup             fail REGR. vs. 152332
 test-arm64-arm64-xl-credit1              14 guest-start              fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub            20 guest-stop               fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub             20 guest-stop               fail REGR. vs. 152332
 test-amd64-amd64-xl-qcow2                13 guest-start              fail REGR. vs. 152332
 test-armhf-armhf-xl-vhd                  13 guest-start              fail REGR. vs. 152332
 test-arm64-arm64-xl-xsm                  14 guest-start              fail in 162097 REGR. vs. 152332

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl-credit1 13 debian-fixup             fail in 162097 pass in 162103
 test-arm64-arm64-xl-xsm     13 debian-fixup             fail    pass in 162097
 test-amd64-amd64-xl-credit1 22 guest-start/debian.repeat fail   pass in 162097
 test-amd64-amd64-xl-shadow  22 guest-start/debian.repeat fail   pass in 162097
 test-amd64-amd64-examine     4 memdisk-try-append       fail    pass in 162097

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64     19 guest-stop               fail like 152332
 test-armhf-armhf-libvirt                 16 saverestore-support-check fail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64     19 guest-stop               fail like 152332
 test-amd64-amd64-qemuu-nested-amd        20 debian-hvm-install/l1/l2 fail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64     19 guest-stop               fail like 152332
 test-armhf-armhf-libvirt-raw             15 saverestore-support-check fail like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64     19 guest-stop               fail like 152332
 test-amd64-amd64-libvirt-xsm             15 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt                 15 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
[xen-unstable test] 162102: tolerable FAIL - PUSHED
flight 162102 xen-unstable real [real]
flight 162105 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/162102/
http://logs.test-lab.xenproject.org/osstest/logs/162105/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 14 guest-start/debianhvm.repeat fail pass in 162105-retest

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds                 18 guest-start/debian.repeat fail REGR. vs. 162095

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64     19 guest-stop               fail like 162095
 test-armhf-armhf-libvirt                 16 saverestore-support-check fail like 162095
 test-amd64-amd64-qemuu-nested-amd        20 debian-hvm-install/l1/l2 fail like 162095
 test-amd64-amd64-xl-qemuu-ws16-amd64     19 guest-stop               fail like 162095
 test-amd64-i386-xl-qemut-ws16-amd64      19 guest-stop               fail like 162095
 test-amd64-i386-xl-qemut-win7-amd64      19 guest-stop               fail like 162095
 test-amd64-i386-xl-qemuu-win7-amd64      19 guest-stop               fail like 162095
 test-amd64-amd64-xl-qemut-ws16-amd64     19 guest-stop               fail like 162095
 test-armhf-armhf-libvirt-raw             15 saverestore-support-check fail like 162095
 test-amd64-i386-xl-qemuu-ws16-amd64      19 guest-stop               fail like 162095
 test-amd64-amd64-xl-qemuu-win7-amd64     19 guest-stop               fail like 162095
 test-amd64-i386-xl-pvshim                14 guest-start              fail never pass
 test-amd64-i386-libvirt-xsm              15 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt                 15 migrate-support-check    fail never pass
 test-amd64-i386-libvirt                  15 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt-xsm             15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-seattle              15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-seattle              16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1              15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-credit1              16 saverestore-support-check fail never pass
 test-arm64-arm64-xl                      15 migrate-support-check    fail never pass
 test-arm64-arm64-xl                      16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2              15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-thunderx             15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-credit2              16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx             16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm             15 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-xsm             16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm                  15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-xsm                  16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd             14 migrate-support-check    fail never pass
 test-armhf-armhf-xl-arndale              15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-arndale              16 saverestore-support-check fail never pass
 test-armhf-armhf-xl                      15 migrate-support-check    fail never pass
 test-armhf-armhf-xl                      16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1              15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-credit1              16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt                 15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-credit2              15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-credit2              16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck           15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-cubietruck           16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds                 15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-rtds                 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd                  14 migrate-support-check    fail never pass
 test-armhf-armhf-xl-vhd                  15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu            15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-multivcpu            16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw             14 migrate-support-check    fail never pass

version targeted for testing:
 xen                  aa77acc28098d04945af998f3fc0dbd3759b5b41
baseline version:
 xen                  935abe1cc463917c697c1451ec8d313a5d75f7de

Last test of basis   162095  2021-05-19 16:38:54 Z    1 days
Testing same since   162102  2021-05-20 07:39:10 Z    0
Re: [PATCH v2 1/2] xen-pciback: redo VF placement in the virtual topology
On 5/20/21 10:46 AM, Jan Beulich wrote:
> On 20.05.2021 16:44, Jan Beulich wrote:
>> On 20.05.2021 16:38, Boris Ostrovsky wrote:
>>> On 5/20/21 3:43 AM, Jan Beulich wrote:
>>>> On 20.05.2021 02:36, Boris Ostrovsky wrote:
>>>>> On 5/18/21 12:13 PM, Jan Beulich wrote:
>>>>>> @@ -95,22 +95,25 @@ static int __xen_pcibk_add_pci_dev(struc
>>>>>>
>>>>>>      /*
>>>>>>       * Keep multi-function devices together on the virtual PCI bus, except
>>>>>> -     * virtual functions.
>>>>>> +     * that we want to keep virtual functions at func 0 on their own. They
>>>>>> +     * aren't multi-function devices and hence their presence at func 0
>>>>>> +     * may cause guests to not scan the other functions.
>>>>> So your reading of the original commit is that whatever the issue it
>>>>> was, only function zero was causing the problem? In other words, you
>>>>> are not concerned that pci_scan_slot() may now look at function 1 and
>>>>> skip all higher-numbered functions (assuming the problem is still
>>>>> there)?
>>>> I'm not sure I understand the question: Whether to look at higher
>>>> numbered slots is a function of slot 0's multi-function bit alone,
>>>> aiui. IOW if slot 1 is being looked at in the first place, slots 2-7
>>>> should also be looked at.
>>>
>>> Wasn't the original patch describing a problem strictly as one for
>>> single-function devices, so the multi-function bit is not set? I.e. if
>>> all VFs (which are single-function devices) are placed in the same
>>> slot then pci_scan_slot() would only look at function 0 and ignore
>>> anything higher-numbered.
>>>
>>> My question is whether it would "only look at function 0 and ignore
>>> anything higher-numbered" or "only look at the lowest-numbered
>>> function and ignore anything higher-numbered".
>> The common scanning logic is to look at slot 0 first. If that's
>> populated, other slots get looked at only if slot 0 has the
>> multi-function bit set. If slot 0 is not populated, nothing is known
>> about the other slots, and hence they need to be scanned.
> In particular Linux'es next_fn() ends with
>
>     /* dev may be NULL for non-contiguous multifunction devices */
>     if (!dev || dev->multifunction)
>         return (fn + 1) % 8;
>
>     return 0;

Ah yes.

Reviewed-by: Boris Ostrovsky
Re: Uses of /hypervisor memory range (was: FreeBSD/Xen/ARM issues)
On Thu, 20 May 2021, Julien Grall wrote:
> On 20/05/2021 00:25, Stefano Stabellini wrote:
>> On Sat, 15 May 2021, Julien Grall wrote:
>>>> My feeling is one of two things should happen with the /hypervisor
>>>> address range:
>>>>
>>>> 1> OSes could be encouraged to use it for all foreign mappings. The
>>>> range should be dynamic in some fashion. There could be a handy way
>>>> to allow configuring the amount of address space thus reserved.
>>>
>>> In the context of XSA-300 and virtio on Xen on Arm, we discussed
>>> providing a region for foreign mappings. The main trouble here is
>>> figuring out the size; if you mess it up then you may break all the
>>> PV drivers.
>>>
>>> If the problem is finding space, then I would like to suggest a
>>> different approach (I think I may have discussed it with Andrew). Xen
>>> is maintaining the P2M for the guest and therefore knows where the
>>> unallocated space is. How about introducing a new hypercall to ask
>>> for "unallocated space"?
>>>
>>> This would not work for older hypervisors, but there you could use
>>> the RAM instead (as Linux does). This has drawbacks (e.g. shattering
>>> superpages, reducing the amount of memory usable...), but for 1> you
>>> would also need a hack for older Xen.
>>
>> I am starting to wonder if it would make sense to add a new device
>> tree binding to describe a larger free region for foreign mappings
>> rather than a hypercall. It could be several GBs or even TBs in size.
>> I like the idea of having it in device tree because after all this is
>> static information. I can see that a hypercall would also work and I
>> am open to it, but if possible I think it would be better not to
>> extend the hypercall interface when there is a good alternative.
>
> There are two issues with the Device-Tree approach:
>  1) This doesn't work on ACPI.
>  2) It is not clear how to define the size of the region. An admin
>     doesn't have the right information in hand to know how many
>     mappings will be done (a backend may map the same page multiple
>     times).
>
> The advantage of the hypercall solution is that the size is "virtually"
> unlimited and not based on a specific firmware.

Fair enough
Re: [PATCH 00/13] Intel Hardware P-States (HWP) support
On 03.05.2021 21:27, Jason Andryuk wrote:
> Open questions:
>
> HWP defaults to enabled and running in balanced mode. This is useful
> for testing, but should the old ACPI cpufreq driver remain the default?

As long as this new code is experimental, yes. But once it's deemed stable
and fully supported, I think on HWP-capable hardware the new driver should
be used by default.

> This series unilaterally enables Hardware Duty Cycling (HDC), which is
> another feature to autonomously power things down. That is enabled if
> HWP is enabled. Maybe that wants to be configurable?

Probably; not even sure what the default would want to be.

Jan
Re: [PATCH v2 1/2] xen-pciback: redo VF placement in the virtual topology
On 20.05.2021 16:44, Jan Beulich wrote:
> On 20.05.2021 16:38, Boris Ostrovsky wrote:
>> On 5/20/21 3:43 AM, Jan Beulich wrote:
>>> On 20.05.2021 02:36, Boris Ostrovsky wrote:
>>>> On 5/18/21 12:13 PM, Jan Beulich wrote:
>>>>> @@ -95,22 +95,25 @@ static int __xen_pcibk_add_pci_dev(struc
>>>>>
>>>>>      /*
>>>>>       * Keep multi-function devices together on the virtual PCI bus, except
>>>>> -     * virtual functions.
>>>>> +     * that we want to keep virtual functions at func 0 on their own. They
>>>>> +     * aren't multi-function devices and hence their presence at func 0
>>>>> +     * may cause guests to not scan the other functions.
>>>> So your reading of the original commit is that whatever the issue it
>>>> was, only function zero was causing the problem? In other words, you
>>>> are not concerned that pci_scan_slot() may now look at function 1 and
>>>> skip all higher-numbered functions (assuming the problem is still
>>>> there)?
>>> I'm not sure I understand the question: Whether to look at higher
>>> numbered slots is a function of slot 0's multi-function bit alone,
>>> aiui. IOW if slot 1 is being looked at in the first place, slots 2-7
>>> should also be looked at.
>>
>> Wasn't the original patch describing a problem strictly as one for
>> single-function devices, so the multi-function bit is not set? I.e. if
>> all VFs (which are single-function devices) are placed in the same slot
>> then pci_scan_slot() would only look at function 0 and ignore anything
>> higher-numbered.
>>
>> My question is whether it would "only look at function 0 and ignore
>> anything higher-numbered" or "only look at the lowest-numbered function
>> and ignore anything higher-numbered".
>
> The common scanning logic is to look at slot 0 first. If that's
> populated, other slots get looked at only if slot 0 has the
> multi-function bit set. If slot 0 is not populated, nothing is known
> about the other slots, and hence they need to be scanned.

In particular Linux'es next_fn() ends with

    /* dev may be NULL for non-contiguous multifunction devices */
    if (!dev || dev->multifunction)
        return (fn + 1) % 8;

    return 0;

Jan
Re: [PATCH v2 1/2] xen-pciback: redo VF placement in the virtual topology
On 20.05.2021 16:38, Boris Ostrovsky wrote:
> On 5/20/21 3:43 AM, Jan Beulich wrote:
>> On 20.05.2021 02:36, Boris Ostrovsky wrote:
>>> On 5/18/21 12:13 PM, Jan Beulich wrote:
>>>> @@ -95,22 +95,25 @@ static int __xen_pcibk_add_pci_dev(struc
>>>>
>>>>      /*
>>>>       * Keep multi-function devices together on the virtual PCI bus, except
>>>> -     * virtual functions.
>>>> +     * that we want to keep virtual functions at func 0 on their own. They
>>>> +     * aren't multi-function devices and hence their presence at func 0
>>>> +     * may cause guests to not scan the other functions.
>>>
>>> So your reading of the original commit is that whatever the issue it was,
>>> only function zero was causing the problem? In other words, you are not
>>> concerned that pci_scan_slot() may now look at function 1 and skip all
>>> higher-numbered function (assuming the problem is still there)?
>>
>> I'm not sure I understand the question: Whether to look at higher numbered
>> slots is a function of slot 0's multi-function bit alone, aiui. IOW if
>> slot 1 is being looked at in the first place, slots 2-7 should also be
>> looked at.
>
> Wasn't the original patch describing a problem strictly as one for
> single-function devices, so the multi-function bit is not set? I.e. if all
> VFs (which are single-function devices) are placed in the same slot then
> pci_scan_slot() would only look at function 0 and ignore anything
> higher-numbered.
>
> My question is whether it would "only look at function 0 and ignore anything
> higher-numbered" or "only look at the lowest-numbered function and ignore
> anything higher-numbered".

The common scanning logic is to look at slot 0 first. If that's populated,
other slots get looked at only if slot 0 has the multi-function bit set.
If slot 0 is not populated, nothing is known about the other slots, and
hence they need to be scanned.

Jan
Re: [PATCH v2 1/2] xen-pciback: redo VF placement in the virtual topology
On 5/20/21 3:43 AM, Jan Beulich wrote:
> On 20.05.2021 02:36, Boris Ostrovsky wrote:
>> On 5/18/21 12:13 PM, Jan Beulich wrote:
>>>
>>> @@ -95,22 +95,25 @@ static int __xen_pcibk_add_pci_dev(struc
>>>
>>>      /*
>>>       * Keep multi-function devices together on the virtual PCI bus, except
>>> -     * virtual functions.
>>> +     * that we want to keep virtual functions at func 0 on their own. They
>>> +     * aren't multi-function devices and hence their presence at func 0
>>> +     * may cause guests to not scan the other functions.
>>
>> So your reading of the original commit is that whatever the issue it was,
>> only function zero was causing the problem? In other words, you are not
>> concerned that pci_scan_slot() may now look at function 1 and skip all
>> higher-numbered function (assuming the problem is still there)?
>
> I'm not sure I understand the question: Whether to look at higher numbered
> slots is a function of slot 0's multi-function bit alone, aiui. IOW if
> slot 1 is being looked at in the first place, slots 2-7 should also be
> looked at.

Wasn't the original patch describing a problem strictly as one for
single-function devices, so the multi-function bit is not set? I.e. if all
VFs (which are single-function devices) are placed in the same slot then
pci_scan_slot() would only look at function 0 and ignore anything
higher-numbered.

My question is whether it would "only look at function 0 and ignore anything
higher-numbered" or "only look at the lowest-numbered function and ignore
anything higher-numbered".

-boris
Re: QEMU backport necessary for building with "recent" toolchain (on openSUSE Tumbleweed)
On Tue, May 18, 2021 at 05:24:30PM +0200, Dario Faggioli wrote:
> Hello,
>
> I think we need the following commit in our QEMU: bbd2d5a812077
> ("build: -no-pie is no functional linker flag").

Hi Dario,

I'm hoping to update qemu-xen to a newer version of QEMU (6.0) which
would have the fix, but that needs a fix of libxl, "Fix libxl with QEMU
6.0 + remove some more deprecated usages." So I would prefer to avoid
adding more to the current branch.

The branch stable-4.15 already has the fix if you need it in the
meantime.

Thanks,

--
Anthony PERARD
[PATCH v2] x86emul: de-duplicate scatters to the same linear address
The SDM specifically allows for earlier writes to fully overlapping
ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
would crash it if varying data was written to the same address. Detect
overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
be quite a bit more difficult. To maintain proper faulting behavior,
instead of dropping earlier write instances of fully overlapping slots
altogether, write the data of the final of these slots multiple times.
(We also can't pull ahead the [single] write of the data of the last of
the slots, clearing all involved slots' op_mask bits together, as this
would yield incorrect results if there were intervening partially
overlapping ones.)

Note that due to cache slot use being linear address based, there's no
similar issue with multiple writes to the same physical address (mapped
through different linear addresses).

Signed-off-by: Jan Beulich
---
v2: Maintain correct faulting behavior.

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -10073,15 +10073,36 @@ x86_emulate(

         for ( i = 0; op_mask; ++i )
         {
-            long idx = b & 1 ? index.qw[i] : index.dw[i];
+            long idx = (b & 1 ? index.qw[i]
+                              : index.dw[i]) * (1 << state->sib_scale);
+            unsigned long offs = truncate_ea(ea.mem.off + idx);
+            unsigned int j, slot;

             if ( !(op_mask & (1 << i)) )
                 continue;

-            rc = ops->write(ea.mem.seg,
-                            truncate_ea(ea.mem.off +
-                                        idx * (1 << state->sib_scale)),
-                            (void *)mmvalp + i * op_bytes, op_bytes, ctxt);
+            /*
+             * hvmemul_linear_mmio_access() will find a cache slot based on
+             * linear address. hvmemul_phys_mmio_access() will crash the
+             * domain if observing varying data getting written to the same
+             * cache slot. Utilize that squashing earlier writes to fully
+             * overlapping addresses is permitted by the spec. We can't,
+             * however, drop the writes altogether, to maintain correct
+             * faulting behavior. Instead write the data from the last of
+             * the fully overlapping slots multiple times.
+             */
+            for ( j = (slot = i) + 1; j < n; ++j )
+            {
+                long idx2 = (b & 1 ? index.qw[j]
+                                   : index.dw[j]) * (1 << state->sib_scale);
+
+                if ( (op_mask & (1 << j)) &&
+                     truncate_ea(ea.mem.off + idx2) == offs )
+                    slot = j;
+            }
+
+            rc = ops->write(ea.mem.seg, offs,
+                            (void *)mmvalp + slot * op_bytes, op_bytes, ctxt);
             if ( rc != X86EMUL_OKAY )
             {
                 /* See comment in gather emulation. */
[qemu-mainline test] 162099: regressions - FAIL
flight 162099 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/162099/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-qemuu-freebsd11-amd64 16 guest-saverestore fail REGR. vs. 152631
 test-amd64-i386-freebsd10-i386 16 guest-saverestore fail REGR. vs. 152631
 test-amd64-i386-freebsd10-amd64 16 guest-saverestore fail REGR. vs. 152631
 test-amd64-amd64-qemuu-freebsd12-amd64 16 guest-saverestore fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-debianhvm-amd64 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-ovmf-amd64 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-win7-amd64 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-ovmf-amd64 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-amd64-qemuu-nested-intel 20 debian-hvm-install/l1/l2 fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-win7-amd64 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-ws16-amd64 15 guest-saverestore fail REGR. vs. 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64 15 guest-saverestore fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 152631
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 152631
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 152631
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
Re: Hand over of the Xen shared info page
Hi,

On 5/20/21 12:46 PM, Julien Grall wrote:
> On 20/05/2021 06:21, Oleksandr Andrushchenko wrote:
>> Hi, all!
>
> Hi,
>
>> On 5/20/21 2:11 AM, Stefano Stabellini wrote:
>>> On Wed, 19 May 2021, Julien Grall wrote:
>>>> On 14/05/2021 10:50, Anastasiia Lukianenko wrote:
>>>>> Hi Julien!
>>>>
>>>> Hello,
>>>>
>>>>> On Thu, 2021-05-13 at 09:37 +0100, Julien Grall wrote:
>>>>>> On 13/05/2021 09:03, Anastasiia Lukianenko wrote:
>>>>>> The alternative is for U-boot to go through the DT and infer which
>>>>>> regions are free (IOW any region not described).
>>>>> Thank you for interest in the problem and advice on how to solve it.
>>>>> Could you please clarify how we could find free regions using DT in
>>>>> U-boot?
>>>> I don't know U-boot code, so I can't tell whether what I suggest would
>>>> work. In theory, the device-tree should describe every region
>>>> allocated in address space. So if you parse the device-tree and create
>>>> a list (or any datastructure) with the regions, then any range not
>>>> present in the list would be a free region you could use.
>>> Yes, any "empty" memory region which is neither memory nor MMIO should
>>> work.
>> You need to account on many things while creating the list of regions:
>> device register mappings, reserved nodes, memory nodes, device tree
>> overlays parsing etc. It all seems to be a not-that-trivial task and
>> after all if implemented it only lives in U-boot and you have to
>> maintain that code. So, if some other entity needs the same you need to
>> implement that as well.
>
> Yes, there is some complexity. I have suggested other approaches in a
> separate thread. Did you look at them?

Yes, so probably we can re-use the solution from that thread when it
comes to some conclusion and implementation.

>> And also you can imagine a system without device tree at all...
>
> Xen doesn't provide a stable guest layout. Such a system would have to
> be tailored to a given guest configuration, Xen version (can be
> custom)...
>
> Therefore, hardcoding the value in the system (not in Xen headers!) is
> not going to make it much worse.

Agree to some extent

>> So, I am not saying it is not possible to implement, but I just
>> question if such an implementation is really a good way forward.
>
>>> However, I realized a few days ago that the magic pages are not
>>> described in the DT.
>
> We probably want to fix it by marking the page as "reserved" or create
> a specific binding. So you will need a specific quirk for them.
>
>>> It should also be possible to keep the shared info page allocated and
>>> simply pass the address to the kernel by adding the right node to
>>> device tree.
>> And then you need to modify your OS and this is not only Linux...
>>> To do that, we would have to add a description of the magic pages
>>> to device tree, which I think would be good to have in any case. In
>>> that case it would be best to add a proper binding for it under the
>>> "xen,xen" node.
>>
>> I would say that querying Xen for such a memory page seems much
>> cleaner and allows the guest OS either to continue using the existing
>> method with memory allocation or to use the page provided by Xen.
>
> IIUC your proposal, you are asking to add a hypercall to query which
> guest physical region can be used for the shared info page.
>
> This may solve the problem you have at hand, but this is not scalable.
> There are a few other regions which require unallocated memory (e.g.
> grant table, grant mapping, foreign memory map, ...).
>
> So if we were going to involve Xen, then I think it is better to have a
> generic way to ask Xen for unallocated space.

Agree

> Cheers,

Thank you,
Oleksandr
Re: [PATCH v3 2/2] libelf: improve PVH elfnote parsing
On 20.05.2021 14:30, Roger Pau Monne wrote:
> Pass an hvm boolean parameter to the elf note parsing and checking
> routines, so that better checking can be done in case libelf is
> dealing with an hvm container.
>
> elf_xen_note_check shouldn't return early unless PHYS32_ENTRY is set
> and the container is of type HVM, or else the loader and version
> checks would be avoided for kernels intended to be booted as PV but
> that also have PHYS32_ENTRY set.
>
> Adjust elf_xen_addr_calc_check so that the virtual addresses are
> actually physical ones (by setting virt_base and elf_paddr_offset to
> zero) when the container is of type HVM, as that container is always
> started with paging disabled.
>
> Signed-off-by: Roger Pau Monné

With the one hunk moved to patch 1
Reviewed-by: Jan Beulich

Jan
Re: [PATCH v3 1/2] libelf: don't attempt to parse __xen_guest for PVH
On 20.05.2021 14:30, Roger Pau Monne wrote:
> The legacy __xen_guest section doesn't support the PHYS32_ENTRY
> elfnote, so it's pointless to attempt to parse the elfnotes from that
> section when called from an hvm container.
>
> Suggested-by: Jan Beulich
> Signed-off-by: Roger Pau Monné
> ---
> Changes since v2:
>  - New in this version.
> ---
>  xen/common/libelf/libelf-dominfo.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
> index 69c94b6f3bb..abea1011c18 100644
> --- a/xen/common/libelf/libelf-dominfo.c
> +++ b/xen/common/libelf/libelf-dominfo.c
> @@ -577,10 +577,8 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
>
>      }
>
> -    /*
> -     * Finally fall back to the __xen_guest section.
> -     */
> -    if ( xen_elfnotes == 0 )
> +    /* Finally fall back to the __xen_guest section for PV guests only. */
> +    if ( xen_elfnotes == 0 && !hvm )

Isn't this depending on the 2nd patch adding the function parameter?
IOW doesn't this break the build, even if just transiently?

With the respective hunk from patch 2 moved here
Reviewed-by: Jan Beulich

Jan
[PATCH v3 1/2] libelf: don't attempt to parse __xen_guest for PVH
The legacy __xen_guest section doesn't support the PHYS32_ENTRY
elfnote, so it's pointless to attempt to parse the elfnotes from that
section when called from an hvm container.

Suggested-by: Jan Beulich
Signed-off-by: Roger Pau Monné
---
Changes since v2:
 - New in this version.
---
 xen/common/libelf/libelf-dominfo.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index 69c94b6f3bb..abea1011c18 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -577,10 +577,8 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,

     }

-    /*
-     * Finally fall back to the __xen_guest section.
-     */
-    if ( xen_elfnotes == 0 )
+    /* Finally fall back to the __xen_guest section for PV guests only. */
+    if ( xen_elfnotes == 0 && !hvm )
     {
         shdr = elf_shdr_by_name(elf, "__xen_guest");
         if ( ELF_HANDLE_VALID(shdr) )
--
2.31.1
[PATCH v3 2/2] libelf: improve PVH elfnote parsing
Pass an hvm boolean parameter to the elf note parsing and checking
routines, so that better checking can be done in case libelf is
dealing with an hvm container.

elf_xen_note_check shouldn't return early unless PHYS32_ENTRY is set
and the container is of type HVM, or else the loader and version
checks would be avoided for kernels intended to be booted as PV but
that also have PHYS32_ENTRY set.

Adjust elf_xen_addr_calc_check so that the virtual addresses are
actually physical ones (by setting virt_base and elf_paddr_offset to
zero) when the container is of type HVM, as that container is always
started with paging disabled.

Signed-off-by: Roger Pau Monné
---
Changes since v2:
 - Do not print messages about VIRT_BASE/ELF_PADDR_OFFSET unset on HVM.
 - Make it explicit that elf_paddr_offset will be set to 0 regardless
   of elf_note_start value for HVM.

Changes since v1:
 - Expand comments.
 - Do not set virt_entry to phys_entry unless it's an HVM container.
---
 tools/fuzz/libelf/libelf-fuzzer.c   |  3 +-
 tools/libs/guest/xg_dom_elfloader.c |  6 ++--
 tools/libs/guest/xg_dom_hvmloader.c |  2 +-
 xen/arch/x86/hvm/dom0_build.c       |  2 +-
 xen/arch/x86/pv/dom0_build.c        |  2 +-
 xen/common/libelf/libelf-dominfo.c  | 43 ++---
 xen/include/xen/libelf.h            |  2 +-
 7 files changed, 37 insertions(+), 23 deletions(-)

[Note: some '&'-prefixed arguments below were lost in the mail archive
and have been restored from context.]

diff --git a/tools/fuzz/libelf/libelf-fuzzer.c b/tools/fuzz/libelf/libelf-fuzzer.c
index 1ba85717114..84fb84720fa 100644
--- a/tools/fuzz/libelf/libelf-fuzzer.c
+++ b/tools/fuzz/libelf/libelf-fuzzer.c
@@ -17,7 +17,8 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
         return -1;

     elf_parse_binary(elf);
-    elf_xen_parse(elf, &parms);
+    elf_xen_parse(elf, &parms, false);
+    elf_xen_parse(elf, &parms, true);

     return 0;
 }
diff --git a/tools/libs/guest/xg_dom_elfloader.c b/tools/libs/guest/xg_dom_elfloader.c
index 06e713fe111..ad71163dd92 100644
--- a/tools/libs/guest/xg_dom_elfloader.c
+++ b/tools/libs/guest/xg_dom_elfloader.c
@@ -135,7 +135,8 @@ static elf_negerrnoval xc_dom_probe_elf_kernel(struct xc_dom_image *dom)
      * or else we might be trying to load a plain ELF.
      */
     elf_parse_binary(&elf);
-    rc = elf_xen_parse(&elf, &dom->parms);
+    rc = elf_xen_parse(&elf, &dom->parms,
+                       dom->container_type == XC_DOM_HVM_CONTAINER);
     if ( rc != 0 )
         return rc;
@@ -166,7 +167,8 @@ static elf_negerrnoval xc_dom_parse_elf_kernel(struct xc_dom_image *dom)

     /* parse binary and get xen meta info */
     elf_parse_binary(elf);
-    if ( elf_xen_parse(elf, &dom->parms) != 0 )
+    if ( elf_xen_parse(elf, &dom->parms,
+                       dom->container_type == XC_DOM_HVM_CONTAINER) != 0 )
     {
         rc = -EINVAL;
         goto out;
diff --git a/tools/libs/guest/xg_dom_hvmloader.c b/tools/libs/guest/xg_dom_hvmloader.c
index ec6ebad7fd5..3a63b23ba39 100644
--- a/tools/libs/guest/xg_dom_hvmloader.c
+++ b/tools/libs/guest/xg_dom_hvmloader.c
@@ -73,7 +73,7 @@ static elf_negerrnoval xc_dom_probe_hvm_kernel(struct xc_dom_image *dom)
      * else we might be trying to load a PV kernel.
      */
     elf_parse_binary(&elf);
-    rc = elf_xen_parse(&elf, &dom->parms);
+    rc = elf_xen_parse(&elf, &dom->parms, true);
     if ( rc == 0 )
         return -EINVAL;
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 878dc1d808e..c24b9efdb0a 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -561,7 +561,7 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
     elf_set_verbose(&elf);
 #endif
     elf_parse_binary(&elf);
-    if ( (rc = elf_xen_parse(&elf, &parms)) != 0 )
+    if ( (rc = elf_xen_parse(&elf, &parms, true)) != 0 )
     {
         printk("Unable to parse kernel for ELFNOTES\n");
         return rc;
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index e0801a9e6d1..af47615b226 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -353,7 +353,7 @@ int __init dom0_construct_pv(struct domain *d,
     elf_set_verbose(&elf);

     elf_parse_binary(&elf);
-    if ( (rc = elf_xen_parse(&elf, &parms)) != 0 )
+    if ( (rc = elf_xen_parse(&elf, &parms, false)) != 0 )
         goto out;

     /* compatibility check */
diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index abea1011c18..24d1371dd7a 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -360,7 +360,7 @@ elf_errorstatus elf_xen_parse_guest_info(struct elf_binary *elf,

 /* sanity checks */
 static elf_errorstatus elf_xen_note_check(struct elf_binary *elf,
-                                          struct elf_dom_parms *parms)
+                                          struct elf_dom_parms *parms, bool hvm)
 {
     if ( (ELF_PTRVAL_INVALID(parms->elf_note_start)) &&
          (ELF_PTRVAL_INVALID(parms->guest_info)) )
@@ -382,7 +382,7 @@ static elf_errorstatus
[PATCH v3 0/2] libelf: small fixes for PVH
Hello,

A couple of small fixes for PVH loading. The first one is likely not
very relevant, since PVH couldn't be booted anyway with the data in the
__xen_guest section, so it's mostly a cleanup.

Second patch fixes the checks for PVH loading, as in that case physical
addresses must always be used to perform the bound calculations.

Thanks, Roger.

Roger Pau Monne (2):
  libelf: don't attempt to parse __xen_guest for PVH
  libelf: improve PVH elfnote parsing

 tools/fuzz/libelf/libelf-fuzzer.c   |  3 +-
 tools/libs/guest/xg_dom_elfloader.c |  6 ++--
 tools/libs/guest/xg_dom_hvmloader.c |  2 +-
 xen/arch/x86/hvm/dom0_build.c       |  2 +-
 xen/arch/x86/pv/dom0_build.c        |  2 +-
 xen/common/libelf/libelf-dominfo.c  | 49 +
 xen/include/xen/libelf.h            |  2 +-
 7 files changed, 39 insertions(+), 27 deletions(-)

--
2.31.1
Re: [PATCH] x86/Xen: swap NX determination and GDT setup on BSP
On 20.05.21 14:08, Jan Beulich wrote:
> On 20.05.2021 13:57, Juergen Gross wrote:
>> On 20.05.21 13:42, Jan Beulich wrote:
>>> xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables.
>>> For this to work when NX is not available, x86_configure_nx() needs to
>>> be called first.
>>>
>>> Reported-by: Olaf Hering
>>> Signed-off-by: Jan Beulich
>>
>> Reviewed-by: Juergen Gross
>
> Thanks. I guess I forgot
>
> Cc: sta...@vger.kernel.org
>
> If you agree, can you please add this before pushing to Linus?

Yes, will do that.

Juergen
Re: [PATCH] x86/Xen: swap NX determination and GDT setup on BSP
On 20.05.2021 13:57, Juergen Gross wrote:
> On 20.05.21 13:42, Jan Beulich wrote:
>> xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables.
>> For this to work when NX is not available, x86_configure_nx() needs to
>> be called first.
>>
>> Reported-by: Olaf Hering
>> Signed-off-by: Jan Beulich
>
> Reviewed-by: Juergen Gross

Thanks. I guess I forgot

Cc: sta...@vger.kernel.org

If you agree, can you please add this before pushing to Linus?

Jan
Re: [PATCH] x86/Xen: swap NX determination and GDT setup on BSP
On 20.05.21 13:42, Jan Beulich wrote:
> xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables.
> For this to work when NX is not available, x86_configure_nx() needs to
> be called first.
>
> Reported-by: Olaf Hering
> Signed-off-by: Jan Beulich

Reviewed-by: Juergen Gross

Juergen
Re: [PATCH] xen-netback: correct success/error reporting for the SKB-with-fraglist case
On 25.02.2021 17:23, Paul Durrant wrote:
> On 25/02/2021 14:00, Jan Beulich wrote:
>> On 25.02.2021 13:11, Paul Durrant wrote:
>>> On 25/02/2021 07:33, Jan Beulich wrote:
>>>> On 24.02.2021 17:39, Paul Durrant wrote:
>>>>> On 23/02/2021 16:29, Jan Beulich wrote:
>>>>>> When re-entering the main loop of xenvif_tx_check_gop() a 2nd time, the
>>>>>> special considerations for the head of the SKB no longer apply. Don't
>>>>>> mistakenly report ERROR to the frontend for the first entry in the list,
>>>>>> even if - from all I can tell - this shouldn't matter much as the overall
>>>>>> transmit will need to be considered failed anyway.
>>>>>>
>>>>>> Signed-off-by: Jan Beulich
>>>>>>
>>>>>> --- a/drivers/net/xen-netback/netback.c
>>>>>> +++ b/drivers/net/xen-netback/netback.c
>>>>>> @@ -499,7 +499,7 @@ check_frags:
>>>>>>  				 * the header's copy failed, and they are
>>>>>>  				 * sharing a slot, send an error
>>>>>>  				 */
>>>>>> -				if (i == 0 && sharedslot)
>>>>>> +				if (i == 0 && !first_shinfo && sharedslot)
>>>>>>  					xenvif_idx_release(queue, pending_idx,
>>>>>>  							   XEN_NETIF_RSP_ERROR);
>>>>>>  				else
>>>>>
>>>>> I think this will DTRT, but to my mind it would make more sense to clear
>>>>> 'sharedslot' before the 'goto check_frags' at the bottom of the function.
>>>>
>>>> That was my initial idea as well, but
>>>> - I think it is for a reason that the variable is "const".
>>>> - There is another use of it which would then instead need further
>>>>   amending (and which I believe is at least part of the reason for the
>>>>   variable to be "const").
>>>
>>> Oh, yes. But now that I look again, don't you want:
>>>
>>>   if (i == 0 && first_shinfo && sharedslot)
>>>
>>> ? (i.e no '!')
>>>
>>> The comment states that the error should be indicated when the first
>>> frag contains the header in the case that the map succeeded but the
>>> prior copy from the same ref failed. This can only possibly be the case
>>> if this is the 'first_shinfo'
>>
>> I don't think so, no - there's a difference between "first frag"
>> (at which point first_shinfo is NULL) and first frag list entry
>> (at which point first_shinfo is non-NULL).
>
> Yes, I realise I got it backwards. Confusing name but the comment above
> its declaration does explain.
>
>>> (which is why I still think it is safe to unconst 'sharedslot' and
>>> clear it).
>>
>> And "no" here as well - this piece of code
>>
>> 	/* First error: if the header haven't shared a slot with the
>> 	 * first frag, release it as well.
>> 	 */
>> 	if (!sharedslot)
>> 		xenvif_idx_release(queue,
>> 				   XENVIF_TX_CB(skb)->pending_idx,
>> 				   XEN_NETIF_RSP_OKAY);
>>
>> specifically requires sharedslot to have the value that was
>> assigned to it at the start of the function (this property
>> doesn't go away when switching from fragments to frag list).
>> Note also how it uses XENVIF_TX_CB(skb)->pending_idx, i.e. the
>> value the local variable pending_idx was set from at the start
>> of the function.
>
> True, we do have to deal with freeing up the header if the first map
> error comes on the frag list.
>
> Reviewed-by: Paul Durrant

Since I've not seen this go into 5.13-rc, may I ask what the
disposition of this is?

Jan
[PATCH] x86/Xen: swap NX determination and GDT setup on BSP
xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables.
For this to work when NX is not available, x86_configure_nx() needs to
be called first.

Reported-by: Olaf Hering
Signed-off-by: Jan Beulich

--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1273,16 +1273,16 @@ asmlinkage __visible void __init xen_start_kernel(void)
 	/* Get mfn list */
 	xen_build_dynamic_phys_to_machine();
 
+	/* Work out if we support NX */
+	get_cpu_cap(&boot_cpu_data);
+	x86_configure_nx();
+
 	/*
 	 * Set up kernel GDT and segment registers, mainly so that
 	 * -fstack-protector code can be executed.
 	 */
 	xen_setup_gdt(0);
 
-	/* Work out if we support NX */
-	get_cpu_cap(&boot_cpu_data);
-	x86_configure_nx();
-
 	/* Determine virtual and physical address sizes */
 	get_cpu_address_sizes(&boot_cpu_data);
Re: Uses of /hypervisor memory range (was: FreeBSD/Xen/ARM issues)
Hi Stefano,

On 20/05/2021 00:25, Stefano Stabellini wrote:
> On Sat, 15 May 2021, Julien Grall wrote:
>>> My feeling is one of two things should happen with the /hypervisor
>>> address range:
>>>
>>> 1> OSes could be encouraged to use it for all foreign mappings. The
>>> range should be dynamic in some fashion. There could be a handy way
>>> to allow configuring the amount of address space thus reserved.
>>
>> In the context of XSA-300 and virtio on Xen on Arm, we discussed
>> providing a region for foreign mappings. The main trouble here is
>> figuring out the size; if you mess it up then you may break all the
>> PV drivers.
>>
>> If the problem is finding space, then I would like to suggest a
>> different approach (I think I may have discussed it with Andrew). Xen
>> is maintaining the P2M for the guest and therefore knows where the
>> unallocated space is. How about introducing a new hypercall to ask
>> for "unallocated space"?
>>
>> This would not work for older hypervisors, but you could use the RAM
>> instead (as Linux does). This has drawbacks (e.g. shattering
>> superpages, reducing the amount of memory usable...), but for 1> you
>> would also need a hack for older Xen.
>>
>> I am starting to wonder if it would make sense to add a new device
>> tree binding to describe a larger free region for foreign mappings
>> rather than a hypercall. It could be several GBs or even TBs in size.
>
> I like the idea of having it in device tree because after all this is
> static information. I can see that a hypercall would also work and I
> am open to it, but if possible I think it would be better not to
> extend the hypercall interface when there is a good alternative.

There are two issues with the Device-Tree approach:
 1) This doesn't work on ACPI.
 2) It is not clear how to define the size of the region. An admin
    doesn't have the right information in hand to know how many
    mappings will be done (a backend may map the same page multiple
    times).

The advantage of the hypercall solution is that the size is
"virtually" unlimited and not based on a specific firmware.

Cheers,

--
Julien Grall
Re: Hand over of the Xen shared info page
On 20/05/2021 06:21, Oleksandr Andrushchenko wrote: Hi, all! Hi, On 5/20/21 2:11 AM, Stefano Stabellini wrote: On Wed, 19 May 2021, Julien Grall wrote: On 14/05/2021 10:50, Anastasiia Lukianenko wrote: Hi Julien! Hello, On Thu, 2021-05-13 at 09:37 +0100, Julien Grall wrote: On 13/05/2021 09:03, Anastasiia Lukianenko wrote: The alternative is for U-boot to go through the DT and infer which regions are free (IOW any region not described). Thank you for interest in the problem and advice on how to solve it. Could you please clarify how we could find free regions using DT in U- boot? I don't know U-boot code, so I can't tell whether what I suggest would work. In theory, the device-tree should described every region allocated in address space. So if you parse the device-tree and create a list (or any datastructure) with the regions, then any range not present in the list would be free region you could use. Yes, any "empty" memory region which is neither memory nor MMIO should work. You need to account on many things while creating the list of regions: device register mappings, reserved nodes, memory nodes, device tree overlays parsing etc. It all seem to be a not-that-trivial task and after all if implemented it only lives in U-boot and you have to maintain that code. So, if some other entity needs the same you need to implement that as well. Yes, there are some complexity. I have suggested other approach in a separate thread. Did you look at them? And also you can imagine a system without device tree at all... Xen doesn't provide a stable guest layout. Such system would have to be tailored to a given guest configuration, Xen version (can be custom)... Therefore, hardcoding the value in the system (not in Xen headers!) is not going to make it much worse. So, I am not saying it is not possible to implement, but I just question if such an implementation is really a good way forward. However, I realized a few days ago that the magic pages are not described in the DT. 
We probably want to fix it by marking the page as "reserved" or creating a specific binding. So you will need a specific quirk for them. It should also be possible to keep the shared info page allocated and simply pass the address to the kernel by adding the right node to the device tree. And then you need to modify your OS, and this is not only Linux... To do that, we would have to add a description of the magic pages to the device tree, which I think would be good to have in any case. In that case it would be best to add a proper binding for it under the "xen,xen" node. I would say that querying Xen for such a memory page seems much cleaner and allows the guest OS either to continue using the existing method with memory allocation or to use the page provided by Xen. IIUC your proposal, you are asking to add a hypercall to query which guest physical region can be used for the shared info page. This may solve the problem you have at hand, but this is not scalable. There are a few other regions which require unallocated memory (e.g. grant table, grant mapping, foreign memory map, ...). So if we were going to involve Xen, then I think it is better to have a generic way to ask Xen for unallocated space. Cheers, -- Julien Grall
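The region-list idea sketched in this thread can be illustrated with a small, self-contained model — this is not U-boot code, and all names here are made up for illustration: collect every region the DT describes, sort them by base, and treat any uncovered hole as free space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* A region of guest physical address space, as parsed from the DT. */
struct region {
    uint64_t base;
    uint64_t size;
};

static int cmp_base(const void *a, const void *b)
{
    const struct region *ra = a, *rb = b;

    return (ra->base > rb->base) - (ra->base < rb->base);
}

/*
 * Scan [space_base, space_base + space_size) for the first gap of at
 * least 'want' bytes not covered by any region in 'regs'. Returns true
 * and writes the gap base to *out on success.
 */
bool find_free_gap(struct region *regs, size_t nr,
                   uint64_t space_base, uint64_t space_size,
                   uint64_t want, uint64_t *out)
{
    uint64_t cur = space_base, end = space_base + space_size;
    size_t i;

    qsort(regs, nr, sizeof(*regs), cmp_base);

    for ( i = 0; i < nr; i++ )
    {
        /* Hole before this region big enough? */
        if ( regs[i].base > cur && regs[i].base - cur >= want )
        {
            *out = cur;
            return true;
        }
        /* Advance past the region (handles overlaps too). */
        if ( regs[i].base + regs[i].size > cur )
            cur = regs[i].base + regs[i].size;
    }

    /* Tail of the address space. */
    if ( end > cur && end - cur >= want )
    {
        *out = cur;
        return true;
    }

    return false;
}
```

As the thread notes, the hard part in practice is not this gap search but building a *complete* region list (reserved nodes, overlays, magic pages not described in the DT at all).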
Re: [PATCH 03/10] xen/arm: introduce PGC_reserved
Hi Jan, On 20/05/2021 10:27, Jan Beulich wrote: On 20.05.2021 10:59, Julien Grall wrote: On 20/05/2021 09:40, Penny Zheng wrote: Oh, second thought on this. And I think you are referring to balloon in/out here, hmm, also, like Yes I am referring to balloon in/out. I replied there: "For issues on ballooning out or in, it is not supported here. So long as you are not using the solution in prod then you are fine (see below)... But then we should make clear this feature is a tech preview. Domain on Static Allocation and 1:1 direct-map are all based on dom0-less right now, so no PV, grant table, event channel, etc, considered. Right now, it only supports devices passed through into the guest." So we are not creating the hypervisor node in the DT for dom0less domU. However, the hypercalls are still accessible by a domU if it really wants to use them. Therefore, a guest can easily mess up with your static configuration and predictability. IMHO, this is a must to solve before "static memory" can be used in production. I'm having trouble seeing why it can't be addressed right away: It can be solved right away. Dom0less domUs don't officially know they are running on Xen (they could bruteforce it though), so I think it would be fine to merge without it for a tech preview. Such guests can balloon in only after they've ballooned out some pages, and such balloon-in requests would be satisfied from the same static memory that is associated with the guest anyway. This would require some bookkeeping to know the page was used previously. But nothing very challenging though. Cheers, -- Julien Grall
Re: regression in recent pvops kernels, dom0 crashes early
Am Thu, 20 May 2021 09:45:03 +0200 schrieb Olaf Hering : > Am Thu, 20 May 2021 09:03:34 +0200 > schrieb Jan Beulich : > > > Just to be sure - you did not need the other patch that I said I suspect > > is needed as a prereq? > Yes, I needed just this single patch which moves x86_configure_nx up. I tried the very same approach with the SLE12-SP4-LTSS branch, which also fixed dom0 boot. Olaf
Re: [PATCH 04/10] xen/arm: static memory initialization
On 20.05.2021 11:04, Penny Zheng wrote: > Hi Jan > >> -Original Message- >> From: Jan Beulich >> Sent: Tuesday, May 18, 2021 6:43 PM >> To: Penny Zheng >> Cc: Bertrand Marquis ; Wei Chen >> ; nd ; xen-devel@lists.xenproject.org; >> sstabell...@kernel.org; jul...@xen.org >> Subject: Re: [PATCH 04/10] xen/arm: static memory initialization >> >> On 18.05.2021 11:51, Penny Zheng wrote: From: Jan Beulich Sent: Tuesday, May 18, 2021 3:16 PM On 18.05.2021 07:21, Penny Zheng wrote: > This patch introduces static memory initialization, during system > RAM boot up. > > New func init_staticmem_pages is the equivalent of init_heap_pages, > responsible for static memory initialization. > > Helper func free_staticmem_pages is the equivalent of > free_heap_pages, to free nr_pfns pages of static memory. > For each page, it includes the following steps: > 1. change page state from in-use(also initialization state) to free > state and grant PGC_reserved. > 2. set its owner NULL and make sure this page is not a guest frame > any more But isn't the goal (as per the previous patch) to associate such pages with a _specific_ domain? >>> >>> Free_staticmem_pages are alike free_heap_pages, are not used only for >> initialization. >>> Freeing used pages to unused are also included. >>> Here, setting its owner NULL is to set owner in used field NULL. >> >> I'm afraid I still don't understand. >> > > When initializing heap, xen is using free_heap_pages to do the initialization. > And also when normal domain get destroyed/rebooted, xen is using > free_domheap_pages, > which calls free_heap_pages to free the pages. > > So here, since free_staticmem_pages is the equivalent of the free_heap_pages > for static > memory, I'm also considering both two scenarios here. And if it is domain get > destroyed/rebooted, > Page state is in inuse state(PGC_inuse), and the page_info.v.inuse.domain is > holding the > domain owner. 
> When freeing then, we need to switch the page state to free state(PGC_free) > and set its inuse owner to NULL. Perhaps my confusion comes from your earlier outline missing 3. re-associate the page with the domain (as represented in free pages) The property of "designated for Dom" should never go away, if I understand the overall proposal correctly. Jan
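The three-step outline discussed in this sub-thread — switch the page from in-use to free, drop the in-use owner, but keep the "designated for Dom" association — can be sketched as a toy model. This is not Xen's real struct page_info or flag layout; the names, bit values, and the `designated` field are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state bits, not Xen's real PGC_* values. */
#define PGC_state_inuse  0x1u
#define PGC_state_free   0x2u
#define PGC_reserved     0x4u

struct domain;

struct page {
    unsigned int count_info;    /* state bits + PGC_reserved */
    struct domain *inuse_owner; /* valid only while the page is in use */
    struct domain *designated;  /* never cleared for static pages */
};

/*
 * Model of freeing one static-memory page:
 * 1. in-use -> free, keeping PGC_reserved set;
 * 2. clear the in-use owner;
 * 3. the "designated for Dom" association survives in 'designated',
 *    so a later balloon-in / reboot can only reuse it for that domain.
 */
void free_staticmem_page(struct page *pg)
{
    assert(pg->count_info & PGC_reserved);
    assert(pg->count_info & PGC_state_inuse);

    pg->count_info &= ~PGC_state_inuse;
    pg->count_info |= PGC_state_free;   /* PGC_reserved is preserved */
    pg->inuse_owner = NULL;
    /* pg->designated is deliberately left untouched. */
}
```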
Re: [PATCH v2] libelf: improve PVH elfnote parsing
On 20.05.2021 11:27, Roger Pau Monné wrote: > On Wed, May 19, 2021 at 12:34:19PM +0200, Jan Beulich wrote: >> On 18.05.2021 16:47, Roger Pau Monne wrote: >>> @@ -425,8 +425,11 @@ static elf_errorstatus elf_xen_addr_calc_check(struct >>> elf_binary *elf, >>> return -1; >>> } >>> >>> -/* Initial guess for virt_base is 0 if it is not explicitly defined. */ >>> -if ( parms->virt_base == UNSET_ADDR ) >>> +/* >>> + * Initial guess for virt_base is 0 if it is not explicitly defined in >>> the >>> + * PV case. For PVH virt_base is forced to 0 because paging is >>> disabled. >>> + */ >>> +if ( parms->virt_base == UNSET_ADDR || hvm ) >>> { >>> parms->virt_base = 0; >>> elf_msg(elf, "ELF: VIRT_BASE unset, using %#" PRIx64 "\n", >> >> This message is wrong now if hvm is true but parms->virt_base != UNSET_ADDR. >> Best perhaps is to avoid emitting the message altogether when hvm is true. >> (Since you'll be touching it anyway, perhaps a good opportunity to do away >> with passing parms->virt_base to elf_msg(), and instead simply use a literal >> zero.) >> >>> @@ -441,8 +444,10 @@ static elf_errorstatus elf_xen_addr_calc_check(struct >>> elf_binary *elf, >>> * >>> * If we are using the modern ELF notes interface then the default >>> * is 0. >>> + * >>> + * For PVH this is forced to 0, as it's already a legacy option for PV. >>> */ >>> -if ( parms->elf_paddr_offset == UNSET_ADDR ) >>> +if ( parms->elf_paddr_offset == UNSET_ADDR || hvm ) >>> { >>> if ( parms->elf_note_start ) >> >> Don't you want "|| hvm" here as well, or alternatively suppress the >> fallback to the __xen_guest section in the PVH case (near the end of >> elf_xen_parse())? > > The legacy __xen_guest section doesn't support PHYS32_ENTRY, so yes, > that part could be completely skipped when called from an HVM > container. > > I think I will fix that in another patch though if you are OK, as > it's not strictly related to the calculation fixes done here. 
That's fine; it wants to be a prereq to the one here then, though, I think. Jan
Re: [PATCH v2] libelf: improve PVH elfnote parsing
On Wed, May 19, 2021 at 12:34:19PM +0200, Jan Beulich wrote: > On 18.05.2021 16:47, Roger Pau Monne wrote: > > @@ -425,8 +425,11 @@ static elf_errorstatus elf_xen_addr_calc_check(struct > > elf_binary *elf, > > return -1; > > } > > > > -/* Initial guess for virt_base is 0 if it is not explicitly defined. */ > > -if ( parms->virt_base == UNSET_ADDR ) > > +/* > > + * Initial guess for virt_base is 0 if it is not explicitly defined in > > the > > + * PV case. For PVH virt_base is forced to 0 because paging is > > disabled. > > + */ > > +if ( parms->virt_base == UNSET_ADDR || hvm ) > > { > > parms->virt_base = 0; > > elf_msg(elf, "ELF: VIRT_BASE unset, using %#" PRIx64 "\n", > > This message is wrong now if hvm is true but parms->virt_base != UNSET_ADDR. > Best perhaps is to avoid emitting the message altogether when hvm is true. > (Since you'll be touching it anyway, perhaps a good opportunity to do away > with passing parms->virt_base to elf_msg(), and instead simply use a literal > zero.) > > > @@ -441,8 +444,10 @@ static elf_errorstatus elf_xen_addr_calc_check(struct > > elf_binary *elf, > > * > > * If we are using the modern ELF notes interface then the default > > * is 0. > > + * > > + * For PVH this is forced to 0, as it's already a legacy option for PV. > > */ > > -if ( parms->elf_paddr_offset == UNSET_ADDR ) > > +if ( parms->elf_paddr_offset == UNSET_ADDR || hvm ) > > { > > if ( parms->elf_note_start ) > > Don't you want "|| hvm" here as well, or alternatively suppress the > fallback to the __xen_guest section in the PVH case (near the end of > elf_xen_parse())? The legacy __xen_guest section doesn't support PHYS32_ENTRY, so yes, that part could be completely skipped when called from an HVM container. I think I will fix that in another patch though if you are OK, as it's not strictly related to the calculation fixes done here. Thanks, Roger.
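The defaulting rule being patched in this thread can be modelled in isolation. The helper below is hypothetical (not libelf's API), and `UNSET_ADDR` mirrors libelf's "not specified" marker, assumed here to be all-ones: an explicit value wins for PV, while a PVH (hvm) container always forces 0 because the guest starts with paging disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match libelf's "note not present" sentinel. */
#define UNSET_ADDR UINT64_MAX

/*
 * Effective virt_base after elf_xen_addr_calc_check()-style fixup:
 * - PV:  use the parsed value if present, otherwise default to 0;
 * - PVH: always 0, since virtual == physical with paging disabled.
 */
uint64_t effective_virt_base(uint64_t parsed, bool hvm)
{
    if ( parsed == UNSET_ADDR || hvm )
        return 0;

    return parsed;
}
```

Jan's review point about the log message follows directly from this shape: when `hvm` is true but `parsed != UNSET_ADDR`, the value is forced to 0 even though VIRT_BASE was *not* unset, so an "unset, using 0" message would be misleading.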
Re: [PATCH 03/10] xen/arm: introduce PGC_reserved
On 20.05.2021 10:59, Julien Grall wrote: > On 20/05/2021 09:40, Penny Zheng wrote: >> Oh, Second thought on this. >> And I think you are referring to balloon in/out here, hmm, also, like > > Yes I am referring to balloon in/out. > >> I replied there: >> "For issues on ballooning out or in, it is not supported here. > > So long you are not using the solution in prod then you are fine (see > below)... But then we should make clear this feature is a tech preview. > >> Domain on Static Allocation and 1:1 direct-map are all based on >> dom0-less right now, so no PV, grant table, event channel, etc, considered. >> >> Right now, it only supports device got passthrough into the guest." > > So we are not creating the hypervisor node in the DT for dom0less domU. > However, the hypercalls are still accessible by a domU if it really > wants to use them. > > Therefore, a guest can easily mess up with your static configuration and > predictability. > > IMHO, this is a must to solve before "static memory" can be used in > production. I'm having trouble seeing why it can't be addressed right away: Such guests can balloon in only after they've ballooned out some pages, and such balloon-in requests would be satisfied from the same static memory that is associated by the guest anyway. Jan
Re: [PATCH] Xen: Design doc for 1:1 direct-map and static allocation
On 20/05/2021 06:36, Penny Zheng wrote: Hi Julien Hi Penny, + +Later, when domain get destroyed and memory relinquished, only pages +in `page_list` go back to heap, and pages in `reserved_page_list` shall not. While going through the series, I could not find any code implementing this. However, this is not enough to prevent a page from going to the heap allocator, because a domain can release memory at runtime using hypercalls like XENMEM_remove_from_physmap. One of the use cases is when the guest decides to balloon out some memory. This will call free_domheap_pages(). Effectively, you are treating static memory as domheap pages. So I think it would be better if you hook in free_domheap_pages() to decide which allocator is used. Now, if a guest can balloon out memory, it can also balloon in memory. There are two cases: 1) The region used to be RAM region statically allocated 2) The region used to be unallocated. I think for 1), we need to be able to re-use the pages used previously. For 2), it is not clear to me whether a guest with memory statically allocated should be allowed to allocate "dynamic" pages. Yeah, I share the same view on hooking into free_domheap_pages(). I'm thinking that for pages with PGC_reserved, we may create a new func free_staticmem_pages to free them. For issues on ballooning out or in, it is not supported here. It is fine that the implementation doesn't yet implement it. However, I think the design document should take into account ballooning. This is because even if... Domain on Static Allocation and 1:1 direct-map are all based on dom0-less right now, so no PV, grant table, event channel, etc, considered. ... there is no PV support & co, a guest is still able to issue hypercalls (they are not hidden). Therefore your guest will be able to disturb your static allocation. Right now, it only supports devices passed through into the guest. 
+ +### Memory Allocation for Domains on Static Allocation + +RAM regions pre-defined as static memory for one specific domain shall +be parsed and reserved from the beginning. And they shall never go to +any memory allocator for any use. Technically, you are introducing a new allocator. So do you mean they should be given to neither the buddy allocator nor the boot allocator? Yes. These pre-defined RAM regions will not be given to any current memory allocator. If given there, there is no guarantee that they will not be allocated for other use. And right now, in my current design, these pre-defined RAM regions are either for one specific domain as guest RAM or as XEN heap. + +Later when allocating static memory for this specific domain, after +acquiring those reserved regions, users need to do a set of +verification before assigning. +For each page there, it at least includes the following steps: +1. Check if it is in free state and has zero reference count. +2. Check if the page is reserved(`PGC_reserved`). + +Then, assigning these pages to this specific domain, and all pages go +to one new linked page list `reserved_page_list`. + +At last, set up guest P2M mapping. By default, it shall be mapped to +the fixed guest RAM address `GUEST_RAM0_BASE`, `GUEST_RAM1_BASE`, +just like normal domains. But later in 1:1 direct-map design, if +`direct-map` is set, the guest physical address will equal to physical address. + +### Static Allocation for Xen itself + +### New Deivce Tree Node: `xen,reserved_heap` s/Deivce/Device/ Thx. + +Static memory for Xen heap refers to parts of RAM reserved in the +beginning for Xen heap only. The memory is pre-defined through XEN +configuration using physical address ranges. + +The reserved memory for Xen heap is an optional feature and can be +enabled by adding a device tree property in the `chosen` node. +Currently, this feature is only supported on AArch64. 
+ +Here is one example: + + +chosen { +xen,reserved-heap = <0x0 0x30000000 0x0 0x40000000>; +... +}; + +RAM at 0x30000000 of 1G size will be reserved as heap memory. Later, +heap allocator will allocate memory only from this specific region. This section is quite confusing. I think we need to clearly differentiate heap vs allocator. In Xen we have two heaps: 1) Xen heap: It is always mapped in Xen virtual address space. This is mainly used for Xen internal allocation. 2) Domain heap: It may not always be mapped in Xen virtual address space. This is mainly used for domain memory and mapped on-demand. For Arm64 (and x86), the two heaps are allocated from the same region. But on Arm32, they are different. We also have two allocators: 1) Boot allocator: This is used during boot only. There is no concept of heap at this time. 2) Buddy allocator: This is the current runtime allocator. This can allocate from either heap. AFAICT, this design is introducing a 3rd allocator that will return domain heap pages. Now, back to this section, are you saying you
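The suggestion earlier in this thread to hook free_domheap_pages() so that PGC_reserved pages are routed back to the static pool instead of the buddy allocator could look roughly like the toy sketch below. The counters stand in for the real allocators, and all names are illustrative, not Xen's actual code.

```c
#include <assert.h>

#define PGC_reserved 0x4u   /* hypothetical flag value */

struct page {
    unsigned int count_info;
};

static unsigned long heap_frees, staticmem_frees;

/* Stand-in for handing a page back to the buddy allocator. */
static void free_heap_page(struct page *pg)
{
    (void)pg;
    heap_frees++;
}

/* Stand-in for returning a page to its domain's static pool. */
static void free_staticmem_page(struct page *pg)
{
    (void)pg;
    staticmem_frees++;
}

/*
 * Hook point discussed in the thread: the common free path checks
 * PGC_reserved and dispatches, so a balloon-out of a static page can
 * never leak it into the general heap.
 */
void free_domheap_page(struct page *pg)
{
    if ( pg->count_info & PGC_reserved )
        free_staticmem_page(pg);    /* stays designated to its domain */
    else
        free_heap_page(pg);         /* back to the buddy allocator */
}
```

This dispatch is also what makes the balloon-in case tractable: a later allocation for the same domain can be satisfied from the static pool the page was returned to.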
Re: [PATCH v2 2/2] automation: fix dependencies on openSUSE Tumbleweed containers
On Wed, 2021-05-19 at 20:25 +0100, Andrew Cooper wrote: > > I built the container locally, which reused some of the layers you > pushed, and it all pushed successfully for me. > > I've committed this series so xen.git matches reality. Lets see how > the > updated container fairs... > Well, something still looks off, although I don't know what. In fact, we still see: https://gitlab.com/xen-project/xen/-/jobs/1277448985 checking for deflateCopy in -lz... no configure: error: Could not find zlib configure: error: ./configure failed for tools Which means we're missing libz. In fact, if I pull the container that's currently in the registry, I can see that: dario@b17aaa86cacf:~> rpm -qa|grep zlib zlib-devel-1.2.11-18.3.x86_64 However, if I build it locally, with: dario@Solace:~/src/xen/xen.git/automation/build> make suse/opensuse-tumbleweed And then enter and use it for building, the package is there and configure works. dario@51e463d1d47e:~> rpm -qa|grep libz libzstd1-1.4.9-1.2.x86_64 libz1-1.2.11-18.3.x86_64 libzck1-1.1.11-1.1.x86_64 libzypp-17.25.10-1.1.x86_64 libzio1-1.06-4.26.x86_64 libzzip-0-13-0.13.72-1.2.x86_64 libzstd-devel-1.4.9-1.2.x86_64 dario@51e463d1d47e:~> ./configure checking build system type... x86_64-pc-linux-gnu checking host system type... x86_64-pc-linux-gnu ... ... ... checking for pandoc... /usr/bin/pandoc checking for perl... /usr/bin/perl configure: creating ./config.status So, what am I missing or doing wrong? Thanks and Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/
[linux-linus test] 162097: regressions - FAIL
flight 162097 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/162097/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-i386-xl-xsm                                7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64                   7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow       7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict  7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel                  7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64              7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel                  7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm           7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-examine                               6 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64                   7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64              7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-libvirt                               7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-pair                                 10 xen-install/src_host   fail REGR. vs. 152332
 test-amd64-i386-pair                                 11 xen-install/dst_host   fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm     7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd                    7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm                           7 xen-install            fail REGR. vs. 152332
 test-amd64-coresched-i386-xl                          7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl                                    7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd                    7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64                       7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim                             7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-raw                                7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm           7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-shadow                             7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386                        7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64                   7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm  7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64                   7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64                   7 xen-install            fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair                         10 xen-install/src_host   fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair                         11 xen-install/dst_host   fail REGR. vs. 152332
 test-amd64-amd64-libvirt-vhd                         13 guest-start            fail REGR. vs. 152332
 test-arm64-arm64-xl-thunderx                         13 debian-fixup           fail REGR. vs. 152332
 test-arm64-arm64-xl                                  13 debian-fixup           fail REGR. vs. 152332
 test-arm64-arm64-xl-xsm                              14 guest-start            fail REGR. vs. 152332
 test-arm64-arm64-xl-credit1                          13 debian-fixup           fail REGR. vs. 152332
 test-arm64-arm64-libvirt-xsm                         13 debian-fixup           fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2                          13 debian-fixup           fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub                        20 guest-stop             fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub                         20 guest-stop             fail REGR. vs. 152332
 test-amd64-amd64-xl-qcow2                            13 guest-start            fail REGR. vs. 152332
 test-armhf-armhf-xl-vhd                              13 guest-start            fail REGR. vs. 152332

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64                 19 guest-stop                 fail like 152332
 test-armhf-armhf-libvirt                             16 saverestore-support-check  fail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64                 19 guest-stop                 fail like 152332
 test-amd64-amd64-qemuu-nested-amd                    20 debian-hvm-install/l1/l2   fail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64                 19 guest-stop                 fail like 152332
 test-armhf-armhf-libvirt-raw                         15 saverestore-support-check  fail like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64                 19 guest-stop                 fail like 152332
 test-amd64-amd64-libvirt-xsm                         15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt                             15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   13 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale                          15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale                          16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu                        15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu                        16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2                          15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2                          16
RE: [PATCH 04/10] xen/arm: static memory initialization
Hi Jan > -Original Message- > From: Jan Beulich > Sent: Tuesday, May 18, 2021 6:43 PM > To: Penny Zheng > Cc: Bertrand Marquis ; Wei Chen > ; nd ; xen-devel@lists.xenproject.org; > sstabell...@kernel.org; jul...@xen.org > Subject: Re: [PATCH 04/10] xen/arm: static memory initialization > > On 18.05.2021 11:51, Penny Zheng wrote: > >> From: Jan Beulich > >> Sent: Tuesday, May 18, 2021 3:16 PM > >> > >> On 18.05.2021 07:21, Penny Zheng wrote: > >>> This patch introduces static memory initialization, during system > >>> RAM boot > >> up. > >>> > >>> New func init_staticmem_pages is the equivalent of init_heap_pages, > >>> responsible for static memory initialization. > >>> > >>> Helper func free_staticmem_pages is the equivalent of > >>> free_heap_pages, to free nr_pfns pages of static memory. > >>> For each page, it includes the following steps: > >>> 1. change page state from in-use(also initialization state) to free > >>> state and grant PGC_reserved. > >>> 2. set its owner NULL and make sure this page is not a guest frame > >>> any more > >> > >> But isn't the goal (as per the previous patch) to associate such > >> pages with a _specific_ domain? > >> > > > > Free_staticmem_pages are alike free_heap_pages, are not used only for > initialization. > > Freeing used pages to unused are also included. > > Here, setting its owner NULL is to set owner in used field NULL. > > I'm afraid I still don't understand. > When initializing heap, xen is using free_heap_pages to do the initialization. And also when normal domain get destroyed/rebooted, xen is using free_domheap_pages, which calls free_heap_pages to free the pages. So here, since free_staticmem_pages is the equivalent of the free_heap_pages for static memory, I'm also considering both two scenarios here. And if it is domain get destroyed/rebooted, Page state is in inuse state(PGC_inuse), and the page_info.v.inuse.domain is holding the domain owner. 
When freeing then, we need to switch the page state to free state(PGC_free) and set its inuse owner to NULL. I'll clarify it more clearly in commit message. > > Still, I need to add more explanation here. > > Yes please. > > >>> @@ -1512,6 +1515,49 @@ static void free_heap_pages( > >>> spin_unlock(_lock); > >>> } > >>> > >>> +/* Equivalent of free_heap_pages to free nr_pfns pages of static > >>> +memory. */ static void free_staticmem_pages(struct page_info *pg, > >> unsigned long nr_pfns, > >>> + bool need_scrub) > >> > >> Right now this function gets called only from an __init one. Unless > >> it is intended to gain further callers, it should be marked __init itself > >> then. > >> Otherwise it should be made sure that other architectures don't > >> include this (dead there) code. > >> > > > > Sure, I'll add __init. Thx. > > Didn't I see a 2nd call to the function later in the series? That one doesn't > permit the function to be __init, iirc. > > Jan Cheers Penny
Re: [PATCH 03/10] xen/arm: introduce PGC_reserved
On 20/05/2021 09:40, Penny Zheng wrote: Hi julien Hi Penny, -Original Message- From: Penny Zheng Sent: Thursday, May 20, 2021 2:20 PM To: Julien Grall ; xen-devel@lists.xenproject.org; sstabell...@kernel.org Cc: Bertrand Marquis ; Wei Chen ; nd Subject: RE: [PATCH 03/10] xen/arm: introduce PGC_reserved Hi Julien -Original Message- From: Julien Grall Sent: Thursday, May 20, 2021 3:46 AM To: Penny Zheng ; xen-devel@lists.xenproject.org; sstabell...@kernel.org Cc: Bertrand Marquis ; Wei Chen ; nd Subject: Re: [PATCH 03/10] xen/arm: introduce PGC_reserved On 19/05/2021 04:16, Penny Zheng wrote: Hi Julien Hi Penny, -Original Message- From: Julien Grall Sent: Tuesday, May 18, 2021 5:46 PM To: Penny Zheng ; xen-devel@lists.xenproject.org; sstabell...@kernel.org Cc: Bertrand Marquis ; Wei Chen ; nd Subject: Re: [PATCH 03/10] xen/arm: introduce PGC_reserved On 18/05/2021 06:21, Penny Zheng wrote: In order to differentiate pages of static memory, from those allocated from heap, this patch introduces a new page flag PGC_reserved to tell. New struct reserved in struct page_info is to describe reserved page info, that is, which specific domain this page is reserved to. > Helper page_get_reserved_owner and page_set_reserved_owner are designated to get/set reserved page's owner. Struct domain is enlarged to more than PAGE_SIZE, due to newly-imported struct reserved in struct page_info. struct domain may embed a pointer to a struct page_info but never directly embed the structure. So can you clarify what you mean? Signed-off-by: Penny Zheng --- xen/include/asm-arm/mm.h | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index 0b7de3102e..d8922fd5db 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -88,7 +88,15 @@ struct page_info */ u32 tlbflush_timestamp; }; -u64 pad; + +/* Page is reserved. 
*/ +struct { +/* + * Reserved Owner of this page, + * if this page is reserved to a specific domain. + */ +struct domain *domain; +} reserved; The space in page_info is quite tight, so I would like to avoid introducing new fields unless we can't get away from it. In this case, it is not clear why we need to differentiate the "Owner" vs the "Reserved Owner". It might be clearer if this change is folded in the first user of the field. As an aside, for 32-bit Arm, you need to add a 4-byte padding. Yeah, I may delete this change. I imported this change as considering the functionality of rebooting domain on static allocation. A little more discussion on rebooting domain on static allocation. Considering the major user cases for domain on static allocation are system has a total pre-defined, static behavior all the time. No domain allocation on runtime, while there still exists domain rebooting. Hmmm... With this series it is still possible to allocate memory at runtime outside of the static allocation (see my comment on the design document [1]). So is it meant to be complete? I'm guessing that the "allocate memory at runtime outside of the static allocation" is referring to XEN heap on static allocation, that is, users pre- defining heap in device tree configuration to let the whole system more static and predictable. And I've replied you in the design there, sorry for the late reply. Save your time, and I’ll paste here: "Right now, on AArch64, all RAM, except reserved memory, will be finally given to buddy allocator as heap, like you said, guest RAM for normal domain will be allocated from there, xmalloc eventually is get memory from there, etc. So we want to refine the heap here, not iterating through `bootinfo.mem` to set up XEN heap, but like iterating `bootinfo. reserved_heap` to set up XEN heap. On ARM32, xen heap and domain heap are separately mapped, which is more complicated here. 
That's why I only talking about implementing these features on AArch64 as first step." Above implementation will be delivered through another patch Serie. This patch Serie Is only focusing on Domain on Static Allocation. Oh, Second thought on this. And I think you are referring to balloon in/out here, hmm, also, like Yes I am referring to balloon in/out. I replied there: "For issues on ballooning out or in, it is not supported here. So long you are not using the solution in prod then you are fine (see below)... But then we should make clear this feature is a tech preview. Domain on Static Allocation and 1:1 direct-map are all based on dom0-less right now, so no PV, grant table, event channel, etc, considered. Right now, it only supports device got passthrough into the guest." So we are not creating the hypervisor node in the DT for dom0less domU. However, the hypercalls are still accessible by a domU if it really wants to use them. Therefore,
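The layout concern raised in this exchange — the union in struct page_info is tight, and on 32-bit Arm a bare `struct domain *` member would need explicit padding to keep the union 8 bytes — can be checked with a minimal sketch. This is not Xen's real struct page_info; it only models the size constraint being discussed.

```c
#include <assert.h>
#include <stdint.h>

struct domain;

/* Sketch of the union under discussion, sized to match the u64 pad. */
union page_info_u {
    struct {
        struct domain *domain;   /* in-use owner */
    } inuse;
    struct {
        struct domain *domain;   /* reserved owner */
#if UINTPTR_MAX == UINT32_MAX
        uint32_t pad;            /* keep the union 8 bytes on 32-bit Arm */
#endif
    } reserved;
    uint64_t pad;                /* the original placeholder member */
};
```

The `uint64_t pad` member pins the union at 8 bytes on 64-bit; the conditional `uint32_t pad` documents the 4-byte padding a 32-bit build would need so the overlapping members stay layout-compatible.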
Re: [PATCH 01/10] xen/arm: introduce domain on Static Allocation
Hi, On 20/05/2021 07:07, Penny Zheng wrote: It will be consistent with the ones defined in the parent node, domUx. Hmmm... To take the example you provided, the parent would be chosen. However, from the example, I would expect the property #{address, size}-cells in domU1 to be used. What did I miss? Yeah, the property #{address, size}-cells in domU1 will be used. And the parent node will be domU1. You may have misunderstood what I meant. "domU1" is the node that contains the property "xen,static-mem". The parent node would be the one above (in our case "chosen"). The dtb property should look as follows: chosen { domU1 { compatible = "xen,domain"; #address-cells = <0x2>; #size-cells = <0x2>; cpus = <2>; xen,static-mem = <0x0 0x30000000 0x0 0x20000000>; ... }; }; +DOMU1 on Static Allocation has reserved RAM bank at 0x30000000 of 512MB size +Static Allocation is only supported on AArch64 for now. The code doesn't seem to be AArch64 specific. So why can't this be used for 32-bit Arm? True, we have plans to make it also workable on AArch32 in the future. Because we considered XEN on cortex-R52. All the code seems to be implemented in arm generic code. So isn't it already working? static int __init early_scan_node(const void *fdt, int node, const char *name, int depth, u32 address_cells, u32 size_cells, @@ -345,6 +394,9 @@ static int __init early_scan_node(const void *fdt, process_multiboot_node(fdt, node, name, address_cells, size_cells); else if ( depth == 1 && device_tree_node_matches(fdt, node, "chosen") ) process_chosen_node(fdt, node, name, address_cells, size_cells); +else if ( depth == 2 && fdt_get_property(fdt, node, + "xen,static-mem", NULL) ) +process_static_memory(fdt, node, "xen,static-mem", address_cells, + size_cells, _mem); I am a bit concerned to add yet another method to parse the DT and all the extra code it will add like in patch #2. From the host PoV, they are memory reserved for a specific purpose. 
Would it be possible to consider the reserved-memory binding for that purpose? This will happen outside of chosen, but we could use a phandle to refer to the region. Correct me if I understand wrongly: do you mean something like this device tree snippet: Yes, this is what I had in mind. Although I have one small remark below. reserved-memory { #address-cells = <2>; #size-cells = <2>; ranges; static-mem-domU1: static-mem@0x30000000 { I think the node would need to contain a compatible (name to be defined). Ok, maybe, hmmm, how about "xen,static-memory"? I would possibly add "domain" in the name to make clear this is domain memory. Stefano, what do you think? Cheers, -- Julien Grall
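Putting the pieces of this sub-thread together, the binding under discussion might end up looking like the snippet below. This is a sketch only: the compatible string is still being bikeshedded in the thread (Penny suggests "xen,static-memory", Julien would add "domain"), and the phandle-reference property name is a pure assumption.

```
reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;

    /* Compatible name not yet agreed in the thread. */
    static-mem-domU1: static-mem@30000000 {
        compatible = "xen,static-memory";
        reg = <0x0 0x30000000 0x0 0x20000000>;
    };
};

chosen {
    domU1 {
        compatible = "xen,domain";
        #address-cells = <0x2>;
        #size-cells = <0x2>;
        cpus = <2>;
        /* Hypothetical phandle reference replacing the inline
           xen,static-mem = <...> cells. */
        xen,static-mem = <&static-mem-domU1>;
    };
};
```

The appeal of this shape, per the thread, is that the host already treats reserved-memory children as carved-out RAM, so no second DT-parsing path is needed just for static allocation.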
RE: [PATCH 03/10] xen/arm: introduce PGC_reserved
Hi julien > -Original Message- > From: Penny Zheng > Sent: Thursday, May 20, 2021 2:20 PM > To: Julien Grall ; xen-devel@lists.xenproject.org; > sstabell...@kernel.org > Cc: Bertrand Marquis ; Wei Chen > ; nd > Subject: RE: [PATCH 03/10] xen/arm: introduce PGC_reserved > > Hi Julien > > > -Original Message- > > From: Julien Grall > > Sent: Thursday, May 20, 2021 3:46 AM > > To: Penny Zheng ; xen-devel@lists.xenproject.org; > > sstabell...@kernel.org > > Cc: Bertrand Marquis ; Wei Chen > > ; nd > > Subject: Re: [PATCH 03/10] xen/arm: introduce PGC_reserved > > > > > > > > On 19/05/2021 04:16, Penny Zheng wrote: > > > Hi Julien > > > > Hi Penny, > > > > > > > >> -Original Message- > > >> From: Julien Grall > > >> Sent: Tuesday, May 18, 2021 5:46 PM > > >> To: Penny Zheng ; > > >> xen-devel@lists.xenproject.org; sstabell...@kernel.org > > >> Cc: Bertrand Marquis ; Wei Chen > > >> ; nd > > >> Subject: Re: [PATCH 03/10] xen/arm: introduce PGC_reserved > > >> > > >> > > >> > > >> On 18/05/2021 06:21, Penny Zheng wrote: > > >>> In order to differentiate pages of static memory, from those > > >>> allocated from heap, this patch introduces a new page flag > > >>> PGC_reserved > > to tell. > > >>> > > >>> New struct reserved in struct page_info is to describe reserved > > >>> page info, that is, which specific domain this page is reserved > > >>> to. > Helper page_get_reserved_owner and page_set_reserved_owner > > >>> are designated to get/set reserved page's owner. > > >>> > > >>> Struct domain is enlarged to more than PAGE_SIZE, due to > > >>> newly-imported struct reserved in struct page_info. > > >> > > >> struct domain may embed a pointer to a struct page_info but never > > >> directly embed the structure. So can you clarify what you mean? 
> > >> > > >>> > > >>> Signed-off-by: Penny Zheng > > >>> --- > > >>>xen/include/asm-arm/mm.h | 16 +++- > > >>>1 file changed, 15 insertions(+), 1 deletion(-) > > >>> > > >>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h > > >> index > > >>> 0b7de3102e..d8922fd5db 100644 > > >>> --- a/xen/include/asm-arm/mm.h > > >>> +++ b/xen/include/asm-arm/mm.h > > >>> @@ -88,7 +88,15 @@ struct page_info > > >>> */ > > >>>u32 tlbflush_timestamp; > > >>>}; > > >>> -u64 pad; > > >>> + > > >>> +/* Page is reserved. */ > > >>> +struct { > > >>> +/* > > >>> + * Reserved Owner of this page, > > >>> + * if this page is reserved to a specific domain. > > >>> + */ > > >>> +struct domain *domain; > > >>> +} reserved; > > >> > > >> The space in page_info is quite tight, so I would like to avoid > > >> introducing new fields unless we can't get away from it. > > >> > > >> In this case, it is not clear why we need to differentiate the "Owner" > > >> vs the "Reserved Owner". It might be clearer if this change is > > >> folded in the first user of the field. > > >> > > >> As an aside, for 32-bit Arm, you need to add a 4-byte padding. > > >> > > > > > > Yeah, I may delete this change. I imported this change as > > > considering the functionality of rebooting domain on static allocation. > > > > > > A little more discussion on rebooting domain on static allocation. > > > Considering the major user cases for domain on static allocation are > > > system has a total pre-defined, static behavior all the time. No > > > domain allocation on runtime, while there still exists domain rebooting. > > > > Hmmm... With this series it is still possible to allocate memory at > > runtime outside of the static allocation (see my comment on the design > document [1]). > > So is it meant to be complete? 
> > I'm guessing that the "allocate memory at runtime outside of the static > allocation" is referring to XEN heap on static allocation, that is, users > pre-defining the heap in the device tree configuration to make the whole system more static > and predictable. > > And I've replied to you in the design there, sorry for the late reply. Save your > time, > and I’ll paste here: > > "Right now, on AArch64, all RAM, except reserved memory, will finally be > given to the buddy allocator as heap, like you said, guest RAM for normal domains > will be allocated from there, xmalloc eventually gets memory from there, > etc. > So we want to refine the heap here, not iterating through `bootinfo.mem` to > set up the XEN heap, but instead iterating `bootinfo.reserved_heap` to set up the XEN > heap. > > On ARM32, the xen heap and domain heap are separately mapped, which is more > complicated here. That's why I'm only talking about implementing these features > on AArch64 as a first step." > > The above implementation will be delivered through another patch series. This > patch series is only focusing on Domain on Static Allocation. > On second thought: I think you are referring to balloon in/out here, hmm, also, like I replied there: "For issues on ballooning out or in, it is not
[libvirt test] 162100: regressions - FAIL
flight 162100 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/162100/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-armhf-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777 build-arm64-libvirt 6 libvirt-buildfail REGR. vs. 151777 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a version targeted for testing: libvirt 0ad0204ce7f7b512ee349dfbf5cdd751ab0adc1c baseline version: libvirt 2c846fa6bcc11929c9fb857a22430fb9945654ad Last test of basis 151777 2020-07-10 04:19:19 Z 314 days Failing since151818 2020-07-11 04:18:52 Z 313 days 306 attempts Testing same since 162100 2021-05-20 04:20:01 Z0 days1 attempts People who touched revisions under test: Adolfo Jayme Barrientos Aleksandr Alekseev Aleksei Zakharov Andika Triwidada Andrea Bolognani Balázs Meskó Barrett Schonefeld Bastian Germann Bastien Orivel BiaoXiang Ye Bihong Yu Binfeng Wu Boris Fiuczynski Brian Turek Bruno Haible Chris Mayo Christian Ehrhardt Christian Schoenebeck Cole Robinson Collin Walling 
Cornelia Huck Cédric Bosdonnat Côme Borsoi Daniel Henrique Barboza Daniel Letai Daniel P. Berrange Daniel P. Berrangé Dmytro Linkin Eiichi Tsukata Eric Farman Erik Skultety Fabian Affolter Fabian Freyer Fangge Jin Farhan Ali Fedora Weblate Translation gongwei Guoyi Tu Göran Uddeborg Halil Pasic Han Han Hao Wang Hela Basa Helmut Grohne Ian Wienand Jakob Meng Jamie Strandboge Jamie Strandboge Jan Kuparinen Jean-Baptiste Holcroft Jianan Gao Jim Fehlig Jin Yan Jiri Denemark John Ferlan Jonathan Watt Jonathon Jongsma Julio Faracco Ján Tomko Kashyap Chamarthy Kevin Locke Kristina Hanicova Laine Stump Laszlo Ersek Liao Pingfang Lin Ma Lin Ma Lin Ma Luke Yue Luyao Zhong Marc Hartmayer Marc-André Lureau Marek Marczykowski-Górecki Markus Schade Martin Kletzander Masayoshi Mizuma Matt Coleman Matt Coleman Mauro Matteo Cascella Meina Li Michal Privoznik Michał Smyk Milo Casagrande Moshe Levi Muha Aliss Neal Gompa Nick Shyrokovskiy Nickys Music Group Nico Pache Nikolay Shirokovskiy Olaf Hering Olesya Gerasimenko Orion Poplawski Pany Patrick Magauran Paulo de Rezende Pinatti Pavel Hrdina Peng Liang Peter Krempa Pino Toscano Pino Toscano Piotr Drąg Prathamesh Chavan Ricky Tigg Roman Bogorodskiy Roman Bolshakov Ryan Gahagan Ryan Schmidt Sam Hartman Scott Shambarger Sebastian Mitterle SeongHyun Jo Shalini Chellathurai Saroja Shaojun Yang Shi Lei simmon Simon Gaiser Stefan Bader Stefan Berger Stefan Berger Szymon Scholz Thomas Huth Tim Wiederhake Tomáš Golembiovský Tomáš Janoušek Tuguoyi Ville Skyttä Wang Xin WangJian Weblate Yalei Li <274268...@qq.com> Yalei Li Yang Hang Yanqiu Zhang Yaroslav Kargin Yi Li Yi Wang Yuri Chornoivan Zheng Chuan zhenwei pi Zhenyu Zheng jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386
Re: regression in recent pvops kernels, dom0 crashes early
On Thu, 20 May 2021 09:03:34 +0200, Jan Beulich wrote: > Just to be sure - you did not need the other patch that I said I suspect > is needed as a prereq? Yes, I needed just this single patch which moves x86_configure_nx up. Olaf
Re: [PATCH v2 1/2] xen-pciback: redo VF placement in the virtual topology
On 20.05.2021 02:36, Boris Ostrovsky wrote: > > On 5/18/21 12:13 PM, Jan Beulich wrote: >> >> @@ -95,22 +95,25 @@ static int __xen_pcibk_add_pci_dev(struc >> >> /* >> * Keep multi-function devices together on the virtual PCI bus, except >> - * virtual functions. >> + * that we want to keep virtual functions at func 0 on their own. They >> + * aren't multi-function devices and hence their presence at func 0 >> + * may cause guests to not scan the other functions. > > > So your reading of the original commit is that whatever the issue was, > only function zero was causing the problem? In other words, you are not > concerned that pci_scan_slot() may now look at function 1 and skip all > higher-numbered functions (assuming the problem is still there)? I'm not sure I understand the question: Whether to look at higher-numbered functions is a function of function 0's multi-function bit alone, aiui. IOW if function 1 is being looked at in the first place, functions 2-7 should also be looked at. Jan
[xen-unstable test] 162095: tolerable FAIL - PUSHED
flight 162095 xen-unstable real [real] flight 162101 xen-unstable real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/162095/ http://logs.test-lab.xenproject.org/osstest/logs/162101/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-armhf-armhf-xl 8 xen-bootfail pass in 162101-retest Tests which did not succeed, but are not blocking: test-armhf-armhf-xl 15 migrate-support-check fail in 162101 never pass test-armhf-armhf-xl 16 saverestore-support-check fail in 162101 never pass test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 162078 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 162078 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 162078 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 162078 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 162078 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 162078 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 162078 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 162078 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 162078 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 162078 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 162078 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 
saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass version targeted for testing: xen 
935abe1cc463917c697c1451ec8d313a5d75f7de baseline version: xen caa9c4471d1d74b2d236467aaf7e63a806ac11a4 Last test of basis 162078 2021-05-19 05:04:34 Z1 days Testing same since 162095 2021-05-19 16:38:54 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Julien Grall Wei Liu jobs:
Re: [PATCH 03/10] xen/arm: introduce PGC_reserved
On 19.05.2021 21:49, Julien Grall wrote: > On 19/05/2021 10:49, Jan Beulich wrote: >> On 19.05.2021 05:16, Penny Zheng wrote: From: Julien Grall Sent: Tuesday, May 18, 2021 5:46 PM On 18/05/2021 06:21, Penny Zheng wrote: > --- a/xen/include/asm-arm/mm.h > +++ b/xen/include/asm-arm/mm.h > @@ -88,7 +88,15 @@ struct page_info > */ >u32 tlbflush_timestamp; >}; > -u64 pad; > + > +/* Page is reserved. */ > +struct { > +/* > + * Reserved Owner of this page, > + * if this page is reserved to a specific domain. > + */ > +struct domain *domain; > +} reserved; The space in page_info is quite tight, so I would like to avoid introducing new fields unless we can't get away from it. In this case, it is not clear why we need to differentiate the "Owner" vs the "Reserved Owner". It might be clearer if this change is folded in the first user of the field. As an aside, for 32-bit Arm, you need to add a 4-byte padding. >>> >>> Yeah, I may delete this change. I imported this change as considering the >>> functionality >>> of rebooting domain on static allocation. >>> >>> A little more discussion on rebooting domain on static allocation. >>> Considering the major user cases for domain on static allocation >>> are system has a total pre-defined, static behavior all the time. No domain >>> allocation >>> on runtime, while there still exists domain rebooting. >>> >>> And when rebooting domain on static allocation, all these reserved pages >>> could >>> not go back to heap when freeing them. So I am considering to use one >>> global >>> `struct page_info*[DOMID]` value to store. >> >> Except such a separate array will consume quite a bit of space for >> no real gain: v.free has 32 bits of padding space right now on >> Arm64, so there's room for a domid_t there already. Even on Arm32 >> this could be arranged for, as I doubt "order" needs to be 32 bits >> wide. > > I agree we shouldn't need 32-bit to cover the "order". 
Although, I would > like to see any user reading the field before introducing it. Of course, but I thought the plan was to mark static pages with their designated domid, which would happen prior to domain creation. The reader of the field would then naturally appear during domain creation. Jan
Re: regression in recent pvops kernels, dom0 crashes early
On 19.05.2021 20:42, Olaf Hering wrote: > On Mon, 17 May 2021 12:54:02 +0200, Jan Beulich wrote: > >> x86/Xen: swap NX determination and GDT setup on BSP >> >> xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables. >> For this to work when NX is not available, x86_configure_nx() needs to >> be called first. > > > Thanks. I tried this patch on top of the SLE15-SP3 kernel branch. > Without the patch booting fails as reported. > With the patch the dom0 starts as expected. Just to be sure - you did not need the other patch that I said I suspect is needed as a prereq? Jan
Re: [PATCH v7 01/15] swiotlb: Refactor swiotlb init functions
On Thu, May 20, 2021 at 2:50 AM Florian Fainelli wrote: > > > > On 5/17/2021 11:42 PM, Claire Chang wrote: > > Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct > > initialization to make the code reusable. > > > > Note that we now also call set_memory_decrypted in swiotlb_init_with_tbl. > > > > Signed-off-by: Claire Chang > > --- > > kernel/dma/swiotlb.c | 51 ++-- > > 1 file changed, 25 insertions(+), 26 deletions(-) > > > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c > > index 8ca7d505d61c..d3232fc19385 100644 > > --- a/kernel/dma/swiotlb.c > > +++ b/kernel/dma/swiotlb.c > > @@ -168,9 +168,30 @@ void __init swiotlb_update_mem_attributes(void) > > memset(vaddr, 0, bytes); > > } > > > > -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int > > verbose) > > +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t > > start, > > + unsigned long nslabs, bool late_alloc) > > { > > + void *vaddr = phys_to_virt(start); > > unsigned long bytes = nslabs << IO_TLB_SHIFT, i; > > + > > + mem->nslabs = nslabs; > > + mem->start = start; > > + mem->end = mem->start + bytes; > > + mem->index = 0; > > + mem->late_alloc = late_alloc; > > + spin_lock_init(>lock); > > + for (i = 0; i < mem->nslabs; i++) { > > + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); > > + mem->slots[i].orig_addr = INVALID_PHYS_ADDR; > > + mem->slots[i].alloc_size = 0; > > + } > > + > > + set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT); > > + memset(vaddr, 0, bytes); > > You are doing an unconditional set_memory_decrypted() followed by a > memset here, and then: > > > +} > > + > > +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int > > verbose) > > +{ > > struct io_tlb_mem *mem; > > size_t alloc_size; > > > > @@ -186,16 +207,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned > > long nslabs, int verbose) > > if (!mem) > > panic("%s: Failed to allocate %zu bytes align=0x%lx\n", > > __func__, 
alloc_size, PAGE_SIZE); > > - mem->nslabs = nslabs; > > - mem->start = __pa(tlb); > > - mem->end = mem->start + bytes; > > - mem->index = 0; > > - spin_lock_init(>lock); > > - for (i = 0; i < mem->nslabs; i++) { > > - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); > > - mem->slots[i].orig_addr = INVALID_PHYS_ADDR; > > - mem->slots[i].alloc_size = 0; > > - } > > + > > + swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false); > > You convert this call site with swiotlb_init_io_tlb_mem() which did not > do the set_memory_decrypted()+memset(). Is this okay or should > swiotlb_init_io_tlb_mem() add an additional argument to do this > conditionally? I'm actually not sure if this is okay. If not, I will add an additional argument for it. > -- > Florian
Re: Preserving transactions accross Xenstored Live-Update
On 19.05.21 19:10, Julien Grall wrote: Hi Juergen, On 19/05/2021 13:50, Juergen Gross wrote: On 19.05.21 14:33, Julien Grall wrote: On 19/05/2021 13:32, Julien Grall wrote: Hi Juergen, On 19/05/2021 10:09, Juergen Gross wrote: On 18.05.21 20:11, Julien Grall wrote: I have started to look at preserving transactions across Live-Update in C Xenstored. So far, I managed to transfer transactions that read/write existing nodes. Now, I am running into trouble transferring new/deleted nodes within a transaction with the existing migration format. C Xenstored will keep track of nodes accessed during the transaction but not the children (AFAICT for performance reasons). Not performance reasons, but because there isn't any need for that: The children are either unchanged (so the non-transaction node records apply), or they will be among the tracked nodes (transaction node records apply). So in both cases all children should be known. In theory, opening a new transaction means you will not see any modification in the global database until the transaction has been committed. What you describe would break that because a client would be able to see new nodes added outside of the transaction. However, C Xenstored implements neither of the two. Currently, when a node is accessed within the transaction, we will also store the names of the current children. To give an example with access to the global DB (prefixed with TID0) and within a transaction (TID1): 1) TID0: MKDIR "data/bar" 2) Start transaction TID1 3) TID1: DIRECTORY "data" -> This will cache the node data 4) TID0: MKDIR "data/foo" -> This will create "foo" in the global database 5) TID1: MKDIR "data/fish" -> This will create "fish" in the transaction 6) TID1: DIRECTORY "data" -> This will only return "bar" and "fish" If we Live-Update between 4) and 5), then we should make sure that "bar" cannot be seen in the listing by TID1. I meant "foo" here. Sorry for the confusion.
Therefore, I don't think we can restore the children using the global node here. Instead we need to find a way to transfer the list of known children within the transaction. As a fun fact, C Xenstored implements transactions weirdly, so TID1 will be able to access "bar" if it knows the name, but not list it. And this is the basic problem, I think. C Xenstored should be repaired by adding all (remaining) children of a node into the TID's database when the list of children is modified either globally or in a transaction. A child having been added globally needs to be added as "deleted" into the TID's database. IIUC, for every modification in the global database, we would need to walk every single transaction and check whether a parent was accessed. Am I correct? Not really. When a node is being read during a transaction and it is found in the global database only, its gen-count can be tested for being older or newer than the transaction start. If it is newer we can traverse the path up to "/" and treat each parent the same way (so if a parent is found in the transaction database, the presence of the child can be verified, and if it is global only, the gen-count can be tested against the transaction again). If so, I don't think this is a workable solution because of the cost to execute a single command. My variant will affect the transaction internal reads only, and the additional cost will be limited by the distance of the read node from the root node. Is it something you plan to address differently with your rework of the DB? Yes. I want to have the transaction-specific variants of nodes linked to the global ones, which solves this problem in an easy way. Juergen
Re: [PATCH v7 04/15] swiotlb: Add restricted DMA pool initialization
On Thu, May 20, 2021 at 2:54 AM Florian Fainelli wrote: > > > > On 5/17/2021 11:42 PM, Claire Chang wrote: > > Add the initialization function to create restricted DMA pools from > > matching reserved-memory nodes. > > > > Signed-off-by: Claire Chang > > --- > > include/linux/device.h | 4 +++ > > include/linux/swiotlb.h | 3 +- > > kernel/dma/swiotlb.c| 76 + > > 3 files changed, 82 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/device.h b/include/linux/device.h > > index 38a2071cf776..4987608ea4ff 100644 > > --- a/include/linux/device.h > > +++ b/include/linux/device.h > > @@ -416,6 +416,7 @@ struct dev_links_info { > > * @dma_pools: Dma pools (if dma'ble device). > > * @dma_mem: Internal for coherent mem override. > > * @cma_area:Contiguous memory area for dma allocations > > + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override. > > * @archdata:For arch-specific additions. > > * @of_node: Associated device tree node. > > * @fwnode: Associated device node supplied by platform firmware. > > @@ -521,6 +522,9 @@ struct device { > > #ifdef CONFIG_DMA_CMA > > struct cma *cma_area; /* contiguous memory area for dma > > allocations */ > > +#endif > > +#ifdef CONFIG_DMA_RESTRICTED_POOL > > + struct io_tlb_mem *dma_io_tlb_mem; > > #endif > > /* arch specific additions */ > > struct dev_archdata archdata; > > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h > > index 216854a5e513..03ad6e3b4056 100644 > > --- a/include/linux/swiotlb.h > > +++ b/include/linux/swiotlb.h > > @@ -72,7 +72,8 @@ extern enum swiotlb_force swiotlb_force; > > * range check to see if the memory was in fact allocated by this > > * API. > > * @nslabs: The number of IO TLB blocks (in groups of 64) between @start > > and > > - * @end. This is command line adjustable via setup_io_tlb_npages. > > + * @end. For default swiotlb, this is command line adjustable via > > + * setup_io_tlb_npages. > > * @used:The number of used IO TLB block. 
> > * @list:The free list describing the number of free entries available > > * from each index. > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c > > index b849b01a446f..1d8eb4de0d01 100644 > > --- a/kernel/dma/swiotlb.c > > +++ b/kernel/dma/swiotlb.c > > @@ -39,6 +39,13 @@ > > #ifdef CONFIG_DEBUG_FS > > #include > > #endif > > +#ifdef CONFIG_DMA_RESTRICTED_POOL > > +#include > > +#include > > +#include > > +#include > > +#include > > +#endif > > > > #include > > #include > > @@ -690,3 +697,72 @@ static int __init swiotlb_create_default_debugfs(void) > > late_initcall(swiotlb_create_default_debugfs); > > > > #endif > > + > > +#ifdef CONFIG_DMA_RESTRICTED_POOL > > +static int rmem_swiotlb_device_init(struct reserved_mem *rmem, > > + struct device *dev) > > +{ > > + struct io_tlb_mem *mem = rmem->priv; > > + unsigned long nslabs = rmem->size >> IO_TLB_SHIFT; > > + > > + if (dev->dma_io_tlb_mem) > > + return 0; > > + > > + /* > > + * Since multiple devices can share the same pool, the private data, > > + * io_tlb_mem struct, will be initialized by the first device attached > > + * to it. > > + */ > > + if (!mem) { > > + mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL); > > + if (!mem) > > + return -ENOMEM; > > + > > + if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base { > > + kfree(mem); > > + return -EINVAL; > > This could probably deserve a warning here to indicate that the reserved > area must be accessible within the linear mapping as I would expect a > lot of people to trip over that. Ok. Will add it. > > Reviewed-by: Florian Fainelli > -- > Florian
RE: [PATCH 10/10] xen/arm: introduce allocate_static_memory
Hi Julien > -Original Message- > From: Julien Grall > Sent: Thursday, May 20, 2021 4:10 AM > To: Penny Zheng ; xen-devel@lists.xenproject.org; > sstabell...@kernel.org > Cc: Bertrand Marquis ; Wei Chen > ; nd > Subject: Re: [PATCH 10/10] xen/arm: introduce allocate_static_memory > > > > On 19/05/2021 08:27, Penny Zheng wrote: > > Hi Julien > > > >> -Original Message- > >> From: Julien Grall > >> Sent: Tuesday, May 18, 2021 8:06 PM > >> To: Penny Zheng ; > >> xen-devel@lists.xenproject.org; sstabell...@kernel.org > >> Cc: Bertrand Marquis ; Wei Chen > >> ; nd > >> Subject: Re: [PATCH 10/10] xen/arm: introduce allocate_static_memory > >> > >> Hi Penny, > >> > >> On 18/05/2021 06:21, Penny Zheng wrote: > >>> This commit introduces allocate_static_memory to allocate static > >>> memory as guest RAM for domain on Static Allocation. > >>> > >>> It uses alloc_domstatic_pages to allocate pre-defined static memory > >>> banks for this domain, and uses guest_physmap_add_page to set up P2M > >>> table, guest starting at fixed GUEST_RAM0_BASE, GUEST_RAM1_BASE. > >>> > >>> Signed-off-by: Penny Zheng > >>> --- > >>>xen/arch/arm/domain_build.c | 157 > >> +++- > >>>1 file changed, 155 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/xen/arch/arm/domain_build.c > >>> b/xen/arch/arm/domain_build.c index 30b55588b7..9f662313ad 100644 > >>> --- a/xen/arch/arm/domain_build.c > >>> +++ b/xen/arch/arm/domain_build.c > >>> @@ -437,6 +437,50 @@ static bool __init allocate_bank_memory(struct > >> domain *d, > >>>return true; > >>>} > >>> > >>> +/* > >>> + * #ram_index and #ram_index refer to the index and starting > >>> +address of guest > >>> + * memory kank stored in kinfo->mem. > >>> + * Static memory at #smfn of #tot_size shall be mapped #sgfn, and > >>> + * #sgfn will be next guest address to map when returning. 
> >>> + */ > >>> +static bool __init allocate_static_bank_memory(struct domain *d, > >>> + struct kernel_info *kinfo, > >>> + int ram_index, > >> > >> Please use unsigned. > >> > >>> + paddr_t ram_addr, > >>> + gfn_t* sgfn, > >> > >> I am confused, what is the difference between ram_addr and sgfn? > >> > > > > We need to constructing kinfo->mem(guest RAM banks) here, and we are > > indexing in static_mem(physical ram banks). Multiple physical ram > > banks consist of one guest ram bank(like, GUEST_RAM0). > > > > ram_addr here will either be GUEST_RAM0_BASE, or GUEST_RAM1_BASE, > for > > now. I kinds struggled in how to name it. And maybe it shall not be a > > parameter here. > > > > Maybe I should switch.. case.. on the ram_index, if its 0, its > > GUEST_RAM0_BASE, And if its 1, its GUEST_RAM1_BASE. > > You only need to set kinfo->mem.bank[ram_index].start once. This is when > you know the bank is first used. > > AFAICT, this function will map the memory for a range start at ``sgfn``. > It doesn't feel this belongs to the function. > > The same remark is valid for kinfo->mem.nr_banks. > Ok. I finally totally understand what you suggest here. I'll try to let the action related to setting kinfo->mem.bank[ram_index].start/ kinfo->mem.bank[ram_index].size/ kinfo->mem. nr_banks out of this function, and only keep the simple functionality of mapping the memory for a range start at ``sgfn``. 
> >>> + mfn_t smfn, > >>> + paddr_t tot_size) { > >>> +int res; > >>> +struct membank *bank; > >>> +paddr_t _size = tot_size; > >>> + > >>> +bank = >mem.bank[ram_index]; > >>> +bank->start = ram_addr; > >>> +bank->size = bank->size + tot_size; > >>> + > >>> +while ( tot_size > 0 ) > >>> +{ > >>> +unsigned int order = get_allocation_size(tot_size); > >>> + > >>> +res = guest_physmap_add_page(d, *sgfn, smfn, order); > >>> +if ( res ) > >>> +{ > >>> +dprintk(XENLOG_ERR, "Failed map pages to DOMU: %d", res); > >>> +return false; > >>> +} > >>> + > >>> +*sgfn = gfn_add(*sgfn, 1UL << order); > >>> +smfn = mfn_add(smfn, 1UL << order); > >>> +tot_size -= (1ULL << (PAGE_SHIFT + order)); > >>> +} > >>> + > >>> +kinfo->mem.nr_banks = ram_index + 1; > >>> +kinfo->unassigned_mem -= _size; > >>> + > >>> +return true; > >>> +} > >>> + > >>>static void __init allocate_memory(struct domain *d, struct > >>> kernel_info > >> *kinfo) > >>>{ > >>>unsigned int i; > >>> @@ -480,6 +524,116 @@ fail: > >>> (unsigned long)kinfo->unassigned_mem >> 10); > >>>} > >>> > >>> +/* Allocate memory from static memory as RAM for one specific > >>> +domain d. */ static void __init allocate_static_memory(struct domain *d, > >>> +
RE: [PATCH 03/10] xen/arm: introduce PGC_reserved
Hi Julien

> -----Original Message-----
> From: Julien Grall
> Sent: Thursday, May 20, 2021 3:46 AM
> To: Penny Zheng ; xen-devel@lists.xenproject.org;
> sstabell...@kernel.org
> Cc: Bertrand Marquis ; Wei Chen
> ; nd
> Subject: Re: [PATCH 03/10] xen/arm: introduce PGC_reserved
>
> On 19/05/2021 04:16, Penny Zheng wrote:
> > Hi Julien
>
> Hi Penny,
>
> >
> >> -----Original Message-----
> >> From: Julien Grall
> >> Sent: Tuesday, May 18, 2021 5:46 PM
> >> To: Penny Zheng ;
> >> xen-devel@lists.xenproject.org; sstabell...@kernel.org
> >> Cc: Bertrand Marquis ; Wei Chen
> >> ; nd
> >> Subject: Re: [PATCH 03/10] xen/arm: introduce PGC_reserved
> >>
> >> On 18/05/2021 06:21, Penny Zheng wrote:
> >>> In order to differentiate pages of static memory from those
> >>> allocated from the heap, this patch introduces a new page flag,
> >>> PGC_reserved.
> >>>
> >>> A new struct reserved in struct page_info describes reserved page
> >>> info, that is, which specific domain this page is reserved to.
> >>>
> >>> Helpers page_get_reserved_owner and page_set_reserved_owner are
> >>> designated to get/set a reserved page's owner.
> >>>
> >>> Struct domain is enlarged to more than PAGE_SIZE, due to the
> >>> newly-imported struct reserved in struct page_info.
> >>
> >> struct domain may embed a pointer to a struct page_info but never
> >> directly embed the structure. So can you clarify what you mean?
> >>
> >>>
> >>> Signed-off-by: Penny Zheng
> >>> ---
> >>>   xen/include/asm-arm/mm.h | 16 +++-
> >>>   1 file changed, 15 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> >> index
> >>> 0b7de3102e..d8922fd5db 100644
> >>> --- a/xen/include/asm-arm/mm.h
> >>> +++ b/xen/include/asm-arm/mm.h
> >>> @@ -88,7 +88,15 @@ struct page_info
> >>>        */
> >>>       u32 tlbflush_timestamp;
> >>>   };
> >>> -    u64 pad;
> >>> +
> >>> +    /* Page is reserved.
> >>> +     */
> >>> +    struct {
> >>> +        /*
> >>> +         * Reserved Owner of this page,
> >>> +         * if this page is reserved to a specific domain.
> >>> +         */
> >>> +        struct domain *domain;
> >>> +    } reserved;
> >>
> >> The space in page_info is quite tight, so I would like to avoid
> >> introducing new fields unless we can't get away from it.
> >>
> >> In this case, it is not clear why we need to differentiate the "Owner"
> >> vs the "Reserved Owner". It might be clearer if this change is folded
> >> in the first user of the field.
> >>
> >> As an aside, for 32-bit Arm, you need to add a 4-byte padding.
> >>
> >
> > Yeah, I may delete this change. I introduced it while considering the
> > functionality of rebooting a domain on static allocation.
> >
> > A little more discussion on rebooting a domain on static allocation:
> > the major use cases for domains on static allocation are systems with
> > a totally pre-defined, static behavior all the time. There is no
> > domain allocation at runtime, while domain rebooting still exists.
>
> Hmmm... With this series it is still possible to allocate memory at
> runtime outside of the static allocation (see my comment on the design
> document [1]). So is it meant to be complete?
>

I'm guessing that "allocate memory at runtime outside of the static
allocation" refers to the XEN heap on static allocation, that is, users
pre-defining the heap in the device tree configuration to make the whole
system more static and predictable. I've replied to you in the design
document; sorry for the late reply. To save your time, I'll paste it here:

"Right now, on AArch64, all RAM, except reserved memory, will finally be
given to the buddy allocator as heap. Like you said, guest RAM for a normal
domain will be allocated from there, xmalloc eventually gets memory from
there, etc. So we want to refine the heap here: not iterating through
`bootinfo.mem` to set up the XEN heap, but instead iterating through
`bootinfo.reserved_heap` to set up the XEN heap.
On ARM32, the xen heap and the domain heap are mapped separately, which is
more complicated. That's why I am only talking about implementing these
features on AArch64 as a first step."

The above implementation will be delivered through another patch series.
This patch series is only focusing on Domain on Static Allocation.

> >
> > And when rebooting a domain on static allocation, all these reserved
> > pages could not go back to the heap when freeing them. So I am
> > considering using one global `struct page_info*[DOMID]` value to store
> > them.
> >
> > As Jan suggested, when a domain gets rebooted, its struct domain will
> > not exist anymore. But I think the DOMID info could last.
>
> You would need to make sure the domid is reserved. But I am not entirely
> convinced this is necessary here.
>
> When recreating the domain, you need a way to know its configuration.
> Most likely this will come from the Device-Tree. At which point, you can
> also find the static region from there.
>

True, true. I'll dig more
RE: [PATCH 01/10] xen/arm: introduce domain on Static Allocation
Hi Julien

> -----Original Message-----
> From: Julien Grall
> Sent: Thursday, May 20, 2021 2:27 AM
> To: Penny Zheng ; xen-devel@lists.xenproject.org;
> sstabell...@kernel.org
> Cc: Bertrand Marquis ; Wei Chen
> ; nd
> Subject: Re: [PATCH 01/10] xen/arm: introduce domain on Static Allocation
>
> On 19/05/2021 03:22, Penny Zheng wrote:
> > Hi Julien
>
> Hi Penny,
>
> >> -----Original Message-----
> >> From: Julien Grall
> >> Sent: Tuesday, May 18, 2021 4:58 PM
> >> To: Penny Zheng ;
> >> xen-devel@lists.xenproject.org; sstabell...@kernel.org
> >> Cc: Bertrand Marquis ; Wei Chen
> >> ; nd
> >> Subject: Re: [PATCH 01/10] xen/arm: introduce domain on Static
> >> Allocation
> >>> +beginning, shall never go to heap allocator or boot allocator for
> >>> +any use.
> >>> +
> >>> +Domains on Static Allocation is supported through device tree
> >>> +property `xen,static-mem` specifying reserved RAM banks as this
> >>> +domain's guest RAM.
> >>
> >> I would suggest to use "physical RAM" when you refer to the host memory.
> >>
> >>> +By default, they shall be mapped to the fixed guest RAM address
> >>> +`GUEST_RAM0_BASE`, `GUEST_RAM1_BASE`.
> >>
> >> There are a few bits that need to be clarified or made part of the
> >> description:
> >>     1) "By default" suggests there is an alternative possibility.
> >> However, I don't see any.
> >>     2) Will the first region of xen,static-mem be mapped to
> >> GUEST_RAM0_BASE and the second to GUEST_RAM1_BASE? What if a third
> >> region is specified?
> >>     3) We don't guarantee the base address and the size of the banks.
> >> Wouldn't it be better to let the admin select the region he/she wants?
> >>     4) How do you determine the number of cells for the address and
> >> the size?
> >>
> >
> > The specific implementation of this part can be traced to the last
> > commit
> > https://patchew.org/Xen/20210518052113.725808-1-penny.zh...@arm.com/20210518052113.725808-11-penny.zh...@arm.com/
>
> Right.
> My point is an admin should not have to read the code in order to figure
> out how the allocation works.
>
> >
> > It will exhaust the GUEST_RAM0_SIZE, then seek to the GUEST_RAM1_BASE.
> > GUEST_RAM0 may take up several regions.
>
> Can this be clarified in the commit message.
>

Ok, I will expand the commit message to clarify it.

> > Yes, I may add the 1:1 direct-map scenario here to explain the
> > alternative possibility
>
> Ok. So I would suggest to remove "by default" until the implementation
> for direct-map is added.
>

Ok, will do. Thx.

> > For the third point, are you suggesting that we could provide an
> > option that lets the user also define the guest memory base
> > address/size?
>
> When reading the document, I originally thought that the first region
> would be mapped to bank1, and then the second region mapped to bank2...
>
> But from your reply, this is not the expected behavior. Therefore,
> please ignore my suggestion here.
>
> > I'm confused on the fourth point. You mean the address cells and size
> > cells for xen,static-mem = <...>?
>
> Yes. This should be clarified in the document. See more below why.
>
> > It will be consistent with the ones defined in the parent node, domUx.
>
> Hmmm... To take the example you provided, the parent would be chosen.
> However, from the example, I would expect the property
> #{address, size}-cells in domU1 to be used. What did I miss?
>

Yeah, the property #{address, size}-cells in domU1 will be used. And the
parent node will be domU1. The dtb property should look as follows:

        chosen {
            domU1 {
                compatible = "xen,domain";
                #address-cells = <0x2>;
                #size-cells = <0x2>;
                cpus = <2>;
                xen,static-mem = <0x0 0x3000 0x0 0x2000>;

                ...
            };
        };

> >>> +DOMU1 on Static Allocation has reserved RAM bank at 0x3000 of 512MB
> >>> +size
> >>> +Static Allocation is only supported on AArch64 for now.
> >>
> >> The code doesn't seem to be AArch64 specific. So why can't this be
> >> used for 32-bit Arm?
> >>
> >
> > True, we have plans to make it also workable on AArch32 in the future,
> > because we are considering XEN on Cortex-R52.
>
> All the code seems to be implemented in arm generic code. So isn't it
> already working?
>
> >>>   static int __init early_scan_node(const void *fdt,
> >>>                                     int node, const char *name,
> >>>                                     int depth,
> >>>                                     u32 address_cells, u32 size_cells,
> >>> @@ -345,6 +394,9 @@ static int __init early_scan_node(const void *fdt,
> >>>           process_multiboot_node(fdt, node, name, address_cells,
> >>>                                  size_cells);
> >>>       else if ( depth == 1 && device_tree_node_matches(fdt, node, "chosen") )
> >>>           process_chosen_node(fdt, node, name, address_cells, size_cells);
> >>> +    else if ( depth == 2 && fdt_get_property(fdt, node,
> >>> +                                             "xen,static-mem",