Re: [PATCH v4 2/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
On 3/2/22 11:40 AM, Dongli Zhang wrote: The sched_clock() can be used very early since commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition, with commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock time from 0"), kdump kernel in Xen HVM guest may panic at very early stage when accessing &__this_cpu_read(xen_vcpu)->time as in below: setup_arch() -> init_hypervisor_platform() -> x86_init.hyper.init_platform = xen_hvm_guest_init() -> xen_hvm_init_time_ops() -> xen_clocksource_read() -> src = &__this_cpu_read(xen_vcpu)->time; This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info' embedded inside 'shared_info' during early stage until xen_vcpu_setup() is used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address. However, when Xen HVM guest panic on vcpu >= 32, since xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic. This patch calls xen_hvm_init_time_ops() again later in xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is registered when the boot vcpu is >= 32. This issue can be reproduced on purpose via below command at the guest side when kdump/kexec is enabled: "taskset -c 33 echo c > /proc/sysrq-trigger" The bugfix for PVM is not implemented due to the lack of testing environment. Cc: Joe Jin Signed-off-by: Dongli Zhang Reviewed-by: Boris Ostrovsky Applied to for-linus-5.18 (with return path adjusted)
Re: [PATCH v4 2/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
Hi Boris, On 3/2/22 6:11 PM, Boris Ostrovsky wrote: > > On 3/2/22 7:31 PM, Dongli Zhang wrote: >> Hi Boris, >> >> On 3/2/22 4:20 PM, Boris Ostrovsky wrote: >>> On 3/2/22 11:40 AM, Dongli Zhang wrote: void __init xen_hvm_init_time_ops(void) { + static bool hvm_time_initialized; + + if (hvm_time_initialized) + return; + /* * vector callback is needed otherwise we cannot receive interrupts * on cpu > 0 and at this point we don't know how many cpus are * available. */ if (!xen_have_vector_callback) - return; + goto exit; >>> >>> Why not just return? Do we expect the value of xen_have_vector_callback to >>> change? >> I just want to keep above sync with >> >>> >>> -boris >>> >>> if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); + goto exit; + } >> ... here. >> >> That is, I want the main logic of xen_hvm_init_time_ops() to run for at most >> once. Both of above two if statements will "go to exit". > > > I didn't notice this actually. > > > I think both of them should return early, there is no reason to set > hvm_time_initialized to true when, in fact, we have not initialized anything. > And to avoid printing the warning twice we can just replace it with > pr_info_once(). > > > I can fix it up when committing so no need to resend. So unless you disagree Thank you very much for fixing it during committing. Dongli Zhang
Re: [PATCH v4 2/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
On 3/2/22 7:31 PM, Dongli Zhang wrote: Hi Boris, On 3/2/22 4:20 PM, Boris Ostrovsky wrote: On 3/2/22 11:40 AM, Dongli Zhang wrote: void __init xen_hvm_init_time_ops(void) { + static bool hvm_time_initialized; + + if (hvm_time_initialized) + return; + /* * vector callback is needed otherwise we cannot receive interrupts * on cpu > 0 and at this point we don't know how many cpus are * available. */ if (!xen_have_vector_callback) - return; + goto exit; Why not just return? Do we expect the value of xen_have_vector_callback to change? I just want to keep above sync with -boris if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); + goto exit; + } ... here. That is, I want the main logic of xen_hvm_init_time_ops() to run for at most once. Both of above two if statements will "go to exit". I didn't notice this actually. I think both of them should return early, there is no reason to set hvm_time_initialized to true when, in fact, we have not initialized anything. And to avoid printing the warning twice we can just replace it with pr_info_once(). I can fix it up when committing so no need to resend. So unless you disagree Reviewed-by: Boris Ostrovsky Thank you very much! Dongli Zhang + + /* + * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'. + * The __this_cpu_read(xen_vcpu) is still NULL when Xen HVM guest + * boots on vcpu >= MAX_VIRT_CPUS (e.g., kexec), To access + * __this_cpu_read(xen_vcpu) via xen_clocksource_read() will panic. + * + * The xen_hvm_init_time_ops() should be called again later after + * __this_cpu_read(xen_vcpu) is available. + */ + if (!__this_cpu_read(xen_vcpu)) { + pr_info("Delay xen_init_time_common() as kernel is running on vcpu=%d\n", + xen_vcpu_nr(0)); return; } @@ -577,6 +597,9 @@ void __init xen_hvm_init_time_ops(void) x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents; x86_platform.set_wallclock = xen_set_wallclock; + +exit: + hvm_time_initialized = true; } #endif
Re: [PATCH v4 2/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
Hi Boris, On 3/2/22 4:20 PM, Boris Ostrovsky wrote: > > On 3/2/22 11:40 AM, Dongli Zhang wrote: >> void __init xen_hvm_init_time_ops(void) >> { >> + static bool hvm_time_initialized; >> + >> + if (hvm_time_initialized) >> + return; >> + >> /* >> * vector callback is needed otherwise we cannot receive interrupts >> * on cpu > 0 and at this point we don't know how many cpus are >> * available. >> */ >> if (!xen_have_vector_callback) >> - return; >> + goto exit; > > > Why not just return? Do we expect the value of xen_have_vector_callback to > change? I just want to keep above sync with > > > -boris > > >> if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { >> pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); >> + goto exit; >> + } ... here. That is, I want the main logic of xen_hvm_init_time_ops() to run for at most once. Both of above two if statements will "go to exit". Thank you very much! Dongli Zhang >> + >> + /* >> + * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'. >> + * The __this_cpu_read(xen_vcpu) is still NULL when Xen HVM guest >> + * boots on vcpu >= MAX_VIRT_CPUS (e.g., kexec), To access >> + * __this_cpu_read(xen_vcpu) via xen_clocksource_read() will panic. >> + * >> + * The xen_hvm_init_time_ops() should be called again later after >> + * __this_cpu_read(xen_vcpu) is available. >> + */ >> + if (!__this_cpu_read(xen_vcpu)) { >> + pr_info("Delay xen_init_time_common() as kernel is running on >> vcpu=%d\n", >> + xen_vcpu_nr(0)); >> return; >> } >> @@ -577,6 +597,9 @@ void __init xen_hvm_init_time_ops(void) >> x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents; >> x86_platform.set_wallclock = xen_set_wallclock; >> + >> +exit: >> + hvm_time_initialized = true; >> } >> #endif >>
Re: [PATCH v4 2/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
On 3/2/22 11:40 AM, Dongli Zhang wrote: void __init xen_hvm_init_time_ops(void) { + static bool hvm_time_initialized; + + if (hvm_time_initialized) + return; + /* * vector callback is needed otherwise we cannot receive interrupts * on cpu > 0 and at this point we don't know how many cpus are * available. */ if (!xen_have_vector_callback) - return; + goto exit; Why not just return? Do we expect the value of xen_have_vector_callback to change? -boris if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); + goto exit; + } + + /* +* Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'. +* The __this_cpu_read(xen_vcpu) is still NULL when Xen HVM guest +* boots on vcpu >= MAX_VIRT_CPUS (e.g., kexec), To access +* __this_cpu_read(xen_vcpu) via xen_clocksource_read() will panic. +* +* The xen_hvm_init_time_ops() should be called again later after +* __this_cpu_read(xen_vcpu) is available. +*/ + if (!__this_cpu_read(xen_vcpu)) { + pr_info("Delay xen_init_time_common() as kernel is running on vcpu=%d\n", + xen_vcpu_nr(0)); return; } @@ -577,6 +597,9 @@ void __init xen_hvm_init_time_ops(void) x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents; x86_platform.set_wallclock = xen_set_wallclock; + +exit: + hvm_time_initialized = true; } #endif