Hi Jan, thanks for your patch. I've just tried it, but I have the same issue: I start a kvm guest (Windows 10 IoT 2019) and it works, but as soon as I start a latency test the whole system hangs completely.
Could I give you more info in some way?

R.

On Thu, 4 Apr 2019, 21:05 Jan Kiszka <jan.kis...@siemens.com> wrote:

> On 21.03.19 09:01, Jan Kiszka wrote:
> > On 21.03.19 08:04, cagnulein wrote:
> >> I've got a similar issue even with 4.9.146, with a kvm guest on and
> >> latency on too. It's quite deterministic. Any idea?
> >>
> >
> > I didn't trigger your trace yet, but I can now study this splat:
> >
> > [  140.794470] I-pipe: Detected illicit call from head domain 'Xenomai'
> > [  140.794470]         into a regular Linux service
> > [  140.797855] CPU: 0 PID: 1021 Comm: qemu-system-x86 Not tainted 4.14.103+ #43
> > [  140.799644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
> > [  140.799648] I-pipe domain: Xenomai
> > [  140.799650] Call Trace:
> > [  140.799654]  <IRQ>
> > [  140.799670]  ipipe_root_only+0xfe/0x130
> > [  140.799678]  ipipe_stall_root+0xe/0x60
> > [  140.799685]  lock_acquire+0x62/0x1a0
> > [  140.799692]  ? __switch_to_asm+0x40/0x70
> > [  140.799703]  kvm_arch_vcpu_put+0xb0/0x1a0
> > [  140.799707]  ? kvm_arch_vcpu_put+0x6e/0x1a0
> > [  140.799717]  __ipipe_handle_vm_preemption+0x2a/0x50
> > [  140.799723]  ___xnsched_run.part.76+0x371/0x590
> > [  140.799733]  xnintr_core_clock_handler+0x3f5/0x420
> > [  140.799745]  dispatch_irq_head+0x9a/0x150
> > [  140.799757]  __ipipe_handle_irq+0x7e/0x210
> > [  140.799768]  apic_timer_interrupt+0x7f/0xb0
> > [...]
> >
> > Will let you know when I have details.
> >
> > Jan
> >
>
> I've got 4.14 working again with kvm using these changes:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 31469b638286..f49247be061b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3023,6 +3023,15 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  		vcpu->arch.preempted_in_kernel = !kvm_x86_ops->get_cpl(vcpu);
>
>  	flags = hard_cond_local_irq_save();
> +
> +	/*
> +	 * Do not update steal time accounting while running over the head
> +	 * domain as this may introduce high latencies and will also issue
> +	 * context violation reports.
> +	 */
> +	if (!ipipe_root_p)
> +		goto skip_steal_time_update;
> +
>  	/*
>  	 * Disable page faults because we're in atomic context here.
>  	 * kvm_write_guest_offset_cached() would call might_fault()
> @@ -3040,6 +3049,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_steal_time_set_preempted(vcpu);
>  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  	pagefault_enable();
> +skip_steal_time_update:
>  	kvm_x86_ops->vcpu_put(vcpu);
>  	vcpu->arch.last_host_tsc = rdtsc();
>  	/*
> @@ -3064,7 +3074,9 @@ void __ipipe_handle_vm_preemption(struct ipipe_vm_notifier *nfy)
>  	struct kvm_vcpu *vcpu;
>
>  	vcpu = container_of(nfy, struct kvm_vcpu, ipipe_notifier);
> +	preempt_disable();
>  	kvm_arch_vcpu_put(vcpu);
> +	preempt_enable_no_resched();
>  	kvm_restore_shared_msrs(smsr);
>  	__ipipe_exit_vm();
>  }
> @@ -7169,6 +7181,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	    || need_resched() || signal_pending(current)) {
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>  		smp_wmb();
> +		__ipipe_exit_vm();
> +		hard_cond_local_irq_enable();
>  		local_irq_enable();
>  		preempt_enable();
>  		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> @@ -7237,6 +7251,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>
>  	guest_exit_irqoff();
>
> +	hard_cond_local_irq_enable();
>  	local_irq_enable();
>  	preempt_enable();
>
> Could you give that a try as well?
> Besides stability reports, I would specifically be interested in
> latency numbers, in case they are excessive.
>
> I've also fixed kvm on 4.4, which had fewer issues but also didn't work
> out of the box. That branch will be updated later. Moreover, I need to
> check SVM again, at least offline.
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
> Corporate Competence Center Embedded Linux