Re: Clock stops working on OpenBSD qemu/kvm guest

2024-01-30 Thread Dave Voutila


Lévai, Dániel  writes:

> Turns out the clock stopped every night at the time when backups were
> running and thus the VM was paused (saved, or 'managedsaved' if
> someone uses libvirt) for a minute.
> Not sure why, though; while I was testing pause/resume the clock
> didn't stop, it just failed to get synced by ntpd(8). Maybe over time
> the drift was too much?

In my experience, NTP daemons try not to jump the clock forward or
backward by too great an amount; instead they speed up or slow down
time advancement until the clock converges. It's indeed possible you
drifted too far for ntpd to handle through this method.

You might want to look into installing the QEMU guest agent in OpenBSD
VMs:

 https://marc.info/?l=openbsd-ports&m=158936392710472&w=2

Usually these agents take care of setting the RTC properly after a
suspend/resume cycle of the VM. (That's what the vmmci(4) driver does,
for instance, in OpenBSD guests running atop OpenBSD's vmd(8).)

>
> Anyway, the rather curious thing was ntpd(8) not syncing the clock
> properly after resume, so I ended up giving the 'trusted' option to
> the server I'm using here. Strangely, it still took quite some time
> [1], but in the end it managed to sync - so I guess this should work
> in the long run.
>
>
> [1]
> Jan 30 11:50:33  ntpd[83421]: peer 148.6.0.1 now valid
> Jan 30 11:54:37  ntpd[4758]: adjusting local clock by 23.831836s
> Jan 30 11:54:37  ntpd[83421]: clock is now synced
> Jan 30 11:54:37  ntpd[83421]: constraint reply from 9.9.9.9: offset 23.653130
> Jan 30 11:57:48  ntpd[4758]: adjusting local clock by 22.879877s
> Jan 30 11:57:48  ntpd[83421]: clock is now unsynced
> Jan 30 12:01:34  ntpd[4758]: adjusting local clock by 21.754396s
> Jan 30 12:04:51  ntpd[4758]: adjusting local clock by 20.774539s
> Jan 30 12:08:33  ntpd[4758]: adjusting local clock by 19.670413s
> Jan 30 12:12:43  ntpd[4758]: adjusting local clock by 18.426017s
> Jan 30 12:17:04  ntpd[4758]: adjusting local clock by 17.127167s
> Jan 30 12:21:19  ntpd[4758]: adjusting local clock by 15.857846s
> Jan 30 12:21:53  ntpd[4758]: adjusting local clock by 15.688043s
> Jan 30 12:25:30  ntpd[4758]: adjusting local clock by 14.613690s
> Jan 30 12:29:49  ntpd[4758]: adjusting local clock by 13.323883s
> Jan 30 12:33:32  ntpd[4758]: adjusting local clock by 12.204646s
> Jan 30 12:34:06  ntpd[4758]: adjusting local clock by 12.036162s
> Jan 30 12:35:10  ntpd[4758]: adjusting local clock by 11.712658s
> Jan 30 12:36:13  ntpd[4758]: adjusting local clock by 11.412870s
> Jan 30 12:39:55  ntpd[4758]: adjusting local clock by 10.308062s
> Jan 30 12:43:34  ntpd[4758]: adjusting local clock by 9.208613s
> Jan 30 12:44:07  ntpd[4758]: adjusting local clock by 9.048595s
> Jan 30 12:47:48  ntpd[4758]: adjusting local clock by 7.950845s
> Jan 30 12:49:27  ntpd[4758]: adjusting local clock by 7.460912s
> Jan 30 12:53:08  ntpd[4758]: adjusting local clock by 6.360250s
> Jan 30 12:56:22  ntpd[4758]: adjusting local clock by 5.385971s
> Jan 30 12:56:53  ntpd[4758]: adjusting local clock by 5.241883s
> Jan 30 13:01:13  ntpd[4758]: adjusting local clock by 3.951414s
> Jan 30 13:04:22  ntpd[4758]: adjusting local clock by 3.009970s
> Jan 30 13:07:05  ntpd[4758]: adjusting local clock by 2.201024s
> Jan 30 13:11:18  ntpd[4758]: adjusting local clock by 0.937320s
> Jan 30 13:12:22  ntpd[4758]: adjusting local clock by 0.613777s
> Jan 30 13:13:27  ntpd[4758]: adjusting local clock by 0.285335s
> Jan 30 13:14:32  ntpd[83421]: clock is now synced
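
To put a number on how slowly the offset shrinks in the log above, here is a minimal Python sketch that computes the average correction rate between the first and the last "adjusting local clock" line. The three sample lines are copied from the log; the regular expression, the function names and the year 2024 (syslog timestamps carry no year, so it is taken from the thread date) are assumptions made for illustration only.

```python
import re
from datetime import datetime

# First, one middle, and last "adjusting local clock" line from the log above.
LOG = """\
Jan 30 11:54:37  ntpd[4758]: adjusting local clock by 23.831836s
Jan 30 12:34:06  ntpd[4758]: adjusting local clock by 12.036162s
Jan 30 13:13:27  ntpd[4758]: adjusting local clock by 0.285335s
"""

# Hypothetical pattern: syslog timestamp, then the remaining offset in seconds.
PATTERN = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*adjusting local clock by (-?[\d.]+)s$"
)


def parse(log):
    """Return a list of (timestamp, remaining_offset_seconds) tuples."""
    samples = []
    for line in log.splitlines():
        m = PATTERN.match(line)
        if m:
            # Assumption: year 2024, taken from the date of the thread.
            when = datetime.strptime("2024 " + m.group(1), "%Y %b %d %H:%M:%S")
            samples.append((when, float(m.group(2))))
    return samples


def slew_ppm(samples):
    """Average correction rate between first and last sample, in ppm."""
    (t0, off0), (t1, off1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    return (off0 - off1) / elapsed * 1e6


if __name__ == "__main__":
    print(f"average slew: {slew_ppm(parse(LOG)):.0f} ppm")
```

With these samples the rate comes out just under 5000 ppm, i.e. roughly 5 ms of correction per second of wall time, which is why removing about 24 seconds of offset took around 80 minutes. The timestamps come from the clock that is being adjusted, so treat the figure as an approximation.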



Re: Clock stops working on OpenBSD qemu/kvm guest

2024-01-30 Thread Lévai, Dániel
Turns out the clock stopped every night at the time when backups were running 
and thus the VM was paused (saved, or 'managedsaved' if someone uses libvirt) 
for a minute.
Not sure why, though; while I was testing pause/resume the clock didn't stop, 
it just failed to get synced by ntpd(8). Maybe over time the drift was too much?

Anyway, the rather curious thing was ntpd(8) not syncing the clock properly 
after resume, so I ended up giving the 'trusted' option to the server I'm using 
here. Strangely, it still took quite some time [1], but in the end it managed 
to sync - so I guess this should work in the long run.


[1]
Jan 30 11:50:33  ntpd[83421]: peer 148.6.0.1 now valid
Jan 30 11:54:37  ntpd[4758]: adjusting local clock by 23.831836s
Jan 30 11:54:37  ntpd[83421]: clock is now synced
Jan 30 11:54:37  ntpd[83421]: constraint reply from 9.9.9.9: offset 23.653130
Jan 30 11:57:48  ntpd[4758]: adjusting local clock by 22.879877s
Jan 30 11:57:48  ntpd[83421]: clock is now unsynced
Jan 30 12:01:34  ntpd[4758]: adjusting local clock by 21.754396s
Jan 30 12:04:51  ntpd[4758]: adjusting local clock by 20.774539s
Jan 30 12:08:33  ntpd[4758]: adjusting local clock by 19.670413s
Jan 30 12:12:43  ntpd[4758]: adjusting local clock by 18.426017s
Jan 30 12:17:04  ntpd[4758]: adjusting local clock by 17.127167s
Jan 30 12:21:19  ntpd[4758]: adjusting local clock by 15.857846s
Jan 30 12:21:53  ntpd[4758]: adjusting local clock by 15.688043s
Jan 30 12:25:30  ntpd[4758]: adjusting local clock by 14.613690s
Jan 30 12:29:49  ntpd[4758]: adjusting local clock by 13.323883s
Jan 30 12:33:32  ntpd[4758]: adjusting local clock by 12.204646s
Jan 30 12:34:06  ntpd[4758]: adjusting local clock by 12.036162s
Jan 30 12:35:10  ntpd[4758]: adjusting local clock by 11.712658s
Jan 30 12:36:13  ntpd[4758]: adjusting local clock by 11.412870s
Jan 30 12:39:55  ntpd[4758]: adjusting local clock by 10.308062s
Jan 30 12:43:34  ntpd[4758]: adjusting local clock by 9.208613s
Jan 30 12:44:07  ntpd[4758]: adjusting local clock by 9.048595s
Jan 30 12:47:48  ntpd[4758]: adjusting local clock by 7.950845s
Jan 30 12:49:27  ntpd[4758]: adjusting local clock by 7.460912s
Jan 30 12:53:08  ntpd[4758]: adjusting local clock by 6.360250s
Jan 30 12:56:22  ntpd[4758]: adjusting local clock by 5.385971s
Jan 30 12:56:53  ntpd[4758]: adjusting local clock by 5.241883s
Jan 30 13:01:13  ntpd[4758]: adjusting local clock by 3.951414s
Jan 30 13:04:22  ntpd[4758]: adjusting local clock by 3.009970s
Jan 30 13:07:05  ntpd[4758]: adjusting local clock by 2.201024s
Jan 30 13:11:18  ntpd[4758]: adjusting local clock by 0.937320s
Jan 30 13:12:22  ntpd[4758]: adjusting local clock by 0.613777s
Jan 30 13:13:27  ntpd[4758]: adjusting local clock by 0.285335s
Jan 30 13:14:32  ntpd[83421]: clock is now synced



Re: Clock stops working on OpenBSD qemu/kvm guest

2024-01-28 Thread Lévai, Dániel
On Friday, January 26th, 2024 at 13:40, Dave Voutila  wrote:
> 
> Lévai, Dániel l...@ecentrum.hu writes:
> 
> > Hi all!
> > 
> > I have this OpenBSD 7.4 qemu/kvm VM managed by libvirt on an Ubuntu 22.04 
> > host.
[...]
> > Anyway, the symptoms are funny, it always involves the clock stopping/not 
> > working after some period of time.
> 
> What is your Linux kernel version?

It's at 6.5.0-15-generic at the moment; as the upgrade log showed, that was 
the latest update to date.

> > This has been set on the guest, though (defaults):
> > kern.timecounter.tick=1
> > kern.timecounter.timestepwarnings=0
> > kern.timecounter.hardware=pvclock0
> > kern.timecounter.choice=i8254(0) pvclock0(1500) acpitimer0(1000)
> 
> 
> So pvclock should be relying on KVM to properly deal with TSC
> paravirtualization. Do you see this issue with Linux guests using
> kvmclock? (Or do your Linux guests decide on a different clocksource?)

Nowhere else, and even this has been working fine until I think the beginning 
of this January.
There's a bunch of OSs running there, Fedora, Ubuntu, Arch, all using 
'kvm-clock' as their clock source, they work fine.
No problem with FreeBSD (using kvmclock0) and Win11 (no clue what it's using) 
either.



Re: Clock stops working on OpenBSD qemu/kvm guest

2024-01-26 Thread Dave Voutila


Lévai, Dániel  writes:

> Hi all!
>
> I have this OpenBSD 7.4 qemu/kvm VM managed by libvirt on an Ubuntu 22.04 
> host.
>
> I started to notice this month that it started to act weird, it seems
> like the clock stops every night. I couldn't pinpoint exactly what
> caused the change in behavior, the host had two package updates that
> raised suspicion:
> 2024-01-11 06:51:04 upgrade linux-image-generic-hwe-22.04:amd64 
> 6.2.0.39.40~22.04.16 6.5.0.14.14~22.04.7
> 2024-01-12 09:10:36 upgrade libvirt-daemon:amd64 8.0.0-1ubuntu7.7 
> 8.0.0-1ubuntu7.8
>
> But none of the changelogs /seemed/ relevant.
>
> Anyway, the symptoms are funny, it always involves the clock stopping/not 
> working after some period of time.
>
> When this happens, I cannot log in with SSH. The ssh client connects,
> it even asks for the private key, but after confirmation it times out.
>
> The really funny things happen when I log in on the console - that I can do:
>
> When I try to ping anything from the host, it stops after the first
> successful packets (echo/reply) and then hangs (I can CTRL+C).
> Interestingly I can ping the VM from the hypervisor host indefinitely,
> but running tcpdump on the guest doesn't show anything immediately. In
> fact, looking at tcpdump while doing *anything* network related on the
> VM or to the VM doesn't result in any output right away.
> That being said, after a couple of minutes, output from tcpdump starts
> to flood the screen but I cannot say exactly why or when, it just
> suddenly happens.
>
> Running `sleep 1` just hangs.
>
> When I run `date` consecutively it shows:
> Fri Jan 26 04:20:42 CET 2024
> Fri Jan 26 04:20:39 CET 2024
> Fri Jan 26 04:20:40 CET 2024
> Fri Jan 26 04:20:41 CET 2024
> Fri Jan 26 04:20:42 CET 2024
> Fri Jan 26 04:20:43 CET 2024
> Fri Jan 26 04:20:41 CET 2024
> Fri Jan 26 04:20:42 CET 2024

Definitely should not be seeing time moving backwards. That's not
good.
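
A quick way to spot such backward steps in captured `date` output is sketched below in Python. The sample lines are the ones quoted above; the function names are made up for illustration, and the timezone token is dropped before parsing because strptime's handling of names like "CET" depends on the platform.

```python
from datetime import datetime

# Consecutive `date` outputs as quoted above.
DATE_OUTPUT = """\
Fri Jan 26 04:20:42 CET 2024
Fri Jan 26 04:20:39 CET 2024
Fri Jan 26 04:20:40 CET 2024
Fri Jan 26 04:20:41 CET 2024
Fri Jan 26 04:20:42 CET 2024
Fri Jan 26 04:20:43 CET 2024
Fri Jan 26 04:20:41 CET 2024
Fri Jan 26 04:20:42 CET 2024
"""


def parse_date_line(line):
    """Parse one line of default date(1) output, ignoring the timezone."""
    fields = line.split()
    del fields[4]  # drop the timezone token, e.g. "CET"
    return datetime.strptime(" ".join(fields), "%a %b %d %H:%M:%S %Y")


def backward_steps(text):
    """Return (previous, current, delta_seconds) for every backward jump."""
    stamps = [parse_date_line(l) for l in text.splitlines() if l.strip()]
    steps = []
    for prev, cur in zip(stamps, stamps[1:]):
        delta = (cur - prev).total_seconds()
        if delta < 0:
            steps.append((prev, cur, delta))
    return steps


if __name__ == "__main__":
    for prev, cur, delta in backward_steps(DATE_OUTPUT):
        print(f"{prev.time()} -> {cur.time()}: {delta:+.0f}s")
```

On the sample above this reports two backward jumps, one of 3 seconds and one of 2 seconds. It assumes all lines share the same timezone and the default (C locale) day and month names.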

>
> It always works again after a reboot - forced reset, because it cannot shut 
> down gracefully.
>
> Originally I was using SP kernel but tried with MP recently too, just out of 
> curiosity - no luck.
>
> I found two old posts seemingly related:
> https://marc.info/?t=15294229612=1=2
> ^^ I don't have that sysctl on the host and that kernel is very old there.
>

What is your Linux kernel version?

> https://www.reddit.com/r/openbsd/comments/13c9nh1/clock_issue_with_vmm_guest_on_73/
> This is on an OpenBSD host, so I can't try that sysctl either.
>
> This has been set on the guest, though (defaults):
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=pvclock0
> kern.timecounter.choice=i8254(0) pvclock0(1500) acpitimer0(1000)
>
>

So pvclock should be relying on KVM to properly deal with TSC
paravirtualization. Do you see this issue with Linux guests using
kvmclock? (Or do your Linux guests decide on a different clocksource?)

> Any clues would be appreciated,
> Daniel
>
>
> dmesg:
> OpenBSD 7.4 (GENERIC.MP) #1397: Tue Oct 10 09:02:37 MDT 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 519929856 (495MB)
> avail mem = 484507648 (462MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5960 (10 entries)
> bios0: vendor SeaBIOS version "1.15.0-1" date 04/01/2014
> bios0: QEMU Standard PC (i440FX + PIIX, 1996)
> acpi0 at bios0: ACPI 1.0
> acpi0: sleep states S5
> acpi0: tables DSDT FACP APIC WAET
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: 12th Gen Intel(R) Core(TM) i7-12700K, 3609.77 MHz, 06-97-02
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,VMX,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,WAITPKG,MD_CLEAR,IBRS,IBPB,STIBP,SSBD,ARAT,IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 4MB
> 64b/line 16-way L2 cache, 16MB 64b/line 16-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 1000MHz
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: 12th Gen Intel(R) Core(TM) i7-12700K, 3609.78 MHz, 06-97-02
> cpu1:
> 

Clock stops working on OpenBSD qemu/kvm guest

2024-01-26 Thread Lévai, Dániel
Hi all!

I have this OpenBSD 7.4 qemu/kvm VM managed by libvirt on an Ubuntu 22.04 host.

I started to notice this month that it started to act weird, it seems like the 
clock stops every night. I couldn't pinpoint exactly what caused the change in 
behavior, the host had two package updates that raised suspicion:
2024-01-11 06:51:04 upgrade linux-image-generic-hwe-22.04:amd64 
6.2.0.39.40~22.04.16 6.5.0.14.14~22.04.7
2024-01-12 09:10:36 upgrade libvirt-daemon:amd64 8.0.0-1ubuntu7.7 
8.0.0-1ubuntu7.8

But none of the changelogs /seemed/ relevant.

Anyway, the symptoms are funny, it always involves the clock stopping/not 
working after some period of time.

When this happens, I cannot log in with SSH. The ssh client connects, it even 
asks for the private key, but after confirmation it times out.

The really funny things happen when I log in on the console - that I can do:

When I try to ping anything from the host, it stops after the first successful 
packets (echo/reply) and then hangs (I can CTRL+C).
Interestingly I can ping the VM from the hypervisor host indefinitely, but 
running tcpdump on the guest doesn't show anything immediately. In fact, 
looking at tcpdump while doing *anything* network related on the VM or to the 
VM doesn't result in any output right away.
That being said, after a couple of minutes, output from tcpdump starts to flood 
the screen but I cannot say exactly why or when, it just suddenly happens.

Running `sleep 1` just hangs.

When I run `date` consecutively it shows:
Fri Jan 26 04:20:42 CET 2024
Fri Jan 26 04:20:39 CET 2024
Fri Jan 26 04:20:40 CET 2024
Fri Jan 26 04:20:41 CET 2024
Fri Jan 26 04:20:42 CET 2024
Fri Jan 26 04:20:43 CET 2024
Fri Jan 26 04:20:41 CET 2024
Fri Jan 26 04:20:42 CET 2024

It always works again after a reboot - forced reset, because it cannot shut 
down gracefully.

Originally I was using SP kernel but tried with MP recently too, just out of 
curiosity - no luck.

I found two old posts seemingly related:
https://marc.info/?t=15294229612=1=2
^^ I don't have that sysctl on the host and that kernel is very old there.

https://www.reddit.com/r/openbsd/comments/13c9nh1/clock_issue_with_vmm_guest_on_73/
This is on an OpenBSD host, so I can't try that sysctl either.

This has been set on the guest, though (defaults):
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=pvclock0
kern.timecounter.choice=i8254(0) pvclock0(1500) acpitimer0(1000)


Any clues would be appreciated,
Daniel


dmesg:
OpenBSD 7.4 (GENERIC.MP) #1397: Tue Oct 10 09:02:37 MDT 2023
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 519929856 (495MB)
avail mem = 484507648 (462MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5960 (10 entries)
bios0: vendor SeaBIOS version "1.15.0-1" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: ACPI 1.0
acpi0: sleep states S5
acpi0: tables DSDT FACP APIC WAET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: 12th Gen Intel(R) Core(TM) i7-12700K, 3609.77 MHz, 06-97-02
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,VMX,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,WAITPKG,MD_CLEAR,IBRS,IBPB,STIBP,SSBD,ARAT,IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 4MB 64b/line 
16-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1000MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: 12th Gen Intel(R) Core(TM) i7-12700K, 3609.78 MHz, 06-97-02
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,VMX,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,WAITPKG,MD_CLEAR,IBRS,IBPB,STIBP,SSBD,ARAT,IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 4MB 64b/line 
16-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu1: smt 0, core 0, package 1
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
"ACPI0006" at acpi0 not configured
acpipci0 at acpi0 PCI0
com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
acpicmos0 at acpi0
"PNP0A06" at acpi0 not configured
"PNP0A06"