I'm running Prometheus at home to monitor a few VMs on my homelab, as well as my Linodes. My homelab hypervisor runs Arch Linux and libvirtd.
Sometimes I need to patch the OS on the hypervisor and reboot it. The process up until now has been to suspend the guests, reboot the server, and restore the guests. (Yeah, I know, this is not recommended for Prometheus, but it's a homelab, not a production environment. And I've seen weird clock-skew errors on production VMware before... :)

This has been working for a couple of years at this point. But recently (within the last month or so) I've been getting Prometheus "Error on ingesting out-of-order samples" errors spewing in the logs after this operation, and no data is stored in Prometheus. Restarting the Prometheus process fixes it immediately: the errors stop and data is stored in the database again.

The problem has something to do with the combination of suspending/resuming the VM and clock synchronization. Whether the hypervisor -> guest sync or NTP is at fault, I don't know.

I'm looking for advice on how to debug the problem (or to hear from someone who has already encountered this...).

Thanks!

-- Harald

