I'm running Prometheus at home to monitor a few VMs on my homelab, as well as
my Linodes. My homelab hypervisor runs Arch Linux and libvirtd.

Sometimes I need to patch the OS on the hypervisor and reboot it. The process
up until now has been to suspend the guests, reboot the server, and restore the
guests. Yeah, I know, this is not recommended for Prometheus - but it's a
homelab, not a production environment, and I've seen weird clock-skew errors
on production VMware before... :)

This has been working for a couple of years at this point. But recently (like,
the last month or so) I've been getting Prometheus "Error on ingesting
out-of-order samples" errors spewing in the logs after this operation, and no
data is stored in Prometheus. Restarting the Prometheus process fixes it
immediately; the errors stop and data is stored in the database again.

The problem has something to do with the combination of suspending/resuming the
VM and clock synchronization (whether hypervisor -> guest sync or NTP is to
blame, I don't know). I'm looking for advice on how to debug the problem (or
for someone who has already encountered this...)
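
One way to test the clock theory might be to watch for wall-clock steps inside
the guest around a suspend/resume cycle. Below is a minimal Python sketch, not
a tested diagnostic: the names clock_offset, find_jumps and watch are made up
for illustration, and it assumes that the guest's monotonic clock is not
stepped on resume while the wall clock is (by NTP or the hypervisor), so that
the difference between the two changes abruptly when a step happens.

```python
import time


def clock_offset():
    """Return wall-clock time minus monotonic time, in seconds.

    The value is roughly constant while the wall clock only drifts or is
    slewed gently; it changes abruptly if the wall clock is stepped.
    """
    return time.time() - time.monotonic()


def find_jumps(offsets, threshold=1.0):
    """Return (index, delta) for consecutive offsets differing by > threshold."""
    jumps = []
    for i in range(1, len(offsets)):
        delta = offsets[i] - offsets[i - 1]
        if abs(delta) > threshold:
            jumps.append((i, delta))
    return jumps


def watch(interval=5.0, threshold=1.0):
    """Poll forever and print a line whenever the wall clock appears to step.

    The interval and threshold defaults are arbitrary illustrative values.
    """
    previous = clock_offset()
    while True:
        time.sleep(interval)
        current = clock_offset()
        delta = current - previous
        if abs(delta) > threshold:
            # Negative delta: the wall clock moved backwards.
            print(f"{time.ctime()}: wall clock stepped by {delta:+.3f}s")
        previous = current
```

Running watch() in a terminal on the Prometheus guest before suspending it, and
checking the output after the restore, would show whether the wall clock moved
and in which direction. A backwards step would at least be consistent with new
samples carrying older timestamps than ones already stored.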

Thanks!

-- 
Harald
