Hi! I've recently observed that how much data survives a Remote Write outage is variable. The amount of data lost depends on how much data is already in the WAL segments, which WAL segment Remote Write is currently tailing, and how much time remains before the next WAL checkpoint is created.
I've written a proposal that aims to address this and make the tolerance to an outage more predictable. Feedback would be appreciated: https://docs.google.com/document/d/1DcaHoWZnA-N5UlQ7sJ0IlPKe4ul2nimWlzWaZSxPkNU/edit#

I plan to contribute the changes for this myself.

Best,
Robert

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/448f74a4-d6e8-45a4-b3ea-b31a1c480884n%40googlegroups.com.

