[prometheus-developers] RFC: Increasing Resistance to Remote Write Outages

Robert Fratto Tue, 16 Feb 2021 10:43:30 -0800

Hi! 

I've recently observed that Prometheus's tolerance to a Remote Write outage
varies. The amount of data lost depends on the volume of existing data in
WAL segments, which WAL segment Remote Write is currently tailing, and how
much time remains before the next WAL checkpoint is created.


I've written a proposal to try to address this and make tolerance to an
outage more predictable. Feedback would be appreciated:
https://docs.google.com/document/d/1DcaHoWZnA-N5UlQ7sJ0IlPKe4ul2nimWlzWaZSxPkNU/edit#

I plan to contribute the changes for this myself.

Best,
Robert

