Hi,
I have a few "local" Prometheus servers sending data to a "global" server
using the remote write API. I get errors like this on the remote write
receiver:
ts=2022-02-03T12:41:11.244Z caller=write_handler.go:57 level=error component=web msg="Out of order sample from remote write" err="duplicate sample for timestamp"
The senders get the same error back from the receiver, with an HTTP 400 status code.
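For context, a minimal sketch of what the sender side of such a setup typically looks like. The hostname and port are placeholders, not my real values, and the sketch assumes the global server is a plain Prometheus with its remote write receiver enabled:

```yaml
# prometheus.yml on each "local" server (sketch, placeholder URL).
# Assumes the "global" Prometheus was started with
# --web.enable-remote-write-receiver so that it accepts writes
# on /api/v1/write.
remote_write:
  - url: http://global-prometheus.example:9090/api/v1/write
```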
After much trial and error I figured out that it happens because I have the
same recording rules on all servers, on both senders and receiver.
recording-rules.yaml looks like this:
```
groups:
  - name: node-exporter
    rules:
      # CPU cores per node
      - record: instance:node_cpus:count
        expr: count(node_cpu_seconds_total{mode="idle"}) without (cpu,mode)
      # CPU in use by CPU
      - record: instance_cpu:node_cpu_seconds_not_idle:rate5m
        expr: sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) without (mode)
```
However, if I delete the second rule, the errors go away. That is, there are
no errors once I change recording-rules.yaml on all servers to:
```
groups:
  - name: node-exporter
    rules:
      # CPU cores per node
      - record: instance:node_cpus:count
        expr: count(node_cpu_seconds_total{mode="idle"}) without (cpu,mode)
```
Why?
1. Why are there duplicates in the first case? Does the remote write
receiver also run the rules on the data it receives?
2. Why are there no errors when the only rule is the CPU count?
Shouldn't there be duplicates in that case too?
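
One workaround that might apply in the meantime, sketched under assumptions: if the receiver evaluates the same rules itself, the senders could stop forwarding recorded series and let the receiver compute them. The URL below is a placeholder, and the regex assumes that only recording-rule output has a colon in its metric name (the usual naming convention, but not something Prometheus enforces):

```yaml
# prometheus.yml on each "local" server (sketch, placeholder URL).
remote_write:
  - url: http://global-prometheus.example:9090/api/v1/write
    write_relabel_configs:
      # Drop any series whose metric name contains a colon, e.g.
      # instance:node_cpus:count, before it is sent to the receiver.
      # Assumption: raw scraped metrics never contain ":" in their names.
      - source_labels: [__name__]
        regex: '.+:.+'
        action: drop
```

This does not answer why only the second rule triggers the error; it only keeps the two servers from writing the same recorded series.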