Reading the code, it looks like these warnings are produced because some
other error has prevented any samples from being sent successfully to the
remote end in the recent past (for longer than twice the 5s batch send
deadline). In that case, resharding is skipped. If the send error is a
non-recoverable one, it should be logged at ERROR level (
https://github.com/prometheus/prometheus/blob/8224ddec23598152d7506b7b39f5235a77b5e036/storage/remote/queue_manager.go#L840).
Since your logs don't contain any such "non-recoverable error" message,
the other possibility is that sending hits an error that is classified
as recoverable and is therefore retried. Those errors are only logged at
DEBUG level, though:
https://github.com/prometheus/prometheus/blob/8224ddec23598152d7506b7b39f5235a77b5e036/storage/remote/queue_manager.go#L884.
This could be something like a broken network connection or a 5xx status code.

So maybe it will help you to turn on debug-level logging
(--log.level=debug) for a brief while to see the error that's being retried.
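
As a rough illustration of why nothing shows up at the default log level,
here is a sketch of the recoverable vs. non-recoverable distinction
described above. The type and function names are hypothetical, and treating
every non-2xx, non-5xx status as non-recoverable is an assumption made for
this sketch, not something taken from the Prometheus code.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// recoverableError is a hypothetical marker type for errors worth retrying.
type recoverableError struct{ error }

// classifyStatus maps an HTTP status code to nil (success), a recoverable
// error (5xx), or a plain, non-recoverable error (everything else; this
// last case is an assumption of the sketch).
func classifyStatus(code int) error {
	switch {
	case code/100 == 2:
		return nil
	case code/100 == 5:
		return recoverableError{fmt.Errorf("server returned HTTP status %d", code)}
	default:
		return fmt.Errorf("server returned HTTP status %d", code)
	}
}

// logLevelFor returns the level at which the error would be logged:
// "debug" for recoverable errors (which are retried), "error" for
// non-recoverable ones, and "" when there is no error.
func logLevelFor(err error) string {
	if err == nil {
		return ""
	}
	var rec recoverableError
	if errors.As(err, &rec) {
		return "debug"
	}
	return "error"
}

func main() {
	fmt.Println(logLevelFor(classifyStatus(http.StatusServiceUnavailable))) // prints "debug"
	fmt.Println(logLevelFor(classifyStatus(http.StatusBadRequest)))         // prints "error"
}
```

With logging at the default level, only the second kind of failure would be
visible, which is why the retried error has to be looked for at debug level.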

On Mon, Apr 13, 2020 at 11:12 PM Federico Buti <[email protected]> wrote:

> Hi all.
>
> As the title implies we are seeing *tons* of logs in our Prometheus
> instances about failed resharding. Here is an excerpt from the logs of an
> instance:
>
>
> ts=2020-04-13T20:30:54.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809844,
> ts=2020-04-13T20:31:04.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809854,
> ts=2020-04-13T20:31:14.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809864,
> ts=2020-04-13T20:31:24.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809874,
> ts=2020-04-13T20:31:34.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809884,
> ts=2020-04-13T20:31:44.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809894,
> ts=2020-04-13T20:31:54.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809904,
> ts=2020-04-13T20:32:04.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809914,
> ts=2020-04-13T20:32:14.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809924,
> ts=2020-04-13T20:32:24.506Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809934,
> ts=2020-04-13T20:32:34.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809944,
> ts=2020-04-13T20:32:44.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809954,
> ts=2020-04-13T20:32:54.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809964,
> ts=2020-04-13T20:33:04.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809974,
> ts=2020-04-13T20:33:14.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809984,
> ts=2020-04-13T20:33:24.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586809994,
> ts=2020-04-13T20:33:34.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586810004,
> ts=2020-04-13T20:33:44.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586810014,
> ts=2020-04-13T20:33:54.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586810024,
> ts=2020-04-13T20:34:04.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586808573 minSendTimestamp=1586810034,
> ts=2020-04-13T20:34:24.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586810048 minSendTimestamp=1586810054,
> ts=2020-04-13T20:34:34.521Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586810048 minSendTimestamp=1586810064,
> ts=2020-04-13T20:34:44.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586810048 minSendTimestamp=1586810074,
> ts=2020-04-13T20:34:54.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586810048 minSendTimestamp=1586810084,
> ts=2020-04-13T20:35:04.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586810048 minSendTimestamp=1586810094,
> ts=2020-04-13T20:35:14.507Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586810048 minSendTimestamp=1586810104,
>
>
> The logs on the other instance are a bit more spread out over time:
>
> ts=2020-04-13T19:23:35.283Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586805515 minSendTimestamp=1586805805,
> ts=2020-04-13T19:23:45.283Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586805515 minSendTimestamp=1586805815,
> ts=2020-04-13T19:23:55.283Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586805515 minSendTimestamp=1586805825,
> ts=2020-04-13T19:24:05.283Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586805515 minSendTimestamp=1586805835,
> ts=2020-04-13T19:24:15.283Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586805515 minSendTimestamp=1586805845,
> ts=2020-04-13T19:24:25.283Z caller=dedupe.go:112 component=remote
> level=warn remote_name=5a17e1 url="
> http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"; msg="Skipping
> resharding, last successful send was beyond threshold"
> lastSendTimestamp=1586805515 minSendTimestamp=1586805855,
>
>
> Our current remote write configuration is as follows:
>
> remote_write:
>   - url: http://10.10.3.212:8428/api/v1/write
>     queue_config:
>       max_samples_per_send: 10000
>   - url: "http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"
>     write_relabel_configs:
>       - source_labels: [__name__, check]
>         regex: "xxxx_xx"
>         action: keep
>
>
>
> Since the Remote write tuning
> <https://prometheus.io/docs/practices/remote_write/> documentation says
> that "Prometheus implements sane defaults for remote write", should I just
> remove the setting for max_samples_per_send, or is there any other advice
> that applies here?
> I skimmed the linked page, but apart from adjusting capacity on the basis
> of the chosen max_samples_per_send, I'm not sure what else can really
> help here.
>
> Any advice really appreciated.
> Thanks in advance,
> F.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/978d0750-db67-45ef-89fd-90d25e5ba793%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/978d0750-db67-45ef-89fd-90d25e5ba793%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

