Hi Julius! Thanks a lot for the quick reply! Yep, I don't have error logs. Or rather, there is one remaining error log on one of the instances, but it's related to a missing JSON file. The other errors we had in the past have since been fixed, and I've *never* seen that specific non-recoverable error.
As soon as I have time I'll try to switch on debug logs and I'll post the result!

Many thanks again.
F.

On Tuesday, 14 April 2020 12:13:01 UTC+2, Julius Volz wrote:
>
> Reading the code, it looks like these warnings are produced because of another error that led to no successful sending of samples to the remote end in the recent past (longer than 2x5s batch send deadline). In that case, resharding is skipped. In case the error in sending is a non-recoverable one, it should be logged at ERROR level (https://github.com/prometheus/prometheus/blob/8224ddec23598152d7506b7b39f5235a77b5e036/storage/remote/queue_manager.go#L840).
>
> Since your logs don't contain any such "non-recoverable error" message, another option is that the sending encounters an error that is classified as recoverable, which is retried. Those errors are only logged at DEBUG level though: https://github.com/prometheus/prometheus/blob/8224ddec23598152d7506b7b39f5235a77b5e036/storage/remote/queue_manager.go#L884.
>
> This could be an error like a broken network connection or 5xx status code.
>
> So maybe it will help you to turn on debug-level logging (--log.level=debug) for a brief while to see the error that's being retried.
>
> On Mon, Apr 13, 2020 at 11:12 PM Federico Buti <[email protected]> wrote:
>
>> Hi all.
>>
>> As the title implies we are seeing *tons* of logs in our Prometheus instances about failed resharding.
>> Here is an excerpt from the logs of an instance:
>>
>> ts=2020-04-13T20:30:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809844,
>> ts=2020-04-13T20:31:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809854,
>> ts=2020-04-13T20:31:14.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809864,
>> ts=2020-04-13T20:31:24.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809874,
>> ts=2020-04-13T20:31:34.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809884,
>> ts=2020-04-13T20:31:44.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809894,
>> ts=2020-04-13T20:31:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809904,
>> ts=2020-04-13T20:32:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809914,
>> ts=2020-04-13T20:32:14.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809924,
>> ts=2020-04-13T20:32:24.506Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809934,
>> ts=2020-04-13T20:32:34.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809944,
>> ts=2020-04-13T20:32:44.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809954,
>> ts=2020-04-13T20:32:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809964,
>> ts=2020-04-13T20:33:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809974,
>> ts=2020-04-13T20:33:14.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809984,
>> ts=2020-04-13T20:33:24.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809994,
>> ts=2020-04-13T20:33:34.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586810004,
>> ts=2020-04-13T20:33:44.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586810014,
>> ts=2020-04-13T20:33:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586810024,
>> ts=2020-04-13T20:34:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586810034,
>> ts=2020-04-13T20:34:24.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810054,
>> ts=2020-04-13T20:34:34.521Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810064,
>> ts=2020-04-13T20:34:44.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810074,
>> ts=2020-04-13T20:34:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810084,
>> ts=2020-04-13T20:35:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810094,
>> ts=2020-04-13T20:35:14.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810104,
>>
>> The logs on the other instance are a bit more spread out in time:
>>
>> ts=2020-04-13T19:23:35.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805805,
>> ts=2020-04-13T19:23:45.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805815,
>> ts=2020-04-13T19:23:55.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805825,
>> ts=2020-04-13T19:24:05.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805835,
>> ts=2020-04-13T19:24:15.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805845,
>> ts=2020-04-13T19:24:25.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805855,
>>
>> Our current remote write configuration is as follows:
>>
>> remote_write:
>>   - url: http://10.10.3.212:8428/api/v1/write
>>     queue_config:
>>       max_samples_per_send: 10000
>>   - url: "http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"
>>     write_relabel_configs:
>>       - source_labels: [__name__, check]
>>         regex: "xxxx_xx"
>>         action: keep
>>
>> Since the Remote write tuning <https://prometheus.io/docs/practices/remote_write/> documentation says that "Prometheus implements sane defaults for remote write", should I just remove the setting for max_samples_per_send, or is there any other advice that applies here?
>> I skimmed the linked page but, apart from adjusting capacity on the basis of the chosen max_samples_per_send, I'm not sure what else can really help here.
>>
>> Any advice really appreciated.
>> Thanks in advance,
>> F.
>>
>> --
>> You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/978d0750-db67-45ef-89fd-90d25e5ba793%40googlegroups.com.
>

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/e10e0ec4-4ba7-4487-8eed-60de79b18338%40googlegroups.com.
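
P.S. To put numbers on the warnings quoted above, here is a minimal Python sketch that reads one "Skipping resharding" line and reports how old the last successful send was when the warning was logged. The helper name `send_lag` and the regex are my own and purely illustrative; nothing here is Prometheus code. It only assumes the log format shown in the excerpt and that `lastSendTimestamp` / `minSendTimestamp` are Unix seconds.

```python
import re
from datetime import datetime, timezone

# Hypothetical helper, not part of Prometheus: pulls the log time and the two
# timestamps out of one "Skipping resharding" warning line.
LINE_RE = re.compile(
    r"ts=(?P<ts>\S+).*?"
    r"lastSendTimestamp=(?P<last>\d+)\s+"
    r"minSendTimestamp=(?P<min>\d+)"
)


def send_lag(line):
    """Return the gaps (in seconds) described by one warning line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    logged_at = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
    now = int(logged_at.replace(tzinfo=timezone.utc).timestamp())
    last = int(m["last"])
    minimum = int(m["min"])
    return {
        # How far back the threshold reaches from the moment of logging.
        "threshold_window": now - minimum,
        # How far the last successful send falls behind that threshold.
        "behind_threshold": minimum - last,
        # Total time since the last successful send.
        "since_last_send": now - last,
    }


sample = (
    'ts=2020-04-13T20:30:54.507Z caller=dedupe.go:112 component=remote '
    'level=warn remote_name=5a17e1 '
    'url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" '
    'msg="Skipping resharding, last successful send was beyond threshold" '
    'lastSendTimestamp=1586808573 minSendTimestamp=1586809844,'
)

print(send_lag(sample))
```

For the first line of the excerpt this gives a threshold window of 10 seconds, which matches the "2x5s batch send deadline" Julius mentioned, and a last successful send about 21 minutes old (1281 seconds). In the later lines, where lastSendTimestamp becomes 1586810048, the last send is only a few seconds past the threshold.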

