Hi Julius!

Thanks a lot for the quick reply!
Yep, I don't have any error logs. Or rather, one error is still logged on 
one of the instances, but it's related to a missing JSON file. The other 
errors we had in the past have all been fixed, and I've *never* seen 
that specific non-recoverable error.

As soon as I have time I'll try to switch on debug logs and I'll post the 
result!
Many thanks again.
F.

On Tuesday, 14 April 2020 12:13:01 UTC+2, Julius Volz wrote:
>
> Reading the code, it looks like these warnings are produced because 
> another error has prevented any samples from being sent successfully to the 
> remote end in the recent past (for longer than 2x the 5s batch send 
> deadline, i.e. 10s). In that case, resharding is skipped. If the send 
> error is a non-recoverable one, it should be logged at ERROR level (
> https://github.com/prometheus/prometheus/blob/8224ddec23598152d7506b7b39f5235a77b5e036/storage/remote/queue_manager.go#L840
> ).
>  
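The check described above can be sketched in a few lines. This is only an illustration of the timestamp comparison, not the actual Go code in queue_manager.go: the function name `resharding_allowed` and the constant `BATCH_SEND_DEADLINE_S` are made up for this example, and the 5s value is taken from the message above.

```python
import time
from typing import Optional

# Assumption: the 5s batch send deadline mentioned in the message above.
BATCH_SEND_DEADLINE_S = 5


def resharding_allowed(last_send_ts: int, now: Optional[float] = None) -> bool:
    """Return False when the last successful send is older than 2x the deadline.

    last_send_ts and now are Unix timestamps in seconds.
    """
    now = time.time() if now is None else now
    # Oldest acceptable timestamp for the last successful send.
    min_send_ts = int(now) - 2 * BATCH_SEND_DEADLINE_S
    return last_send_ts >= min_send_ts


# Values from the first warning in this thread, logged at 20:30:54 UTC.
print(resharding_allowed(1586808573, now=1586809854))  # False
```

With those values the threshold works out to 1586809844, which matches the minSendTimestamp in that warning, so resharding is skipped.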
> Since your logs don't contain any such "non-recoverable error" message, 
> another possibility is that the send hits an error that is classified 
> as recoverable and is therefore retried. Those errors are only logged at 
> DEBUG level though: 
> https://github.com/prometheus/prometheus/blob/8224ddec23598152d7506b7b39f5235a77b5e036/storage/remote/queue_manager.go#L884
>  
> This could be an error like a broken network connection or a 5xx status code.
>
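To make the recoverable / non-recoverable distinction concrete, here is a rough sketch of a retry loop. It is not Prometheus code: `classify`, `send_with_retry`, the attempt cap and the backoff values are all assumptions made for illustration. The only parts taken from the message above are that connection failures and 5xx responses are retried and reported at debug level, while other failures are reported at error level.

```python
import time
from typing import Callable


def classify(status_code: int) -> str:
    """Map an HTTP status code to 'ok', 'recoverable' or 'non-recoverable'."""
    if 200 <= status_code < 300:
        return "ok"
    if 500 <= status_code < 600:
        return "recoverable"
    return "non-recoverable"


def send_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,          # hypothetical cap, to keep the sketch finite
    backoff_s: float = 0.05,        # hypothetical initial backoff
    max_backoff_s: float = 1.0,     # hypothetical backoff ceiling
    log: Callable[[str, str], None] = lambda level, msg: print(level, msg),
) -> bool:
    """Call send() until it succeeds, fails permanently, or attempts run out.

    send() returns an HTTP status code or raises ConnectionError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            status = send()
            outcome, detail = classify(status), f"HTTP status {status}"
        except ConnectionError as exc:
            # A broken connection is treated as recoverable.
            outcome, detail = "recoverable", f"connection error: {exc}"
        if outcome == "ok":
            return True
        if outcome == "non-recoverable":
            log("error", f"giving up: {detail}")
            return False
        # Recoverable failures are only visible at debug level.
        log("debug", f"attempt {attempt} failed, will retry: {detail}")
        time.sleep(backoff_s)
        backoff_s = min(backoff_s * 2, max_backoff_s)
    return False
```

The point of the sketch is the logging asymmetry: an endpoint that keeps answering with 5xx produces nothing at the default log level except the resharding warning, which would fit the symptoms reported here.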
> So maybe it will help you to turn on debug-level logging 
> (--log.level=debug) for a brief while to see the error that's being retried.
>
> On Mon, Apr 13, 2020 at 11:12 PM Federico Buti <[email protected]> wrote:
>
>> Hi all.
>>
>> As the title implies, we are seeing *tons* of log messages in our 
>> Prometheus instances about skipped resharding. Here is an excerpt from the 
>> logs of one instance:
>>
>>
>> ts=2020-04-13T20:30:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809844,
>> ts=2020-04-13T20:31:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809854,
>> ts=2020-04-13T20:31:14.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809864,
>> ts=2020-04-13T20:31:24.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809874,
>> ts=2020-04-13T20:31:34.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809884,
>> ts=2020-04-13T20:31:44.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809894,
>> ts=2020-04-13T20:31:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809904,
>> ts=2020-04-13T20:32:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809914,
>> ts=2020-04-13T20:32:14.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809924,
>> ts=2020-04-13T20:32:24.506Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809934,
>> ts=2020-04-13T20:32:34.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809944,
>> ts=2020-04-13T20:32:44.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809954,
>> ts=2020-04-13T20:32:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809964,
>> ts=2020-04-13T20:33:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809974,
>> ts=2020-04-13T20:33:14.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809984,
>> ts=2020-04-13T20:33:24.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809994,
>> ts=2020-04-13T20:33:34.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586810004,
>> ts=2020-04-13T20:33:44.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586810014,
>> ts=2020-04-13T20:33:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586810024,
>> ts=2020-04-13T20:34:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586810034,
>> ts=2020-04-13T20:34:24.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810054,
>> ts=2020-04-13T20:34:34.521Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810064,
>> ts=2020-04-13T20:34:44.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810074,
>> ts=2020-04-13T20:34:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810084,
>> ts=2020-04-13T20:35:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810094,
>> ts=2020-04-13T20:35:14.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810104,
>>
>>
>> The logs on the other instance are more spread out over time:
>>
>> ts=2020-04-13T19:23:35.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805805,
>> ts=2020-04-13T19:23:45.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805815,
>> ts=2020-04-13T19:23:55.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805825,
>> ts=2020-04-13T19:24:05.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805835,
>> ts=2020-04-13T19:24:15.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805845,
>> ts=2020-04-13T19:24:25.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805855,
>>
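The two timestamps in each warning are enough to work out how long the queue had gone without a successful send when the line was written. A small sketch follows; the helper name `seconds_since_last_send` is made up, and the 10s threshold is an assumption based on the 2x5s batch send deadline mentioned in the reply.

```python
import re


def seconds_since_last_send(line: str, threshold_s: int = 10) -> int:
    """Return how many seconds before the warning the last send succeeded.

    minSendTimestamp is assumed to be (log time - threshold_s), so the log
    time is recovered by adding the threshold back.
    """
    last = int(re.search(r"lastSendTimestamp=(\d+)", line).group(1))
    minimum = int(re.search(r"minSendTimestamp=(\d+)", line).group(1))
    return (minimum + threshold_s) - last


# Shortened copy of the first warning from the excerpt above.
line = (
    'level=warn msg="Skipping resharding, last successful send was beyond '
    'threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809844'
)
print(seconds_since_last_send(line))  # 1281
```

So at 20:30:54 the first instance had not completed a send for roughly 21 minutes. By 20:34:24 lastSendTimestamp has moved to 1586810048, which means at least one send went through in between.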
>>
>> Our current remote write configuration is as follows:
>>
>> remote_write:
>>   - url: http://10.10.3.212:8428/api/v1/write
>>     queue_config:
>>       max_samples_per_send: 10000
>>   - url: "http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"
>>     write_relabel_configs:
>>       - source_labels: [__name__, check]
>>         regex: "xxxx_xx"
>>         action: keep
>>
>>
>>
>> Since the Remote write tuning 
>> <https://prometheus.io/docs/practices/remote_write/> documentation says 
>> that "Prometheus implements sane defaults for remote write", should I just 
>> remove the max_samples_per_send setting, or is there any other 
>> advice that applies here?
>> I skimmed the linked page, but apart from adjusting capacity to match 
>> the chosen max_samples_per_send, I'm not sure what else could 
>> really help here.
>>
>> Any advice is really appreciated.
>> Thanks in advance,
>> F.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/prometheus-users/978d0750-db67-45ef-89fd-90d25e5ba793%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/prometheus-users/978d0750-db67-45ef-89fd-90d25e5ba793%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
