Hi,
I have been facing an issue with our Prometheus (installed via Helm).
After enabling remoteWrite towards our Metricbeat, we receive only about
5 minutes of metrics, and after that the connection stops. I can see that
prometheus_remote_storage_shards reaches the maximum number of shards,
and in the logs of the operator I see:

2021-03-04T17:56:05.977Z caller=dedupe.go:111 component=remote level=info remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg="Remote storage resharding" from=3 to=7
2021-03-04T17:56:05.977Z caller=dedupe.go:111 component=remote level=debug remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg="Flushing samples to remote storage..." count=8577
2021-03-04T17:56:05.977Z caller=dedupe.go:111 component=remote level=debug remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg="Flushing samples to remote storage..." count=8580
2021-03-04T17:56:05.988Z caller=dedupe.go:111 component=remote level=debug remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg="Flushing samples to remote storage..." count=8585
2021-03-04T17:56:15.976Z caller=dedupe.go:111 component=remote level=warn remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1614880564 minSendTimestamp=1614880565
2021-03-04T17:56:25.976Z caller=dedupe.go:111 component=remote level=warn remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1614880564 minSendTimestamp=1614880575
2021-03-04T17:56:35.977Z caller=dedupe.go:111 component=remote level=warn remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1614880564 minSendTimestamp=1614880585
2021-03-04T17:56:45.976Z caller=dedupe.go:111 component=remote level=warn remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1614880564 minSendTimestamp=1614880595
2021-03-04T17:56:55.976Z caller=dedupe.go:111 component=remote level=warn remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1614880564 minSendTimestamp=1614880605
2021-03-04T17:57:02.039Z caller=dedupe.go:111 component=remote level=debug remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg="Done flushing."
2021-03-04T17:57:05.976Z caller=dedupe.go:111 component=remote level=debug remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg=QueueManager.calculateDesiredShards samplesInRate=1684.132190739004 samplesOutRate=261.78113035956926 samplesKeptRatio=0.444250380874013 samplesPendingRate=486.3952368184191 samplesPending=44890.5820306793 samplesOutDuration=1.4510985561508776 timePerSample=0.005543174766484209 desiredShards=11.527856913093473 highestSent=1.614880565e+09 highestRecv=1.614880625e+09 integralAccumulator=395.5166384472137
2021-03-04T17:57:05.977Z caller=dedupe.go:111 component=remote level=debug remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg=QueueManager.updateShardsLoop lowerBound=4.8999999999999995 desiredShards=11.527856913093473 upperBound=9.1
2021-03-04T17:57:05.977Z caller=dedupe.go:111 component=remote level=info remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg="Currently resharding, skipping."
2021-03-04T17:57:05.977Z caller=dedupe.go:111 component=remote level=error remote_name=8a9071 url=http://prometheusmetricbeataks:9201/write msg="Failed to flush all samples on shutdown"
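To make the numbers in the calculateDesiredShards and updateShardsLoop lines easier to read, here is a small Python sketch that recomputes them from the logged inputs. The formulas are an assumption: they are inferred from the fact that they reproduce the logged values, not copied from the Prometheus source, and the 0.7 / 1.3 bound factors are likewise inferred from lowerBound and upperBound with 7 shards.

```python
from datetime import datetime, timezone

# Inputs copied from the calculateDesiredShards log line.
samples_in_rate = 1684.132190739004
samples_out_rate = 261.78113035956926
samples_kept_ratio = 0.444250380874013
samples_out_duration = 1.4510985561508776
integral_accumulator = 395.5166384472137
highest_sent = 1.614880565e9
highest_recv = 1.614880625e9
current_shards = 7  # from msg="Remote storage resharding" from=3 to=7

# Assumed relationships (inferred: they reproduce the logged values).
time_per_sample = samples_out_duration / samples_out_rate
samples_pending_rate = samples_in_rate * samples_kept_ratio - samples_out_rate
lag_seconds = highest_recv - highest_sent
samples_pending = lag_seconds * samples_in_rate * samples_kept_ratio
desired_shards = time_per_sample * (samples_in_rate + integral_accumulator)

# Assumed tolerance band around the current shard count.
lower_bound = current_shards * 0.7
upper_bound = current_shards * 1.3

# lastSendTimestamp from the "Skipping resharding" warnings, as UTC time.
last_send = datetime.fromtimestamp(1614880564, tz=timezone.utc)

print("timePerSample     ", time_per_sample)
print("samplesPendingRate", samples_pending_rate)
print("samplesPending    ", samples_pending)
print("desiredShards     ", desired_shards)
print("bounds            ", lower_bound, upper_bound)
print("send lag (s)      ", lag_seconds)
print("last send (UTC)   ", last_send.isoformat())
```

Read this way, the remote end is 60 seconds behind (highestRecv minus highestSent), desiredShards (about 11.5) is above upperBound (9.1), and the last successful send was at 17:56:04 UTC, one second before the resharding from 3 to 7 started.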

We have "ChartName": "prometheus-operator" and "helmChartVersion": "8.12.3",
which installs Prometheus version 2.15.2, and the remote write config is:

remoteWrite:
  - url: "http://prometheusmetricbeataks:9201/write"
    writeRelabelConfigs:
      - sourceLabels: [observability]
        regex: 'true'
        action: keep
    queueConfig:
      capacity: 3000
      maxSamplesPerSend: 1000
      maxShards: 2000

I have been reading a bit, and there are different views on the
queueConfig parameters, but none of them has worked in our case. Has
this happened to any of you?

Thanks in advance for your help,
Ruben
