If a remote write receiver is unable to ingest, wouldn't this be something
to fix on the receiver side? The receiver could have a policy where it
drops data rather than returning an error.

This way Prometheus just sends the data and doesn't need to know about or
deal with ingestion policies. It sends a bit more data over the wire, but
that part is cheap compared to the ingestion costs.
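
To make the idea concrete, here is a rough sketch of such a receiver-side
policy. It is only an illustration: Receiver, OverloadPolicy, capacity and
handle_write are hypothetical names, not part of any real receiver, and the
status codes are just examples. It relies on the sender retrying on a 5xx
response and treating a 2xx as done.

```python
from dataclasses import dataclass
from enum import Enum


class OverloadPolicy(Enum):
    REJECT = "reject"  # answer with an error so the sender retries
    DROP = "drop"      # acknowledge the request but discard the samples


@dataclass
class Receiver:
    """Hypothetical remote write receiver with a fixed ingestion budget."""
    capacity: int
    policy: OverloadPolicy
    ingested: int = 0
    dropped: int = 0

    def handle_write(self, sample_count: int) -> int:
        """Return the HTTP status code the receiver would answer with."""
        if self.ingested + sample_count <= self.capacity:
            self.ingested += sample_count
            return 204
        if self.policy is OverloadPolicy.DROP:
            # Report success to the sender; the batch is discarded here.
            self.dropped += sample_count
            return 204
        # A 5xx asks the sender to retry the same batch later.
        return 503
```

With the DROP policy the sender never sees the overload, so no retry logic
or retry limit is needed on the Prometheus side; the cost is that the loss
is only visible on the receiver (here via the dropped counter).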

On Mon, Mar 1, 2021 at 11:13 AM Stuart Clark <[email protected]>
wrote:

> On 01/03/2021 07:25, Harkishen Singh wrote:
> > Hi Tom,
> >
> > I have tried to answer the comments. Please let me know whether the
> > answers are satisfactory. I am happy to have a call if required (or
> > if the discussion gets tough).
> >
> > I think the lossless nature can be controlled by the user through the
> > config (limit_retries), which gives users more control over whether
> > they are happy to compromise a bit when there are too many retries. If
> > the retrying happens forever, I don't think that is helpful (the data
> > will never be accepted by the remote storage). Also, as Chris
> > mentioned, some users might prefer to have a few gaps and give more
> > priority to recent data, for example for alerting. So I think this
> > approach gives the user more flexibility while remaining optional (it
> > can also be effectively disabled by setting the retry count high
> > enough).
> >
> Under what situations would retries happen forever?
>
> If the receiver is available but cannot accept the data (for example due
> to metric size limits or age of the samples) I would expect it to reject
> with a 4XX code (permanent failure) which wouldn't trigger any retries.
>
> Alternatively, if the receiver is either unavailable or broken it could
> result in "infinite" retries, but in that situation it feels like an
> age-based limit would be better than a retry limit: a short retry limit
> will reject samples that have just been scraped just as quickly as
> samples that are days old. Some systems have restrictions on what age
> can be ingested (e.g. Timestream), or administrators could decide older
> data has no usefulness (e.g. if the receiver is used for alerting or
> anomaly detection). While the system should still reject such old
> samples once it is working again, a time-based limit would at least
> reduce the network impact once the receiver is back online (no need to
> send tons of data that we know will be rejected).
>
> --
> Stuart Clark
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-developers/cd97f615-e479-e4be-e85d-672b15c337d8%40Jahingo.com
> .
>
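
For comparison, the age-based limit Stuart describes above could look
roughly like the sketch below. This is not how Prometheus implements its
queue; Sample, drop_too_old and max_age_seconds are hypothetical names used
only to show the idea of filtering by sample age instead of by retry count.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds since the epoch
    value: float


def drop_too_old(queue, now, max_age_seconds):
    """Keep only the samples that are at most max_age_seconds old."""
    return [s for s in queue if now - s.timestamp <= max_age_seconds]


# Hypothetical backlog after an outage: one sample is two hours old,
# the other was scraped 30 seconds ago.
now = 10_000.0
queue = [Sample(now - 7200, 1.0), Sample(now - 30, 2.0)]
fresh = drop_too_old(queue, now, max_age_seconds=3600)
```

Unlike a retry count, this keeps the freshly scraped sample and drops only
the stale one, so nothing that the receiver would reject for age is sent
once it is back online.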

