Hey Stuart,

Thank you for your suggestion.

Yes, I think an age-based limit can be implemented as well, and we should 
keep both the max retry count and the age limit. The age limit would be 
helpful for time-based remote storages, while the retry limit is more 
general (it also covers storage systems that are not time-based) and helps 
when there is heavy congestion in the network.

On Monday, March 1, 2021 at 3:43:21 PM UTC+5:30 Stuart Clark wrote:

> On 01/03/2021 07:25, Harkishen Singh wrote:
> > Hi Tom,
> >
> > I have tried to answer the comments. Please let me know whether the 
> > answers are satisfactory. I am happy to have a call if required (or if 
> > the discussion gets tough).
> >
> > I think the lossless behaviour can be controlled by the user through 
> > the config (limit_retries), which gives users more control over 
> > whether they are willing to accept some data loss when retries pile up. 
> > If retrying happens forever, I don't think that is helpful (the data 
> > will never be accepted by the remote storage). Also, as Chris 
> > mentioned, some users might prefer to have a few gaps and give more 
> > priority to recent data, for example for alerting. So I think this 
> > approach gives users more flexibility while keeping it optional (or 
> > effectively disabled by setting the retry count high enough).
> >
> Under what situations would retries happen forever?
>
> If the receiver is available but cannot accept the data (for example due 
> to metric size limits or age of the samples) I would expect it to reject 
> with a 4XX code (permanent failure) which wouldn't trigger any retries.
>
> Alternatively, if the receiver is either unavailable or broken, it could 
> result in "infinite" retries, but in that situation an age-based limit 
> would work better than a retry limit: a short retry limit will reject 
> samples that have just been scraped just as quickly as samples that are 
> days old. Some systems restrict the age of samples they can ingest 
> (e.g. Timestream), or administrators could decide that older data has 
> no usefulness (e.g. if the receiver is used for alerting or anomaly 
> detection). While the receiver should still reject such old samples once 
> it is working again, a time-based limit would at least reduce the network 
> impact once the receiver is back online (no need to send tons of data 
> that we know will be rejected).
>
> -- 
> Stuart Clark
>
>
