Hey Stuart,

Thank you for your suggestion.
Yes, I think an age-based limit can be implemented as well, and I think we should keep both the max retry count and the age limit. The age limit would help with time-based remote storages, while the retry limit applies in general (including storage systems that are not time-based) and would help when the network is heavily congested.

On Monday, March 1, 2021 at 3:43:21 PM UTC+5:30 Stuart Clark wrote:

> On 01/03/2021 07:25, Harkishen Singh wrote:
> > Hi Tom,
> >
> > I have tried to answer the comments. Please comment on their
> > satisfactoriness. I am happy for a call if required (or discussion
> > gets tough).
> >
> > I think, the lossless nature can be controlled by the user based on
> > the config (limit_retries), and let the users have more control, as to
> > whether they are happy to compromise a bit, if the retry is too much,
> > since as such, if the retrying happens forever, then I don't think
> > that is helpful (it will never be accepted by the remote storage).
> > Also as Chris mentioned, some users might prefer to have few gaps and
> > give more priority to recent data, like for alerting. So, I think this
> > approach gives more flexibility to the user, at the same time, making
> > it optional (or by setting the retry count high enough).
> >
> Under what situations would retries happen forever?
>
> If the receiver is available but cannot accept the data (for example due
> to metric size limits or age of the samples) I would expect it to reject
> with a 4XX code (permanent failure) which wouldn't trigger any retries.
>
> Alternatively if the receiver is either unavailable or broken it could
> result in "infinite" retries, but in that situation it feels like an age
> based limit instead of retry limit would be better - a short retry limit
> will reject samples that have just been scraped just as quickly as
> samples that are days old. Instead it sounds like an age based limit
> would be better - some systems have restrictions over what age can be
> ingested (e.g. Timestream) or administrators could decide older data has
> no usefulness (e.g. if the receiver is used for alerting or anomaly
> detection). While the system should still reject such old samples once it
> is working again a time based limit would at least reduce the network
> impact once the receiver is back online (no need to send tons of data
> that we know will be rejected).
>
> --
> Stuart Clark
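
To make the combination discussed in this thread concrete, here is a rough sketch of how a retry limit and an age limit could work together, along with the 4XX behaviour Stuart describes. It is illustrative Python, not Prometheus code: `RetryPolicy`, `max_retries`, `max_sample_age_s`, and `decide` are hypothetical names invented for this example, and treating every 4XX as permanent is an assumption taken from the message above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryPolicy:
    """Hypothetical settings; neither field is an existing Prometheus option."""
    max_retries: Optional[int] = None         # None: no limit on retry count
    max_sample_age_s: Optional[float] = None  # None: no limit on sample age


def decide(policy, status, attempts, sample_ts, now):
    """Return "done", "drop", or "retry" after one send attempt.

    status:    HTTP status from the receiver, or None if it was unreachable.
    attempts:  sends made so far for this batch, including this one.
    sample_ts: Unix timestamp (seconds) of the oldest sample in the batch.
    now:       current Unix timestamp (seconds).
    """
    if status is not None and 200 <= status < 300:
        return "done"
    # Assumption from the thread: a 4XX is a permanent failure, never retried.
    if status is not None and 400 <= status < 500:
        return "drop"
    # Age limit: stop resending data the receiver would reject as too old.
    if (policy.max_sample_age_s is not None
            and now - sample_ts > policy.max_sample_age_s):
        return "drop"
    # Retry limit: one initial send plus at most max_retries resends.
    if policy.max_retries is not None and attempts > policy.max_retries:
        return "drop"
    return "retry"
```

With both limits unset the behaviour stays lossless (retry until the receiver recovers), so either limit remains opt-in. The age check treats freshly scraped samples and days-old samples differently, which a retry count alone cannot do.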

