On Tuesday, April 21, 2020 at 9:39:49 PM UTC+5:30, Brian Brazil wrote: > > On Tue, 21 Apr 2020 at 16:48, 'GANESH VERNEKAR' via Prometheus Developers < > [email protected] <javascript:>> wrote: > >> >> >> On Tuesday, April 21, 2020 at 8:59:12 PM UTC+5:30, Brian Brazil wrote: >>> >>> On Tue, 21 Apr 2020 at 15:17, 'GANESH VERNEKAR' via Prometheus >>> Developers <[email protected]> wrote: >>> >>>> Hi All, >>>> >>>> I would like to propose a callback for the TSDB for series creation and >>>> deletion. This would help external projects like Cortex and Thanos in >>>> implementing limits on series while having no performance impact on >>>> Prometheus with minimal code addition. >>>> >>>> The callback would look something like this (names TBD) with no >>>> performance impact on TSDB. >>>> >>>> >>>> type SeriesLifecycleCallback interface { >>>> >>>> PreCreationCallback(labels.Labels) error >>>> >>>> PostCreationCallback(labels.Labels) >>>> >>>> PostDeletionCallback(labels.Labels) >>>> >>>> } >>>> >>>> *1. SeriesLifecycleCallback* >>>> >>>> It would be a part of DB.Options >>>> <https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/db.go#L73-L116> >>>> >>>> *2. PreCreationCallback* >>>> >>>> It would be called just before holding the lock here >>>> <https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1654>. >>>> >>>> And if the method returns an error, it means we don’t want to create a new >>>> series. >>>> >>>> Further the hashes would be checked as usual >>>> <https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1656-L1659>, >>>> >>>> and if we reach the step of adding a new hash >>>> <https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1660>, >>>> >>>> it would be either executed or skipped based on the result of >>>> `PreCreationCallback`. >>>> >>>> If the series creation was skipped, the error returned by >>>> `PreCreationCallback` would be propagated back into the call stack and the >>>> `Add` or `AddFast` method would finally return back the error. (This means >>>> additional return values for the methods in that call stack). >>>> >>>> *3. PostCreationCallback* >>>> >>>> This would be called after releasing the lock here >>>> <https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1661> >>>> >>>> letting the caller know that the series was created. >>>> >>>> *4. PostDeletionCallback * >>>> >>>> >>>> It would be called after deleting a series during garbage collection here >>>> (after unlocking) >>>> <https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1625> >>>> >>>> so that external projects can do the bookkeeping. >>>> >>>> ---- >>>> >>>> In the context of Prometheus, the interface would be a noop, hence no >>>> performance impact. >>>> >>> >>> It'd still cost a few cycles. >>> >> >> But it would be very less to even notice. >> >> >>> >>>> >>>> *Some reasoning behind the design decisions and what guarantees it >>>> provides:* >>>> >>>> The idea was to avoid having the callbacks inside the locked section >>>> which could lead to performance degradation. Hence the Series creation >>>> callback was split into 2 methods. PreCreationCallback method tells >>>> whether to create the series if we need to create and the next method >>>> `PostCreationCallback` to let the user know if it was created, so that >>>> appropriate bookkeeping can be done. >>>> >>>> Since all the callbacks happen outside the locks (even if it was inside >>>> the lock, there could be concurrent calls for different shards of the >>>> stripe series), the user can expect some soft consistencies here. For >>>> example, the PreCreationCallback could be called concurrently by >>>> multiple series creation when you are at the edge of the series limit and >>>> all the series creation would be allowed, which might cause the limit to >>>> cross by a small number. Similarly during the deletion phase: some >>>> creation >>>> of series would fail as the deletion callback is called a little later. >>>> Hence the user is required to expect soft consistencies while using these >>>> callbacks. >>>> >>>> What do you all think about this addition in TSDB? >>>> >>> >>> This seems to require adding error handling for cases where Prometheus >>> cannot have errors, what happens to a scrape or rule evaluation when this >>> fails? >>> >> >> The Add >> <https://github.com/prometheus/prometheus/blob/4b5e7d4984a135abd3729b9a2a5904336d35bc21/tsdb/head.go#L931> >> >> and AddFast >> <https://github.com/prometheus/prometheus/blob/4b5e7d4984a135abd3729b9a2a5904336d35bc21/tsdb/head.go#L957> >> >> methods already returns an error. So failure in series creation would >> follow the same procedure. But thinking about it more, it will be a no-op >> for Prometheus, and I don't think the external users who would use this >> feature will have scraping in it (considering it as a "TSDB" package and >> not "Prometheus" package). >> >> >>> I'm also not sure it's considering that the semantics of how this is all >>> managed inside the tsdb is not what you might expect. For example a failed >>> scrape due to hitting the sample limit can still cause series to be >>> created, which usually doesn't come up but might now. >>> >> >> Yes, all the changes in TSDB are not totally explored yet, I will work on >> a draft to see what all needs changing. >> > > > >> Additionally, series is still created currently if the appender Commit() >> fails (a failed scrape?). >> > > A failed scrape would be a Rollback. A WAL error is what would cause a > Commit to fail, which is likely a full filesystem. >
Yes, but series is still created in case of Rollback and not deleted till next compaction. > -- > Brian Brazil > www.robustperception.io > -- You received this message because you are subscribed to the Google Groups "Prometheus Developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/ba8f5815-aeae-413a-bd9f-3131e54d1265%40googlegroups.com.

