Hi All,

I would like to propose a callback for the TSDB for series creation and 
deletion. This would help external projects like Cortex and Thanos in 
implementing limits on series while having no performance impact on 
Prometheus with minimal code addition.

The callback would look something like this (names TBD) with no performance 
impact on TSDB.


type SeriesLifecycleCallback interface {

PreCreationCallback(labels.Labels) error

PostCreationCallback(labels.Labels)

PostDeletionCallback(labels.Labels)

}

*1. SeriesLifecycleCallback*

It would be a part of DB.Options 
<https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/db.go#L73-L116>

*2. PreCreationCallback*

It would be called just before holding the lock here 
<https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1654>.
 
And if the method returns an error, it means we don’t want to create a new 
series. 

Further the hashes would be checked as usual 
<https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1656-L1659>,
 
and if we reach the step of adding a new hash 
<https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1660>,
 
it would be either executed or skipped based on the result of 
`PreCreationCallback`.

If the series creation was skipped, the error returned by 
`PreCreationCallback` would be propagated back into the call stack and the 
`Add` or `AddFast` method would finally return back the error. (This means 
additional return values for the methods in that call stack).

*3. PostCreationCallback*

This would be called after releasing the lock here 
<https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1661>
 
letting the caller know that the series was created.

*4. PostDeletionCallback *


It would be called after deleting a series during garbage collection here 
(after unlocking) 
<https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1625>
 
so that external projects can do the bookkeeping.

----

In the context of Prometheus, the interface would be a noop, hence no 
performance impact.

*Some reasoning behind the design decisions and what guarantees it 
provides:*

The idea was to avoid having the callbacks inside the locked section which 
could lead to performance degradation. Hence the Series creation callback 
was split into 2 methods. PreCreationCallback method tells whether to 
create the series if we need to create and the next method 
`PostCreationCallback` to let the user know if it was created, so that 
appropriate bookkeeping can be done.

Since all the callbacks happen outside the locks (even if it was inside the 
lock, there could be concurrent calls for different shards of the stripe 
series), the user can expect some soft consistencies here. For example, the 
PreCreationCallback could be called concurrently by multiple series 
creation when you are at the edge of the series limit and all the series 
creation would be allowed, which might cause the limit to cross by a small 
number. Similarly during the deletion phase: some creation of series would 
fail as the deletion callback is called a little later. Hence the user is 
required to expect soft consistencies while using these callbacks.

What do you all think about this addition in TSDB?

Thanks,

Ganesh

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/8e9cd9e7-fe23-4564-b525-235af38f24e6%40googlegroups.com.

Reply via email to