On Tuesday, April 21, 2020 at 9:39:49 PM UTC+5:30, Brian Brazil wrote:
>
> On Tue, 21 Apr 2020 at 16:48, 'GANESH VERNEKAR' via Prometheus Developers <
> [email protected] <javascript:>> wrote:
>
>>
>>
>> On Tuesday, April 21, 2020 at 8:59:12 PM UTC+5:30, Brian Brazil wrote:
>>>
>>> On Tue, 21 Apr 2020 at 15:17, 'GANESH VERNEKAR' via Prometheus 
>>> Developers <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I would like to propose a callback for the TSDB for series creation and 
>>>> deletion. This would help external projects like Cortex and Thanos in 
>>>> implementing limits on series while having no performance impact on 
>>>> Prometheus with minimal code addition.
>>>>
>>>> The callback would look something like this (names TBD) with no 
>>>> performance impact on TSDB.
>>>>
>>>>
>>>> type SeriesLifecycleCallback interface {
>>>>
>>>> PreCreationCallback(labels.Labels) error
>>>>
>>>> PostCreationCallback(labels.Labels)
>>>>
>>>> PostDeletionCallback(labels.Labels)
>>>>
>>>> }
>>>>
>>>> *1. SeriesLifecycleCallback*
>>>>
>>>> It would be a part of DB.Options 
>>>> <https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/db.go#L73-L116>
>>>>
>>>> *2. PreCreationCallback*
>>>>
>>>> It would be called just before holding the lock here 
>>>> <https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1654>.
>>>>  
>>>> And if the method returns an error, it means we don’t want to create a new 
>>>> series. 
>>>>
>>>> Further the hashes would be checked as usual 
>>>> <https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1656-L1659>,
>>>>  
>>>> and if we reach the step of adding a new hash 
>>>> <https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1660>,
>>>>  
>>>> it would be either executed or skipped based on the result of 
>>>> `PreCreationCallback`.
>>>>
>>>> If the series creation was skipped, the error returned by 
>>>> `PreCreationCallback` would be propagated back into the call stack and the 
>>>> `Add` or `AddFast` method would finally return back the error. (This means 
>>>> additional return values for the methods in that call stack).
>>>>
>>>> *3. PostCreationCallback*
>>>>
>>>> This would be called after releasing the lock here 
>>>> <https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1661>
>>>>  
>>>> letting the caller know that the series was created.
>>>>
>>>> *4. PostDeletionCallback *
>>>>
>>>>
>>>> It would be called after deleting a series during garbage collection here 
>>>> (after unlocking) 
>>>> <https://github.com/prometheus/prometheus/blob/fc3fb3265a0f0061d07d0bf5fb4103e2144565b2/tsdb/head.go#L1625>
>>>>  
>>>> so that external projects can do the bookkeeping.
>>>>
>>>> ----
>>>>
>>>> In the context of Prometheus, the interface would be a noop, hence no 
>>>> performance impact.
>>>>
>>>
>>> It'd still cost a few cycles.
>>>
>>
>> But it would be very less to even notice.
>>
>>  
>>>
>>>>
>>>> *Some reasoning behind the design decisions and what guarantees it 
>>>> provides:*
>>>>
>>>> The idea was to avoid having the callbacks inside the locked section 
>>>> which could lead to performance degradation. Hence the Series creation 
>>>> callback was split into 2 methods. PreCreationCallback method tells 
>>>> whether to create the series if we need to create and the next method 
>>>> `PostCreationCallback` to let the user know if it was created, so that 
>>>> appropriate bookkeeping can be done.
>>>>
>>>> Since all the callbacks happen outside the locks (even if it was inside 
>>>> the lock, there could be concurrent calls for different shards of the 
>>>> stripe series), the user can expect some soft consistencies here. For 
>>>> example, the PreCreationCallback could be called concurrently by 
>>>> multiple series creation when you are at the edge of the series limit and 
>>>> all the series creation would be allowed, which might cause the limit to 
>>>> cross by a small number. Similarly during the deletion phase: some 
>>>> creation 
>>>> of series would fail as the deletion callback is called a little later. 
>>>> Hence the user is required to expect soft consistencies while using these 
>>>> callbacks.
>>>>
>>>> What do you all think about this addition in TSDB?
>>>>
>>>
>>> This seems to require adding error handling for cases where Prometheus 
>>> cannot have errors, what happens to a scrape or rule evaluation when this 
>>> fails?
>>>
>>
>> The Add 
>> <https://github.com/prometheus/prometheus/blob/4b5e7d4984a135abd3729b9a2a5904336d35bc21/tsdb/head.go#L931>
>>  
>> and AddFast 
>> <https://github.com/prometheus/prometheus/blob/4b5e7d4984a135abd3729b9a2a5904336d35bc21/tsdb/head.go#L957>
>>  
>> methods already returns an error. So failure in series creation would 
>> follow the same procedure. But thinking about it more, it will be a no-op 
>> for Prometheus, and I don't think the external users who would use this 
>> feature will have scraping in it (considering it as a "TSDB" package and 
>> not "Prometheus" package).
>>  
>>
>>> I'm also not sure it's considering that the semantics of how this is all 
>>> managed inside the tsdb is not what you might expect. For example a failed 
>>> scrape due to hitting the sample limit can still cause series to be 
>>> created, which usually doesn't come up but might now. 
>>>
>>
>> Yes, all the changes in TSDB are not totally explored yet, I will work on 
>> a draft to see what all needs changing. 
>>
>
>  
>
>> Additionally, series is still created currently if the appender Commit() 
>> fails (a failed scrape?).
>>
>
> A failed scrape would be a Rollback. A WAL error is what would cause a 
> Commit to fail, which is likely a full filesystem.
>

Yes, but series is still created in case of Rollback and not deleted till 
next compaction.


> -- 
> Brian Brazil
> www.robustperception.io
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/ba8f5815-aeae-413a-bd9f-3131e54d1265%40googlegroups.com.

Reply via email to