Thanks, Ben, for your tips on tuning GOGC. Regarding the question of why Prometheus memory does not go back to its initial level even after inactive time series have been swept out of the TSDB on reaching the retention time, do you have any comment?

On Friday, August 18, 2023 at 3:54:07 PM UTC+7 Ben Kochie wrote:
> And if you look, GC kicked in just after 15:20 to reduce the RSS from 10GiB to a little over 8GiB. In your 3rd example, you're running with about 3.5KiB of memory per head series. This is perfectly normal and within expected results.
>
> Again, this is all related to Go memory garbage collection. The Go runtime does what it does.
>
> There are some tunables. For example, we found that in our larger environment GOGC=50 is more appropriate for our workloads compared to the Go default of GOGC=100. This should reduce the RSS to around 1.5x the go_memstats_alloc_bytes.
>
> On Fri, Aug 18, 2023 at 10:29 AM Peter Nguyễn <[email protected]> wrote:
>
>> > 2 million series is no big deal, should only take a few extra gigabytes of memory. This is not a huge amount and well within Prometheus capability.
>>
>> 1) I have performed another test with 1M active time series. The memory usage of Prometheus with 1M is around 3 GB (3 billion bytes) in my environment. I then restarted the target at around 18:10; the number of time series in the HEAD block jumped up to 2M, and the RAM usage was around 5 GB, a *66% increase* compared to the prior point.
>>
>> [image: prometheus_instance_ip_port_concern_latest_v3.jpg]
>>
>> Looking at `go_memstats_alloc_bytes`, the number of allocated bytes goes down at HEAD truncation, but the Prometheus RSS does not seem to.
>>
>> 2) I then left the deployment running overnight to see whether the memory would go back to the previous low point or not. Here is what I got:
>>
>> [image: prometheus_instance_ip_port_concern_latest_v5.jpg]
>>
>> a) The memory did not go back to its 3 GB level. I set the retention time to 4h, so inactive time series should have been swept out. I am confused why the memory does not return to its low point. Does Prometheus keep any info related to inactive time series in memory?
>>
>> b) When I performed a target restart again at 09:38, the memory kept jumping up.
>> Now the current value is 6.7 GB, almost a 100% increase compared to the previous value.
>>
>> 3) When I restarted the target one more time while the HEAD block was not yet truncated, the memory jumped up to 10 GB. This is a huge memory increase for us compared to the starting point.
>>
>> [image: prometheus_instance_ip_port_concern_latest_v6.jpg]
>> On Thursday, August 17, 2023 at 10:34:52 AM UTC+7 Ben Kochie wrote:
>>
>>> On Thu, Aug 17, 2023 at 4:42 AM Peter Nguyễn <[email protected]> wrote:
>>>
>>>> Thanks for your replies.
>>>>
>>>> > There is nothing to handle, the instance/pod IP is required for uniqueness tracking. Different instances of the same pod need to be tracked individually. In addition, most Deployment pods are going to get new generated pod names every time anyway.
>>>>
>>>> Then if we have a deployment with a large number of active time series, say 1 million, every upgrade or rollback of the deployment would cause a significant memory increase, because the number of time series is doubled (2 million in this case), and Prometheus would get OOM-killed if we don't reserve a huge amount of memory for that scenario.
>>>
>>> 2 million series is no big deal; it should only take a few extra gigabytes of memory. This is not a huge amount and is well within Prometheus's capability.
>>>
>>> For reference, I have deployments that generate more than 10M series and can use upwards of 200GiB of memory when we go through a number of deploys quickly. After things settle down, the memory is released, but it does take a number of hours.
>>>
>>>> > Prometheus compacts memory every 2 hours, so old data is flushed out of memory.
>>>>
>>>> I have re-run the test with Prometheus's latest version, v2.46.0, capturing Prometheus memory using the container_memory_rss metric. To me, it looks like the memory is not dropped after cutting the HEAD into a persistent block.
>>>>
>>>> [image: prometheus_instance_ip_port_concern_latest.jpg]
>>>>
>>>> Do you think this is expected? If yes, could you please share with us why the memory is not freed up for inactive time series that are no longer in the HEAD block?
>>>
>>> It will. Prometheus is written in Go, which is a garbage-collected language. It will release RSS memory as it needs to. You can see what Go is currently using with go_memstats_alloc_bytes.
>>>
>>>> On Wednesday, August 16, 2023 at 6:15:35 PM UTC+7 Ben Kochie wrote:
>>>>
>>>>> FYI, container_memory_working_set_bytes is a misleading metric. It includes page cache memory, which can be unallocated at any time but improves query performance.
>>>>>
>>>>> If you want to know the real memory use, I would recommend using container_memory_rss.
>>>>>
>>>>> On Wed, Aug 16, 2023 at 9:31 AM Peter Nguyễn <[email protected]> wrote:
>>>>>
>>>>>> Hi Prometheus experts,
>>>>>>
>>>>>> I have a Prometheus Pod (v2.40.7) running on our Kubernetes (k8s) cluster for metric scraping from multiple k8s targets.
>>>>>>
>>>>>> Recently, I have observed that whenever I restart a target (a k8s Pod) or perform a Helm upgrade, the memory consumption of Prometheus keeps increasing. After investigating, I discovered that each time the pod gets restarted, a new set of time series from that target is generated due to the dynamic values of `instance` and `pod_name`.
>>>>>>
>>>>>> The instance label value we use is in the format <pod_IP>:port, and the `pod_name` label value is the pod name. Consequently, whenever a Pod is restarted, it receives a newly allocated IP address and a new pod name (unless it is a StatefulSet's Pod), resulting in new values for the instance and pod_name labels.
>>>>>>
>>>>>> When HEAD truncation happens and the number of time series in the HEAD block goes back to the previous low value, Prometheus memory still does not return to where it was before the target restarted. Here is the graph:
>>>>>>
>>>>>> [image: prometheus_instance_ip_port_concern.jpg]
>>>>>>
>>>>>> I am writing to seek advice on best practices for handling these label values, particularly `instance`. Do you have any advice on what the value format should be for those labels so we get rid of the memory increase every time a pod gets restarted? And at some point, e.g. after retention is triggered, would the memory go back to the previous level?
>>>>>>
>>>>>> Regards, Vu
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/27961908-8362-42a7-b1ce-ab27dcece7b1n%40googlegroups.com.

